dcc.Upload - parse contents with multiple worksheets - Part 2

Hello,

My question is similar to one posted before (see link below). However, in that post I could not find the answer, because the original asker solved it himself and did not share the solution.

The problem is that I want to upload an Excel file (.xlsx) with multiple worksheets. For files containing a single sheet the standard code works fine, but with multiple worksheets I get the error: 'list' object has no attribute 'split'.

How can I split or subdivide my contents string so that I can decode it and write the desired sheet to a pandas DataFrame? I understand how to do it when I read the file directly from a folder on my computer, as that is rather simple. But in Dash the uploaded file arrives as a base64-encoded string that needs to be decoded before the data can be stored (if I understood correctly), and that is where I am having trouble.

Thanks in advance for any help.

The code:
import base64
import io

import pandas as pd
from dash import html


def parse_data_pulse(contents, filename):
    content_type, content_string = contents.split(",")

    decoded = base64.b64decode(content_string)
    try:
        if "csv" in filename:
            # Assume that the user uploaded a CSV file
            PV = pd.read_csv(io.StringIO(decoded.decode("utf-8")))
        elif "xls" in filename:
            # Assume that the user uploaded an Excel file
            # (still to be adapted for CSV, text and multiple columns)
            PV = pd.read_excel(io.BytesIO(decoded))
        elif "txt" in filename or "tsv" in filename:
            # Assume that the user uploaded a whitespace-delimited text file
            PV = pd.read_csv(io.StringIO(decoded.decode("utf-8")), delimiter=r"\s+")
    except Exception as e:
        print(e)
        return html.Div(["There was an error processing this file."])

    return PV

Hi,

Could you share your callback and how you are parsing the content with us?

Of course, I included it in the original post

Ok, thanks! Is this the line that is failing?

If so, then I would need to see the callback to check how you are invoking this function. It could be that you have multiple files being uploaded at once and contents is a list of str instead of a str…
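
For reference, that usually happens when the dcc.Upload component is defined with multiple=True, roughly like this (a minimal sketch; only the id is taken from your callback, the rest is an assumption):

# Sketch (assumption): with multiple=True, the callback receives `contents` and
# `filename` as lists, one entry per uploaded file; with multiple=False they are
# plain strings.
from dash import dcc, html

upload = dcc.Upload(
    id="upload-data-Temp",
    children=html.Div(["Drag and drop or click to select files."]),
    multiple=True,
)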

From the point of view of parsing the xls file, your code should work, but only the first sheet will be imported into pandas. You can import all sheets by using the following line instead:

PV = pd.read_excel(io.BytesIO(decoded), sheet_name=None)

Just note that PV will be a dict with an entry per sheet.
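
If it helps, this is roughly how you can work with that dict (a sketch; the sheet name "Pulse" is only an example):

# Sketch: with sheet_name=None, pd.read_excel returns a dict mapping sheet names
# to DataFrames.
PV = pd.read_excel(io.BytesIO(decoded), sheet_name=None)

for sheet_name, df in PV.items():
    print(sheet_name, df.shape)

# A single sheet can then be picked by name ("Pulse" is just an example name):
# pulse_df = PV["Pulse"]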

Yes, that’s indeed the line failing.

This is the current callback:

@app.callback(Output('TData', 'children'),
              Input('upload-data-Temp', 'contents'),
              State('upload-data-Temp', 'filename'))
def update_output(contents, filename):
    if contents:
        Data = parse_data_pulse(contents, filename)
        print(Data)
        return Data

I've found a solution by using the sheet_name=None suggestion, and this for the callback:

@app.callback(Output('TData', 'data'),
              Input('upload-data-Temp', 'contents'),
              State('upload-data-Temp', 'filename'))
def update_output(contents, filename):
    if contents:
        Data = [parse_data_pulse(c, f) for c, f in zip(contents, filename)]
        print(type(Data))
        return Data

However, the resulting Data is then a list, not a DataFrame or a dict.

Yes, but it makes sense: dcc.Upload allows you to upload multiple files at once, plus you can have multiple sheets (dataframes) per file, so you have many dataframes.
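
To make the structure concrete, here is a rough sketch of what Data looks like at this point (assuming each file was parsed with sheet_name=None; names are illustrative):

# Sketch: Data is a list with one dict per uploaded file, and each dict maps
# a sheet name to its DataFrame.
for tables in Data:                          # one dict per uploaded file
    for sheet_name, df in tables.items():    # one DataFrame per sheet
        print(sheet_name, df.shape)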

Instead of returning the data to a single DataTable, it is better to return a list of DataTables to an html.Div(). Something like:

# Assuming you have an html.Div(id="tables-container") in the layout

@app.callback(Output('tables-container', 'children'),
              Input('upload-data-Temp', 'contents'),
              State('upload-data-Temp', 'filename'))
def update_output(contents, filename):
    if contents:
        Data = [parse_data_pulse(c, f) for c, f in zip(contents, filename)]
        print(type(Data))
        
        return [
            dash_table.DataTable(
                data=df.to_dict("records"),
                columns=[{"name": str(c), "id": str(c)} for c in df.columns],
            )
            for tables in Data
            for df in tables.values()
        ]

Note that two loops are needed in the comprehension: one to get each uploaded file (an entry of Data) and the other to get each DataFrame (sheet) inside it. You can of course customize the returned components, for example by adding a heading based on the filename and sheet name, as sketched below.
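
For example, something along these lines (a sketch only; the heading components and layout are just one way to do it):

# Sketch (inside update_output, replacing the return statement above):
# one heading per file and per sheet, followed by the corresponding table.
children = []
for fname, tables in zip(filename, Data):
    children.append(html.H5(fname))
    for sheet_name, df in tables.items():
        children.append(html.H6(sheet_name))
        children.append(
            dash_table.DataTable(
                data=df.to_dict("records"),
                columns=[{"name": str(c), "id": str(c)} for c in df.columns],
            )
        )
return children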

Hey @jlfsjunior, indeed that might be better, and it works. Thanks for the help!
