dcc.Upload - parse contents with multiple worksheets - Part 2

Hello,

My question is similar to one posted before (see link below). However, in that post I could not find the answer, because the original asker solved it himself and did not share the solution.

The problem is that I want to upload an Excel file (.xlsx) with multiple worksheets. For files containing a single sheet the standard code works fine, but with multiple worksheets I get the error: 'list' object has no attribute 'split'.

How can I split or subdivide my contents string so that I can decode it and write the desired sheet to a pandas DataFrame? I understand how to do it when I read the file directly from a folder on my computer, as that is rather simple. But in Dash the uploaded file arrives as a base64-encoded string that needs to be decoded before the data can be stored (if I understood correctly), and that is where I am having trouble.

Thanks in advance for any help.

The code:
import base64
import io

import pandas as pd
from dash import html


def parse_data_pulse(contents, filename):
    content_type, content_string = contents.split(",")

    decoded = base64.b64decode(content_string)
    try:
        if "csv" in filename:
            # Assume that the user uploaded a CSV file
            PV = pd.read_csv(io.StringIO(decoded.decode("utf-8")))
        elif "xls" in filename:
            # Assume that the user uploaded an Excel file
            # (still to be adapted for CSV, text and multiple columns)
            PV = pd.read_excel(io.BytesIO(decoded))
        elif "txt" in filename or "tsv" in filename:
            # Assume that the user uploaded a whitespace-delimited text file
            PV = pd.read_csv(io.StringIO(decoded.decode("utf-8")), delimiter=r"\s+")
    except Exception as e:
        print(e)
        return html.Div(["There was an error processing this file."])

    return PV

Hi,

Could you share your callback and how you are parsing the content with us?

Of course, I included it in the original post

Ok, thanks! Is this the line that is failing?

If so, then I would need to see the callback to check how you are invoking this function. It could be that you have multiple files being uploaded at once and contents is a list of str instead of a str…
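
For reference, that usually happens when the dcc.Upload component is defined with multiple=True, roughly like this (a minimal sketch; only the id is taken from your callback, the rest is an assumption):

# Sketch (assumption): with multiple=True, the callback receives `contents` and
# `filename` as lists, one entry per uploaded file; with multiple=False they are
# plain strings.
from dash import dcc, html

upload = dcc.Upload(
    id="upload-data-Temp",
    children=html.Div(["Drag and drop or click to select files."]),
    multiple=True,
)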

From the point of view of parsing the xls file, your code should work, but only the first sheet will be imported into pandas. You can import all sheets by using the following line instead:

PV = pd.read_excel(io.BytesIO(decoded), sheet_name=None)

Just note that PV will be a dict with an entry per sheet.
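
If it helps, this is roughly how you can work with that dict (a sketch; the sheet name "Pulse" is only an example):

# Sketch: with sheet_name=None, pd.read_excel returns a dict mapping sheet names
# to DataFrames.
PV = pd.read_excel(io.BytesIO(decoded), sheet_name=None)

for sheet_name, df in PV.items():
    print(sheet_name, df.shape)

# A single sheet can then be picked by name ("Pulse" is just an example name):
# pulse_df = PV["Pulse"]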

Yes, that’s indeed the line failing.

This is the current callback:

@app.callback(Output('TData', 'children'),
              Input('upload-data-Temp', 'contents'),
              State('upload-data-Temp', 'filename'))
def update_output(contents, filename):
    if contents:
        Data = parse_data_pulse(contents, filename)
        print(Data)
        return Data

I've found a solution by using the sheet_name=None suggestion, and this for the callback:

@app.callback(Output('TData', 'data'),
              Input('upload-data-Temp', 'contents'),
              State('upload-data-Temp', 'filename'))
def update_output(contents, filename):
    if contents:
        Data = [parse_data_pulse(c, f) for c, f in zip(contents, filename)]
        print(type(Data))
        return Data

However, the resulting Data is then a list, not a DataFrame or a dict.

Yes, but it makes sense: dcc.Upload allows you to upload multiple files at once, plus you can have multiple sheets (dataframes) per file, so you have many dataframes.
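
To make the structure concrete, here is a rough sketch of what Data looks like at this point (assuming each file was parsed with sheet_name=None; names are illustrative):

# Sketch: Data is a list with one dict per uploaded file, and each dict maps
# a sheet name to its DataFrame.
for tables in Data:                          # one dict per uploaded file
    for sheet_name, df in tables.items():    # one DataFrame per sheet
        print(sheet_name, df.shape)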

Instead of returning the data to a single DataTable, it is better to return a list of DataTables to an html.Div(). Something like:

# Assuming you have an html.Div(id="tables-container") in the layout

@app.callback(Output('tables-container', 'children'),
              Input('upload-data-Temp', 'contents'),
              State('upload-data-Temp', 'filename'))
def update_output(contents, filename):
    if contents:
        Data = [parse_data_pulse(c, f) for c, f in zip(contents, filename)]
        print(type(Data))
        
        return [
            dash_table.DataTable(
                data=df.to_dict("records"),
                columns=[{"name": str(c), "id": str(c)} for c in df.columns],
            )
            for tables in Data
            for df in tables.values()
        ]

Note that two loops are needed in the comprehension: one to get each uploaded file (an entry of Data) and the other to get each DataFrame (sheet) inside it. You can of course customize the returned components, for example by adding a heading based on the filename and sheet name, as sketched below.
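
For example, something along these lines (a sketch only; the heading components and layout are just one way to do it):

# Sketch (inside update_output, replacing the return statement above):
# one heading per file and per sheet, followed by the corresponding table.
children = []
for fname, tables in zip(filename, Data):
    children.append(html.H5(fname))
    for sheet_name, df in tables.items():
        children.append(html.H6(sheet_name))
        children.append(
            dash_table.DataTable(
                data=df.to_dict("records"),
                columns=[{"name": str(c), "id": str(c)} for c in df.columns],
            )
        )
return children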

Hey @jlfsjunior, indeed that might be better, and it works. Thanks for the help!
