Performing transformation on uploaded excel file and return dash table after 'Submit' button trigger

Happy to finally be contributing to the Plotly Dash community! I am still learning dash and have stumbled upon a question during a Dash build (Not sure if anyone has tried a version of this workflow yet but i’ve searched through the community posts and documentation and can’t seem to figure out what I am doing wrong

The goal: Allow a user to upload an .xlsx file, parse to obtain certain sheets within the file, make some transformations to the data in the sheets and then compare the contents of the 2 sheet. Any discrepancies between the sheets will be output to a dash data table.

The user interactivity components in order or process:

  1. Upload excel file
  2. Use Dropdown to select type of quality check
  3. hit submit button to run quality check
  4. explore discrepancies using DashDataTable output

I’ve attached images of my app layout and callback functions below for reference, any help or guidance is much appreciated!


HI @adamcprez welcome to the forums.

Your question is quite generic. Where exactly do you need help, or better said, which part of your process is working as expected?

  • upload of file
  • dropdown
  • button click
  • prepare datatable

By the way, it would be better to provide your code as preformatted text instead of images.

Hi @AIMPED ,

Thanks for the reply!

To be more specific on a few of the callback pieces:

**On the upload File button/Datable interaction (perhaps my biggest confusion)

  1. I’ve reduced to code to the following to only display the contents of one sheet .I am not getting any errors on the file processing portion of the callback but the Datable is not appearing within the dash app.
@app.callback(Output('output-data', 'children'),
              Input('upload-data', 'contents'), 
              [State('upload-data', 'filename'),
              State('upload-data', 'last_modified')], prevent_inital_call=True)
def update_output(contents, filename, last_modified):
    if contents is not None:
        # Get Taxonomy Requirements
        taxonomy_df = [
            parse_contents(c, n, d, sheet='MASTER SOURCE TAB', rows_to_skip=1) for c, n, d in
            zip(contents, filename, last_modified)]
        
        # Get Placements
        trafficking_df = [
            parse_contents(c, n, d, sheet='Social Trafficking', rows_to_skip=6) for c, n, d in
            zip(contents, filename, last_modified)]

    return html.Div([
            dash_table.DataTable(
                data=taxonomy_df,
                columns=taxonomy_df.columns,
                page_size=15)
            ])

On the upload File button/ drop down interaction:

  1. when the user uploads a file will that file store in the State parameter similar to a dcc.Store() where the callback can access the raw file once the submit button is triggered and perform its transformations?

Note: I’ve taken your suggestion and have copied the code below into text rather than images. (Thanks again for your help and tips, very appreciated!)

Here is the app layout:

app.layout = dbc.Container([
    # Upload file 
    dbc.Row([dcc.Upload(
        id='upload-data',
        children=html.Div([
            'Drag and Drop or ',
            html.A('Select Files')
        ]),
        style={
            'width': '100%',
            'height': '60px',
            'lineHeight': '60px',
            'borderWidth': '1px',
            'borderStyle': 'dashed',
            'borderRadius': '5px',
            'textAlign': 'center',
            'margin': '10px'
        },
        # Do not allow multiple files to be uploaded
        multiple=False
    )]),
    
    # Select check to perform
    dbc.Row([dbc.Select(
        id='channel-QA-chosen',
        placeholder = 'Select A Channel Taxonomy....',
        options= [
            {'label': 'Programmatic-Display', 'value': 'Programmatic-Direct'},
            {'label': 'Social', 'value': 'Social'},
            {'label': 'Print', 'value': 'Print', 'disabled': True}  
        ])]),
    
    # Button to trigger check
    dbc.Row([dbc.Button(
    id='submit-button', children = "Run QA", n_clicks=0)]),
    
    # Output discrpency tables
    dbc.Row([
             html.Div(id='output-data')])
], fluid=True)

Here is the original Callback code:

@app.callback(Output('output-data', 'children'),
              Input('submit-buttom', 'n_clicks'), # Add submit button and over arching if statement (if n is none: return no update, else do something)
              [State('channel-QA-chosen', 'options'),
              State('upload-data', 'contents'), 
              State('upload-data', 'filename'),
              State('upload-data', 'last_modified')], prevent_inital_call=True)
def update_output(n_clicks, options, contents, filename, last_modified):
    if n_clicks is None:
        None
    else:
        if contents is not None and options == 'Social':
            # Get Taxonomy Requirements
            taxonomy_df = [
                parse_contents(c, n, d, sheet='MASTER SOURCE TAB', rows_to_skip=1) for c, n, d in
                zip(contents, filename, last_modified)]
            
            # Get Placements
            trafficking_df = [
                parse_contents(c, n, d, sheet='Social Trafficking', rows_to_skip=6) for c, n, d in
                zip(contents, filename, last_modified)]
            trafficking_df = trafficking_df['Package / Placement Name'] # Extract Placement/Package name column
            search_discreps = trafficking_df.str.split('_', expand = True) # split placement name on '_' delimiter
            search_discreps.rename(columns=soc_dict, inplace=True) # rename columns
            discrepencies_table = run_QA(taxonomy_df=taxonomy_df, workbook_df=trafficking_df, search_discreps=search_discreps, options=options)
    
        elif contents is not None and options == 'Programmatic-Display':   
            # Get Taxonomy Requirements  
            taxonomy_df = [
                parse_contents(c, n, d, sheet='MASTER SOURCE TAB', rows_to_skip=1) for c, n, d in
                zip(contents, filename, last_modified)]
            
            # Get Placements
            trafficking_df = [
                parse_contents(c, n, d, sheet='ProgDirect Trafficking', rows_to_skip=6) for c, n, d in
                zip(contents, filename, last_modified)]   
            trafficking_df = trafficking_df['Package / Placement Name'] # Extract Placement/Package name column
            search_discreps = trafficking_df.str.split('_', expand = True) # split placement name on '_' delimiter
            search_discreps.rename(columns=disp_dict, inplace=True) # rename columns
            discrepencies_table = run_QA(taxonomy_df=taxonomy_df, workbook_df=trafficking_df, search_discreps=search_discreps, options=options)

        elif contents is not None and options == 'Print':    
            # Get Taxonomy Requirements
            taxonomy_df = [
                parse_contents(c, n, d, sheet='MASTER SOURCE TAB', rows_to_skip=1) for c, n, d in
                zip(contents, filename, last_modified)]
            
            # Get Placements
            trafficking_df = [
                parse_contents(c, n, d, sheet='Print Trafficking', rows_to_skip=6) for c, n, d in
                zip(contents, filename, last_modified)] 
            trafficking_df = trafficking_df['Package / Placement Name'] # Extract Placement/Package name column
            search_discreps = trafficking_df.str.split('_', expand = True) # split placement name on '_' delimiter
            search_discreps.rename(columns=dict,inplace=True) # rename columns
            discrepencies_table = run_QA(taxonomy_df=taxonomy_df, workbook_df=trafficking_df, search_discreps=search_discreps, options=options)
                            
    return html.Div([
            dash_table.DataTable(
                data=discrepencies_table,
                columns=discrepencies_table.columns,
                page_size=15)
            ])

Hi @adamcprez it’s quite hard to search for the problem as you are using functions such as parse_contents which are unknown for us.

Do you run your app in debug=True? Do you get any errors?

Hi @AIMPED ,

when I run debug=True I get a “SystemExit: 1” error so I haven’t been utilizing the debug feature (I know this is definitely not best practice in dev environments)

The parse function is a slightly modified version of the standard parse function in the dash documentation. I’ve attached both the dash function and my modified function below.

Dash Function

def parse_contents(contents, filename, date):
    content_type, content_string = contents.split(',')

    decoded = base64.b64decode(content_string)
    try:
        if 'csv' in filename:
            # Assume that the user uploaded a CSV file
            df = pd.read_csv(
                io.StringIO(decoded.decode('utf-8')))
        elif 'xls' in filename:
            # Assume that the user uploaded an excel file
            df = pd.read_excel(io.BytesIO(decoded))
    except Exception as e:
        print(e)
        return html.Div([
            'There was an error processing this file.'
        ])

    return html.Div([
        html.H5(filename),
        html.H6(datetime.datetime.fromtimestamp(date)),

        dash_table.DataTable(
            df.to_dict('records'),
            [{'name': i, 'id': i} for i in df.columns]
        )

Modified Function

def parse_contents(contents, filename, date, sheet, rows_to_skip):
    content_type, content_string = contents.split(',')

    decoded = base64.b64decode(content_string)
    try:
        if 'csv' in filename:
            # Assume that the user uploaded a CSV file
            df = pd.read_csv(
                io.StringIO(decoded.decode('utf-8')))
        elif 'xls' in filename:
            # Assume that the user uploaded an excel file
            df = pd.read_excel(io.BytesIO(decoded), sheet_name=sheet, skiprows=rows_to_skip)
    except Exception as e:
        print(e)
        return html.Div([
            'There was an error processing this file.'
        ])

    return df

Are you using jupyterDash?

Try app.run(debug=True, use_reloader=False)

Thanks for the tip, that work and now I can see the error! The error I received is below:

Traceback (most recent call last):
  File "/var/folders/v5/rmb14mms7zd1wpr3g076nws80000gq/T/ipykernel_87376/2271932093.py", line 660, in update_output
    data=taxonomy_df,
UnboundLocalError: local variable 'taxonomy_df' referenced before assignment

I am kind of confused by the error since I define the variable taxonomy_df as return df from the parse_function().

You have three elif statements under your else. If none of these conditions is True neither taxonaomy_df nor trafficking_df ever get defined.

That would be the first thing I’d check.

1 Like

Hi Amped,

Thanks for that insight. To debug i’ve limited the update_output function to the following to check if I can just output the decoded dataframe being returned:

@app.callback(Output('output-data', 'children'),
              Input('upload-data', 'contents'), 
              [State('upload-data', 'filename'),
              State('upload-data', 'last_modified')], prevent_inital_call=True)
def update_output(contents, filename, last_modified):
    if contents is not None:
        # Get Taxonomy Requirements
        taxonomy_df = [
            parse_contents(c, n, d, sheet='MASTER SOURCE TAB', rows_to_skip=1) for c, n, d in
            zip(contents, filename, last_modified)]
    
    return html.Div([
            dash_table.DataTable(
                data=taxonomy_df,
                columns=taxonomy_df.columns,
                page_size=15)
            ])

I am now getting the following traceback:

Traceback (most recent call last):
  File "/var/folders/v5/rmb14mms7zd1wpr3g076nws80000gq/T/ipykernel_20974/2905858161.py", line 651, in update_output
    zip(contents, filename, last_modified)]
TypeError: 'float' object is not iterable

My solution here was to convert contents to all strings but I have a feeling that issue lies within the list comprehension in the update_output function. I know this may be a general decoding python question but I can’t seem to find an answer to how to solve this and any direction or help you can give here is very appreciated.

HI @adamcprez, I do not understand why you use zip. What do your three arguments contens, filename , last_modified look like?

Looking at your parser function I doubt that you can use zip here.

That’s what I would check next.

I think there is something wrong in the callback, you should get the value instead of the options for component ID channel-QA-chosen. I commented out the corresponding line

@app.callback(
    Output('output-data', 'children'),
    Input('submit-button', 'n_clicks'),
    # Add submit button and over arching if statement (if n is none: return no update, else do something)
    [
        # State('channel-QA-chosen', 'options'),
        State('channel-QA-chosen', 'value'),
        State('upload-data', 'contents'),
        State('upload-data', 'filename'),
        State('upload-data', 'last_modified')
    ],
    prevent_inital_call=True
)