
How to convert a file uploaded via dcc.Upload to integers for use in calculations?

Hello. I’m new here and having trouble with the dcc.Upload component.

I would like to use the data in the uploaded file in a callback that performs calculations for another DataTable where the user inputs data. The calculations work perfectly for the input data but not for the data uploaded into the DataTable. I suspect the values are being read as strings? Though, when I try to convert, I get errors about NaNs not being convertible to integers. How would I go about converting specific columns to integers?

Note: I needed to standardize the column names no matter what the user uploads in order for the calculations to work. I am assuming the column order in uploads will be the same, which is why I take the row values from the upload and put them into df.
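(For anyone hitting the same NaN error: pandas cannot cast NaN to a plain `int64`, so the usual workarounds are to fill the NaNs first or to use the nullable `Int64` dtype. A minimal sketch with made-up values:)

```python
import pandas as pd

# strings with a missing value, as uploaded data often arrives
s = pd.to_numeric(pd.Series(['1', '2', None]), errors='coerce')  # -> floats with NaN

filled = s.fillna(0).astype('int64')  # NaN -> 0, then plain ints
nullable = s.astype('Int64')          # nullable integer dtype keeps the gap as <NA>
```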

Here is my code:

def parse_contents(contents, filename):
    content_type, content_string = contents.split(',')

    decoded = base64.b64decode(content_string)
    try:
        if 'csv' in filename:
            # Assume that the user uploaded a CSV file
            df1 = pd.read_csv(
                io.StringIO(decoded.decode('utf-8')))
        elif 'xls' in filename:
            # Assume that the user uploaded an excel file
            df1 = pd.read_excel(io.BytesIO(decoded))

    except Exception as e:
        print(e)
        return html.Div([
            'There was an error processing this file.'
        ])

    newdf= pd.DataFrame(df1)
    
    num_cols = len(newdf.columns.values)
    num_rows = len(newdf.index)
    for i in range( num_cols ):
        df.iloc[:num_rows,i] = newdf.iloc[:,i]
    idx = np.where( df.iloc[:,0] == '\"\"' )[0]
    df.drop(idx, inplace=True)
    df['Entity']=df['Entity'].str.upper()
    df.loc[df['Asset Type'].str.contains('alts'), 'Asset Type'] = 'Equity/Alts'
    df.loc[df['Asset Type'].str.contains('equity'), 'Asset Type'] = 'Equity/Alts'
    df.loc[df['Asset Type'].str.contains('credit'), 'Asset Type'] = 'Credit ex Structured'
    df.loc[df['Asset Type'].str.contains('structured'), 'Asset Type'] = 'Structured'
    df['Asset Class']=df['Asset Class'].str.title()
    df.loc[df['Asset Type'] == 'Equity/Alts', 'Tenor']='None'
    df.loc[df['Asset Type'] == 'Equity/Alts', 'Rating']='None'
    # df['Net Investment Income (% pts)']= df['Net Investment Income (% pts)'].astype('int64')

    # df['EL Override (% pts)']= df['EL Override (% pts)'].astype(int)
    # df['1in10 UL Override (% pts)']= df['1in10 UL Override (% pts)'].astype(int)
    # df['Base RBC Override (% pts)']= df['Base RBC Override (% pts)'].astype(int)
    # df['1in10 RBC Override (% pts)']= df['1in10 RBC Override (% pts)'].astype(int)

    return dash_table.DataTable( id='uploadtable',
        data= df.to_dict('records'),
        columns=[{'name': i, 'id': i} for i in df.columns], 
        page_action = 'none',
        style_table = {'height':'250px','overflowY':'auto'},
        editable=True,               
        style_header_conditional= necessary + optional,
        style_cell = {'textAlign':'center', 'minWidth': '150px', 'maxWidth':'250px', 'Width':'100px', 'font-family': 'PrudentialModern'},
        fixed_rows={'headers': True, 'data': 0},
        css=[{"selector": ".Select-menu-outer", "rule": "display: block !important"}]         
    )

#import 
@CAD.callback(Output('output-to-data-upload', 'children'),
            Input('upload-to-data', 'contents'),
            State('upload-to-data', 'filename'),
            )

def update_to_output(to_list_of_contents, to_list_of_names):
    if to_list_of_contents is not None:
        children = [
            parse_contents(c, n) for c, n in
            zip(to_list_of_contents, to_list_of_names)]
        return children 

@CAD.callback(Output('output-from-data-upload', 'children'),
            Input('upload-from-data', 'contents'),
            State('upload-from-data', 'filename'),
            )
 
def update_from_output(from_list_of_contents, from_list_of_names):
    if from_list_of_contents is not None:
        children = [
            parse_contents(c, n) for c, n in
            zip(from_list_of_contents, from_list_of_names)]
        return children

So I’ve realized I don’t need to convert to integers and that my uploaded data can flow through my calcs. The issue I am having now is that I need parse_contents to differentiate between dcc.Upload 1 and dcc.Upload 2, as what is uploaded to each needs to be used differently in my calcs. Currently, the only way to get the data to flow through is to reference ‘uploadtable’, since update_to_output and update_from_output zip the contents and leave me no way to access the data from the DataTable (newdf). I guess I’m wondering if I need those functions? Or if there is a way to make parse_contents return two different tables depending on where the upload is coming from?

Tried to do something like this but am getting this error:
Exception has occurred: AttributeError (note: full exception trace is shown but execution is paused at: upload)
‘list’ object has no attribute ‘split’

Would someone be able to help me correct this?

@CAD.callback(
    Output('output-to-data-upload', 'children'),
    Output('output-from-data-upload', 'children'),
    Input('upload-to-data', 'contents'),
    Input('upload-from-data', 'contents'),
    State('upload-to-data', 'filename'),
    State('upload-from-data', 'filename'),
)
def upload(toc, fromc, tof, fromf):
    if toc is not None:
                toc_type, toc_string = toc.split(',')

                decoded = base64.b64decode(toc_string)
                try:
                    if 'csv' in tof:
                        # Assume that the user uploaded a CSV file
                        df1 = pd.read_csv(
                            io.StringIO(decoded.decode('utf-8')))
                    elif 'xls' in tof:
                        # Assume that the user uploaded an excel file
                        df1 = pd.read_excel(io.BytesIO(decoded))

                except Exception as e:
                    print(e)
                    return html.Div([
                        'There was an error processing this file.'
                    ])

                newdf= pd.DataFrame(df1).fillna(0)
                newdf.columns=['Entity', 'Asset Type', 'Asset Class', 'Rating', 'Tenor', 'Investment (mil)']
                newdf['Entity']=newdf['Entity'].str.upper()
                newdf.loc[newdf['Asset Type'].str.contains('alts'), 'Asset Type'] = 'Equity/Alts'
                newdf.loc[newdf['Asset Type'].str.contains('equity'), 'Asset Type'] = 'Equity/Alts'
                newdf.loc[newdf['Asset Type'].str.contains('credit'), 'Asset Type'] = 'Credit ex Structured'
                newdf.loc[newdf['Asset Type'].str.contains('structured'), 'Asset Type'] = 'Structured'
                newdf.loc[newdf['Asset Type'] == 'Equity/Alts', 'Tenor']='None'
                newdf.loc[newdf['Asset Type'] == 'Equity/Alts', 'Rating']='None'
                    # newdf.iloc[newdf['Investment (mil)']] = '${:,.2f}'.format

                # print(newdf)
                return dash_table.DataTable(id='uploadtotable',
                            data= newdf.to_dict('records'),
                            columns=[{'name': i, 'id': i} for i in ['Entity', 'Asset Type', 'Asset Class', 'Rating', 'Tenor', 'Investment (mil)']], 
                            page_action = 'none',
                            style_table = {'height':'250px','overflowY':'auto'},       
                            style_header_conditional= necessary,
                            style_cell = {'textAlign':'center', 'minWidth': '150px', 'maxWidth':'250px', 'Width':'100px', 'font-family': 'PrudentialModern'},
                            fixed_rows={'headers': True, 'data': 0},
                            css=[{"selector": ".Select-menu-outer", "rule": "display: block !important"}]
                        )
    if fromc is not None:
                fromc_type, fromc_string = fromc.split(',')

                decoded = base64.b64decode(fromc_string)
                try:
                    if 'csv' in fromf:
                        # Assume that the user uploaded a CSV file
                        df1 = pd.read_csv(
                            io.StringIO(decoded.decode('utf-8')))
                    elif 'xls' in fromf:
                        # Assume that the user uploaded an excel file
                        df1 = pd.read_excel(io.BytesIO(decoded))

                except Exception as e:
                    print(e)
                    return html.Div([
                        'There was an error processing this file.'
                    ])

                newdf= pd.DataFrame(df1).fillna(0)
                newdf.columns=['Entity', 'Asset Type', 'Asset Class', 'Rating', 'Tenor', 'Investment (mil)']
                newdf['Entity']=newdf['Entity'].str.upper()
                newdf.loc[newdf['Asset Type'].str.contains('alts'), 'Asset Type'] = 'Equity/Alts'
                newdf.loc[newdf['Asset Type'].str.contains('equity'), 'Asset Type'] = 'Equity/Alts'
                newdf.loc[newdf['Asset Type'].str.contains('credit'), 'Asset Type'] = 'Credit ex Structured'
                newdf.loc[newdf['Asset Type'].str.contains('structured'), 'Asset Type'] = 'Structured'
                newdf.loc[newdf['Asset Type'] == 'Equity/Alts', 'Tenor']='None'
                newdf.loc[newdf['Asset Type'] == 'Equity/Alts', 'Rating']='None'
                    # newdf.iloc[newdf['Investment (mil)']] = '${:,.2f}'.format

                # print(newdf)
                return dash_table.DataTable(id='uploadfromtable',
                            data= newdf.to_dict('records'),
                            columns=[{'name': i, 'id': i} for i in ['Entity', 'Asset Type', 'Asset Class', 'Rating', 'Tenor', 'Investment (mil)']], 
                            page_action = 'none',
                            style_table = {'height':'250px','overflowY':'auto'},       
                            style_header_conditional= necessary,
                            style_cell = {'textAlign':'center', 'minWidth': '150px', 'maxWidth':'250px', 'Width':'100px', 'font-family': 'PrudentialModern'},
                            fixed_rows={'headers': True, 'data': 0},
                            css=[{"selector": ".Select-menu-outer", "rule": "display: block !important"}]
                )

Hi there,

I can try to help you, let’s see if I understand it…

Am I correct to say that you need to parse the content differently depending on which callback (component) the file comes from? In this case, you could simply have an extra parameter in the function that takes where it comes from and use it in the logic inside the function.

In any case, I don’t think you want to have a single callback to handle both upload inputs. If you need to store the data after uploading, it might be easier to store in a dcc.Store component in each callback.

One more minor thing: it might be a better idea to test filename.endswith('.csv') rather than 'csv' in filename, to avoid gotchas like “not_a_csv.xls”.
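A quick illustration of that gotcha (the filename is made up):

```python
filename = "not_a_csv.xls"  # an Excel file whose name happens to contain "csv"

substring_check = 'csv' in filename           # True -- would hit the CSV branch by mistake
suffix_check = filename.endswith('.csv')      # False -- correctly rejected
correct_branch = filename.endswith('.xls')    # True -- the branch you actually want
```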

This is because toc is a list, not a single string. You have to apply the same list comprehension from your first code snippet so that the split runs on each element of the list.

Thank you for your response! You are correct that I need to parse the content differently depending on which callback (component) the file comes from (i.e., which dcc.Upload drop the user has uploaded data to, as there are two).

For my second code snippet, does this cover what you meant when you said to add logic into my function based on where it is coming from? I also don’t really understand why toc is a list but contents, from my first code snippet, is not. Sorry, I’m very new to coding and Dash.

Thank you so much for your help so far.

In your first snippet, contents is the first argument of parse_contents, which you assume to be a string. However, in your last snippet, toc is the first argument of the callback upload, which receives the contents property from dcc.Upload. According to the documentation, this is a list of strings, which is why your first callback correctly had:

    if from_list_of_contents is not None:
        children = [
            parse_contents(c, n) for c, n in
            zip(from_list_of_contents, from_list_of_names)]

So, you actually need to loop over toc to get each element; then toc_el.split(',') will work (toc_el being one element of toc).
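A minimal sketch of that loop (the contents string is a made-up base64 payload in the "type,base64" format dcc.Upload produces):

```python
import base64

# dcc.Upload's `contents` prop arrives as a list of "type,base64" strings
toc = ["data:text/csv;base64,Y29sCjEK"]  # hypothetical single uploaded CSV

for toc_el in toc:                                    # loop over the list...
    content_type, content_string = toc_el.split(',')  # ...then split each element
    decoded = base64.b64decode(content_string)
    print(content_type)  # data:text/csv;base64
    print(decoded)       # b'col\n1\n'
```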

For my second code snippet, does this cover what you meant when you said to add logic into my function based on where it is coming from?

What I meant is that if you have “to” and “from” type of files being updated, then you can define your parsing function like:

def parse_contents(contents, filename, content_type):
    # use a throwaway name for the split result so the
    # content_type argument is not shadowed
    _, content_string = contents.split(',')

    decoded = base64.b64decode(content_string)

    # ... plus all the shebang to load as a pd.DataFrame
    if content_type == "from":
        # do something with df
    elif content_type == "to":
        # do something else, because it is type "to"

    return modified_df

So content_type can be different in each callback, depending on whether it refers to the “upload-to” or “upload-from” component. Does it make sense? Please note that this is not the callback function itself, but a function that you will use inside it to transform the content into a pandas dataframe.

Ok I tried implementing your feedback and am now getting this error: TypeError (note: full exception trace is shown but execution is paused at: update_to_output)
zip argument #3 must support iteration

Please, let me know if this is at least the right direction with my code.

def parse_contents(contents, filename, content_type):
    content_type, content_string = contents.split(',')

    decoded = base64.b64decode(content_string)
    try:
        if 'csv' in filename:
            # Assume that the user uploaded a CSV file
            df1 = pd.read_csv(
                io.StringIO(decoded.decode('utf-8')))
        elif 'xls' in filename:
            # Assume that the user uploaded an excel file
            df1 = pd.read_excel(io.BytesIO(decoded))

    except Exception as e:
        print(e)
        return html.Div([
            'There was an error processing this file.'
        ])

    newdf= pd.DataFrame(df1).fillna(0)
    newdf.columns=['Entity', 'Asset Type', 'Asset Class', 'Rating', 'Tenor', 'Investment (mil)']
    newdf['Entity']=newdf['Entity'].str.upper()
    newdf.loc[newdf['Asset Type'].str.contains('alts'), 'Asset Type'] = 'Equity/Alts'
    newdf.loc[newdf['Asset Type'].str.contains('equity'), 'Asset Type'] = 'Equity/Alts'
    newdf.loc[newdf['Asset Type'].str.contains('credit'), 'Asset Type'] = 'Credit ex Structured'
    newdf.loc[newdf['Asset Type'].str.contains('structured'), 'Asset Type'] = 'Structured'
    newdf.loc[newdf['Asset Type'] == 'Equity/Alts', 'Tenor']='None'
    newdf.loc[newdf['Asset Type'] == 'Equity/Alts', 'Rating']='None'
        # newdf.iloc[newdf['Investment (mil)']] = '${:,.2f}'.format

    if content_type == "from":
         # do something with df 
         return dash_table.DataTable(id='uploadfromtable',
                data= newdf.to_dict('records'),
                columns=[{'name': i, 'id': i} for i in [newdf.columns]], 
                page_action = 'none',
                style_table = {'height':'250px','overflowY':'auto'},       
                style_header_conditional= necessary,
                style_cell = {'textAlign':'center', 'minWidth': '150px', 'maxWidth':'250px', 'Width':'100px', 'font-family': 'PrudentialModern'},
                fixed_rows={'headers': True, 'data': 0},
                css=[{"selector": ".Select-menu-outer", "rule": "display: block !important"}]
            )
    elif content_type == "to":
        # do something else, because it is type "to"
        return dash_table.DataTable(id='uploadtotable',
                data= newdf.to_dict('records'),
                columns=[{'name': i, 'id': i} for i in [newdf.columns]], 
                page_action = 'none',
                style_table = {'height':'250px','overflowY':'auto'},       
                style_header_conditional= necessary,
                style_cell = {'textAlign':'center', 'minWidth': '150px', 'maxWidth':'250px', 'Width':'100px', 'font-family': 'PrudentialModern'},
                fixed_rows={'headers': True, 'data': 0},
                css=[{"selector": ".Select-menu-outer", "rule": "display: block !important"}]
            )
     
#import 
@CAD.callback(Output('output-to-data-upload', 'children'),
            Input('upload-to-data', 'contents'),
            State('upload-to-data', 'filename'),
            State('upload-to-data', 'content_type'),           
            )

def update_to_output(to_list_of_contents, to_list_of_names, to_list_type,):
    if to_list_of_contents is not None:
        to_list_type== 'to'
        children = [
            parse_contents(c, n, t) for c, n, t in
            zip(to_list_of_contents, to_list_of_names, to_list_type,)]
        return children 

@CAD.callback(Output('output-from-data-upload', 'children'),
            Input('upload-from-data', 'contents'),
            State('upload-from-data', 'filename'),
            State('upload-from-data', 'content_type'),
            )

def update_from_output(from_list_of_contents, from_list_of_names, from_list_type):
    if from_list_of_contents is not None:
        from_list_type == 'from'
        children = [
            parse_contents(c, n, t) for c, n, t in
            zip(from_list_of_contents, from_list_of_names, from_list_type)]
        return children 

You are reassigning your to_list_type inside the callback with a string (and note that to_list_type == 'to' is a comparison, not an assignment, so it does nothing). The value arriving through the callback signature is not an iterable (dcc.Upload has no content_type property, so that State is None), and that’s why zip complains about argument #3 not supporting iteration. You could replace it with a list whose length equals that of the other iterables in the zip — or simply pass the label as a constant.
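Since the origin label is the same for every file in one callback, you can skip zipping it entirely and pass it as a constant. A sketch with a stub standing in for the real parse_contents (the contents and filenames are made up):

```python
def parse_contents(contents, filename, origin):
    # stub standing in for the real parsing function
    return f"{origin}: {filename}"

to_list_of_contents = ["data:...,aaa", "data:...,bbb"]  # hypothetical uploads
to_list_of_names = ["book1.csv", "book2.csv"]

# no third iterable needed -- the origin label is a plain constant
children = [parse_contents(c, n, 'to')
            for c, n in zip(to_list_of_contents, to_list_of_names)]
print(children)  # ['to: book1.csv', 'to: book2.csv']
```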

I don’t see the difference between the two return statements, so why do you need them if your parse function looks the same in both cases?

Edit: I should have mentioned this before, but I think what I called “content type” might be very misleading. It has nothing to do with the file format or type, just with which callback the function is invoked from (content origin might be a better name).

I want to parse differently because I need to be able to differentiate between the two tables so I can use the upload data correctly in my calculations.

But each table uses a different callback to be uploaded, right? So you know which table corresponds to each callback, regardless of how they are parsed.

Yes, but I can’t access the data element of those DataTables once they are returned, to flow through my calcs. But I found that I can if I reference the table ID from parse_contents, which is why I am trying to differentiate. But there may be a better way to do this?

I am a bit confused by the way you are describing the problem, so let me try to articulate it differently to see if I can suggest a better approach or clarify your current one.

So, you have two Upload components, A and B. Each component has its own callback to take the content, parse it and return it to an Output. I will call the pandas dataframes coming out of the parsing function df_a and df_b. They can be parsed by the same function; however, they are not interchangeable in the calculation (from what I understood from your post).

Now, let’s say that you have a calculation you want to do using both df_a and df_b. If the calculation depends on both of them, then you need both df to exist, so you need to have them parsed individually in each Upload component. In this case, you could opt to have a single callback taking both contents from a and b as input, parsing them using the same function and making the calculation in the callback. So, as you had at some point:

@app.callback(
    Output('<whatever-is-your-output>', '<whatever-prop-you-are-returning>'),
    Input('upload-a', 'contents'),
    Input('upload-b', 'contents'),
    State('upload-a', 'filename'),
    State('upload-b', 'filename'),
)
def upload(content_a, content_b, filename_a, filename_b):
    
    if not content_a or not content_b:
        # don't proceed if content_a or content_b is None
        raise dash.exceptions.PreventUpdate
    
    # using the same parsing function
    # NOTE: they could be different, not a problem!
    df_a = parse_content(content_a, filename_a)
    df_b = parse_content(content_b, filename_b)

    # The callback knows which df is from which Upload component,
    # so it is just a matter of calculating
    calculation_results = make_your_calculation(df_a, df_b)

    # your output can be anything (a value, a table, a graph) depending on
    # how you want to show the results
    return calculation_results

Is this what you are trying to accomplish?

Sorry for not being clear.

Yes, basically we have df_a and df_b, which are being parsed and returned as DataTables. I then take the data attribute of each table and want to flow it through a complex calculation function, which performs some calcs and assignments on df_a and df_b before taking the difference to get the results. This calculation function finds whether the user has uploaded data or input data in the app and uses whatever the user has submitted in the calculation. Simplified, this is what that looks like:

def calc(n_clicks_a, n_clicks_df_a, n_clicks_b, n_clicks_df_b, input_a, upload_a, input_b, upload_b):
    if n_clicks_a:
        df = input_a
    if n_clicks_df_a:
        df = upload_a
    # perform some calcs
    if n_clicks_b:
        df1 = input_b
    if n_clicks_df_b:
        df1 = upload_b
    # perform some calcs
    total = df - df1  # so really a - b

My issue is that I cannot grab the data attribute of the DataTable after it has been returned via update_to_output or update_from_output, but I can grab it if I feed the ‘uploadtable’ from parse_contents into the callback for my calcs function. The issue with this is that the table ID is the same for df_a and df_b, which doesn’t allow me to distinguish those values in my calc function, so everything effectively zeroes out.

Ok, I think I understand now what you mean. Coming back to one of your callbacks:

@CAD.callback(
    Output('output-to-data-upload', 'children'),
    Output('output-from-data-upload', 'children'),
    Input('upload-to-data', 'contents'),
    Input('upload-from-data', 'contents'),
    State('upload-to-data', 'filename'),
    State('upload-from-data', 'filename'),
)
def upload(toc, fromc, tof, fromf):

This callback expects you to return two components to populate “output-to-data-upload” and “output-from-data-upload”, respectively. You ended up returning something like:

                return dash_table.DataTable(id='uploadtotable',
                            data= newdf.to_dict('records'),
                            columns=[{'name': i, 'id': i} for i in ['Entity', 'Asset Type', 'Asset Class', 'Rating', 'Tenor', 'Investment (mil)']], 
                            page_action = 'none',
                            style_table = {'height':'250px','overflowY':'auto'},       
                            style_header_conditional= necessary,
                            style_cell = {'textAlign':'center', 'minWidth': '150px', 'maxWidth':'250px', 'Width':'100px', 'font-family': 'PrudentialModern'},
                            fixed_rows={'headers': True, 'data': 0},
                            css=[{"selector": ".Select-menu-outer", "rule": "display: block !important"}]
                        )

This is just one table! :smiley: You should have returned two tables in the same return statement, not one then the other. In other words, you are returning uploadtotable to “output-to-data-upload” and None to “output-from-data-upload”.

So the fix is to create newdf_to and newdf_from and return a tuple with each dash_table.DataTable component respectively (to and from) and you are good to go.
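The shape of that fix, with a hypothetical build_table stub standing in for the decode/parse/DataTable steps (the point is the two-element return matching the two Outputs):

```python
def build_table(contents, filename, table_id):
    # stub for the decode/parse/DataTable construction
    return {'id': table_id, 'source': filename}

def upload(toc, fromc, tof, fromf):
    to_table = build_table(toc, tof, 'uploadtotable')
    from_table = build_table(fromc, fromf, 'uploadfromtable')
    # two Outputs in the decorator -> return two values, in the same order
    return to_table, from_table

to_out, from_out = upload('c1', 'c2', 'to.csv', 'from.csv')
```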

Sorry it took me so long to realize that your returns were mixed up. You should then be able to grab the data prop from each dash_table.DataTable component and use it as input in other callbacks.

Hope this solves the issue!

Alternatively, you can consider splitting this into two callbacks, one for to and the other for from. It would be less confusing.

Thank you so much for all of your help!! I really appreciate it.
