Best Way to Handle User-Uploaded Dataset

Hi there!

I am new to Dash and have started building my own Dash app. The app takes a CSV file supplied by the user through dcc.Upload. Ideally, multiple dcc.Graph components would be generated automatically once the file has been uploaded and processed.

My current implementation processes the CSV file with Apache Spark and stores the results in a global variable from a callback:

@app.callback(
    Output('placeholder-div', 'children'),
    [Input('upload', 'contents')])
def analyze_csv(contents):
    # Perform a bunch of analysis
    global results 
    results = run_analysis(...) # Some analysis of parsed `contents`
    return ''

Is there a way to expose the global variable results to other callbacks (e.g. as a State) so that callbacks updating the dcc.Graph figures are triggered automatically once the data has been processed?

Since results is different from contents, the alternative of calling run_analysis(...) in every callback whenever a file is uploaded would be extremely costly.
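One common way around that cost (a sketch under assumptions, not something prescribed in this thread) is to memoize the expensive step keyed on the uploaded contents, so every graph callback can call it while only the first call per upload does the work. Here run_analysis is a hypothetical stand-in for the Spark job and functools.lru_cache is the in-process cache. With several gunicorn workers each process keeps its own cache, so a shared backend (e.g. Flask-Caching with Redis or the filesystem) would be the production analogue.

```python
import hashlib
from functools import lru_cache

CALLS = {"count": 0}  # hypothetical counter, only to show that caching works


@lru_cache(maxsize=32)  # bounds memory: at most 32 distinct uploads are kept
def run_analysis(contents):
    """Hypothetical stand-in for the expensive Spark analysis.

    `contents` is the base64 data-URL string from dcc.Upload; strings are
    hashable, so lru_cache can key on them directly. Treat the returned
    dict as read-only, because every caller shares the cached object.
    """
    CALLS["count"] += 1
    digest = hashlib.sha256(contents.encode("utf-8")).hexdigest()
    return {"digest": digest, "n_chars": len(contents)}


def update_graph_a(contents):
    # Each graph callback calls the memoized function independently...
    return run_analysis(contents)["n_chars"]


def update_graph_b(contents):
    # ...but only the first call per distinct upload runs the analysis.
    return run_analysis(contents)["digest"][:8]
```

In a Dash app, update_graph_a and update_graph_b would each be decorated with @app.callback and take Input('upload', 'contents'), so they all fire on upload without a global variable.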

Hope to hear your feedback!

Thanks in advance,
Chris


Unfortunately, global variables aren’t safe to use in Dash, for two reasons:
1 - Mutations of a global variable persist across sessions, so one user’s session can influence the next user’s session.
2 - Mutations of a global variable do not propagate across Python processes. In production, Dash has to run across multiple processes (with something like gunicorn) so that it can handle more than one request at a time.

However, there are a couple of workarounds for this workflow. See the thread “Working on large datasets -- comparison with shiny” for more details.
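A sketch of one such workaround, under stated assumptions: run the analysis once in a single callback, write its result as JSON into a per-session container in the layout (dcc.Store in newer dash-core-components releases, or a hidden html.Div in older ones), and let every graph callback take that container as an Input. Because the data lives in the user’s browser session, nothing leaks between users or depends on which process serves the request. run_analysis and the result fields are hypothetical; the wiring assumes Dash 2.x and sits inside a register_callbacks function so the helpers can be tried without Dash installed.

```python
import json


def run_analysis(contents):
    # Hypothetical placeholder for the expensive (e.g. Spark) analysis.
    # Its result must be JSON-serializable to live in a store or hidden div.
    return {"x": [1, 2, 3], "y": [len(contents), 0, 1]}


def results_to_store(contents):
    """Run the analysis once and serialize the result for the store."""
    if contents is None:
        return None
    return json.dumps(run_analysis(contents))


def figure_from_store(stored):
    """Build a plain-dict Plotly figure from the serialized results."""
    if stored is None:
        return {"data": [], "layout": {}}
    results = json.loads(stored)
    return {
        "data": [{"type": "scatter", "x": results["x"], "y": results["y"]}],
        "layout": {"title": "Analysis results"},
    }


def register_callbacks(app):
    # Imported here so the helpers above work without Dash installed.
    # Assumes the layout contains dcc.Upload(id='upload'),
    # dcc.Store(id='results-store') and dcc.Graph(id='graph-1').
    from dash import Input, Output

    @app.callback(Output("results-store", "data"), Input("upload", "contents"))
    def analyze_csv(contents):
        return results_to_store(contents)

    @app.callback(Output("graph-1", "figure"), Input("results-store", "data"))
    def update_graph_1(stored):
        return figure_from_store(stored)
```

Further graphs follow the same pattern as update_graph_1: each one listens to results-store, so all of them update as soon as the single analysis callback finishes.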

First of all, thanks for plot.ly Dash; it’s a great platform, and I particularly appreciate the nice documentation!

My question follows up on the initial one: how can I actually use table data that has been uploaded via the dcc.Upload component in combination with dash_table_experiments?

When I read a local Excel file into a data frame within the script and built a table from it, I was able to filter the table and use its row data to create plots. However, when I try to do the same with data uploaded via dcc.Upload, I can’t seem to access or work with the uploaded table data.
In the simple example below, I’d eventually like to filter the table data (using groupby etc.), and as a first step I’d like to create dropdown options from the uploaded columns. With the old way of reading in data, I could simply use Input('table', 'rows') in callbacks and convert the table rows into a df. Now this doesn’t seem to work with either Input or State: in both cases the dropdown options never update and remain empty.

Since uploading data directly is such a useful option, being able to actually work with the uploaded data seems crucial, so it would be great if anyone had some ideas!

Regarding the discussion above, I understand that global variables aren’t safe; however, for uploaded data it could be useful to make that data (and only that data) an initial “global” df that filtering/callback functions can then access as usual. When I tried this, I ran into quite a few problems, for example callbacks not coping with an initially empty df that later gets populated with the uploaded data.
So, as much as I like the dcc.Upload option, it seems the uploaded data can’t be accessed properly at this stage, or am I missing something?

Any help would be much appreciated!

Example:

import dash_html_components as html
import dash_core_components as dcc
import dash

import plotly
import dash_table_experiments as dte
from dash.dependencies import Input, Output, State

import pandas as pd
import numpy as np

import json
import datetime
import operator
import os

import base64
import io



app = dash.Dash()

app.scripts.config.serve_locally = True


app.layout = html.Div([

    html.H5("Upload Files"),
    dcc.Upload(
        id='upload-data',
        children=html.Div([
            'Drag and Drop or ',
            html.A('Select Files')
        ]),
        style={
            'width': '100%',
            'height': '60px',
            'lineHeight': '60px',
            'borderWidth': '1px',
            'borderStyle': 'dashed',
            'borderRadius': '5px',
            'textAlign': 'center',
            'margin': '10px'
        },
        multiple=True),

    html.Br(),
    html.Button(id='propagate-button', n_clicks=0, children='Propagate Table Data'),

    html.Br(),
    html.H5("Filter Column"),
    dcc.Dropdown(id='dropdown_table_filterColumn',
                 multi=False,
                 placeholder='Filter Column'),

    html.Br(),
    html.H5("Updated Table"),
    html.Div(id='output-data-upload'),
    html.Div(dte.DataTable(rows=[{}], id='table'), style={'display': 'none'})

])


# Functions

# file upload function
def parse_contents(contents, filename, date):
    content_type, content_string = contents.split(',')

    decoded = base64.b64decode(content_string)
    try:
        if 'csv' in filename:
            # Assume that the user uploaded a CSV file
            df = pd.read_csv(
                io.StringIO(decoded.decode('utf-8')))
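        elif 'xls' in filename:
            # Assume that the user uploaded an excel file
            df = pd.read_excel(io.BytesIO(decoded))
        else:
            # Neither CSV nor Excel: df would otherwise be undefined below
            return html.Div([
                'Unsupported file type: {}'.format(filename)
            ])

    except Exception as e:
        print(e)
        return html.Div([
            'There was an error processing this file.'
        ])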

    return html.Div([
        html.H5(filename),
        html.H6(datetime.datetime.fromtimestamp(date)),

        # Use the DataTable prototype component:
        # github.com/plotly/dash-table-experiments
        dte.DataTable(rows=df.to_dict('records')),

        html.Hr(),  # horizontal line

        # For debugging, display the raw contents provided by the web browser
        html.Div('Raw Content'),
        html.Pre(contents[0:200] + '...', style={
            'whiteSpace': 'pre-wrap',
            'wordBreak': 'break-all'
        })
    ])



# Callbacks

# callback table creation
@app.callback(Output('output-data-upload', 'children'),
              [Input('upload-data', 'contents'),
               Input('upload-data', 'filename'),
               Input('upload-data', 'last_modified')])

def update_output(list_of_contents, list_of_names, list_of_dates):

    if list_of_contents is not None:
        children = [
            parse_contents(c, n, d) for c, n, d in
            zip(list_of_contents, list_of_names, list_of_dates)]
        return children



#callback update options of filter dropdown
@app.callback(Output('dropdown_table_filterColumn', 'options'),
              [Input('propagate-button', 'n_clicks'),
               Input('table', 'rows')])

def update_filter_column_options(n_clicks_update, tablerows):

    if n_clicks_update < 1:
        print("df empty")

        return []

    else:
        dff = pd.DataFrame(tablerows) # <- problem! dff stays empty even though table was uploaded

        print("updating... dff empty?:", dff.empty) #result is True, labels stay empty

        return [{'label': i, 'value': i} for i in sorted(list(dff))]


# simpler option, does not work either:
#@app.callback(Output('dropdown_table_filterColumn', 'options'),
#              [Input('table', 'rows')])
#
#def update_filter_column_options(tablerows):
#    
#    dff = pd.DataFrame(tablerows)
#    print("updating... dff empty?:", dff.empty)
#
#    return [{'label': i, 'value': i} for i in sorted(list(dff))] # <- problem! dff stays empty even though table was uploaded



app.css.append_css({
    "external_url": "https://codepen.io/chriddyp/pen/bWLwgP.css"
})

if __name__ == '__main__':
    app.run_server(debug=True)

@jlbgit - It looks like the callbacks are listening to the wrong DataTable instance. Your parse_contents function returns a visible DataTable, but it doesn’t have an id. The DataTable inside app.layout has an id (table), but no callback ever updates that component (the output-data-upload div is used as the output instead), so its rows stay at the initial [{}].

I’ve fixed these issues in your example here, and it seems to work for me:

import dash_html_components as html
import dash_core_components as dcc
import dash

import plotly
import dash_table_experiments as dte
from dash.dependencies import Input, Output, State

import pandas as pd
import numpy as np

import json
import datetime
import operator
import os

import base64
import io



app = dash.Dash()

app.scripts.config.serve_locally = True
app.config['suppress_callback_exceptions'] = True

app.layout = html.Div([

    html.H5("Upload Files"),
    dcc.Upload(
        id='upload-data',
        children=html.Div([
            'Drag and Drop or ',
            html.A('Select Files')
        ]),
        style={
            'width': '100%',
            'height': '60px',
            'lineHeight': '60px',
            'borderWidth': '1px',
            'borderStyle': 'dashed',
            'borderRadius': '5px',
            'textAlign': 'center',
            'margin': '10px'
        },
        multiple=False),
    html.Br(),
    html.Button(
        id='propagate-button',
        n_clicks=0,
        children='Propagate Table Data'
    ),


    html.Br(),
    html.H5("Filter Column"),
    dcc.Dropdown(id='dropdown_table_filterColumn',
        multi = False,
        placeholder='Filter Column'),


    html.Br(),
    html.H5("Updated Table"),
    html.Div(dte.DataTable(rows=[{}], id='table'))


])


# Functions

# file upload function
def parse_contents(contents, filename):
    content_type, content_string = contents.split(',')

    decoded = base64.b64decode(content_string)
    try:
        if 'csv' in filename:
            # Assume that the user uploaded a CSV file
            df = pd.read_csv(
                io.StringIO(decoded.decode('utf-8')))
        elif 'xls' in filename:
            # Assume that the user uploaded an excel file
            df = pd.read_excel(io.BytesIO(decoded))
        else:
            # Unsupported file type: nothing to parse
            return None

    except Exception as e:
        print(e)
        return None

    return df


# callback table creation
@app.callback(Output('table', 'rows'),
              [Input('upload-data', 'contents'),
               Input('upload-data', 'filename')])
def update_output(contents, filename):
    if contents is not None:
        df = parse_contents(contents, filename)
        if df is not None:
            return df.to_dict('records')
        else:
            return [{}]
    else:
        return [{}]


#callback update options of filter dropdown
@app.callback(Output('dropdown_table_filterColumn', 'options'),
              [Input('propagate-button', 'n_clicks'),
               Input('table', 'rows')])
def update_filter_column_options(n_clicks_update, tablerows):
    if n_clicks_update < 1:
        print("df empty")
        return []

    else:
        # tablerows now comes from the DataTable that update_output fills in
        dff = pd.DataFrame(tablerows)

        print("updating... dff empty?:", dff.empty)

        return [{'label': i, 'value': i} for i in sorted(list(dff))]


app.css.append_css({
    "external_url": "https://codepen.io/chriddyp/pen/bWLwgP.css"
})

if __name__ == '__main__':
    app.run_server(debug=True)
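The key change is that update_output now writes straight to Output('table', 'rows'). That property is a list of dicts, one per row, and the initial placeholder [{}] has no keys, which is why the old callback saw an empty df. As a pandas-free sketch of what update_filter_column_options derives from those rows (the helper name column_options is hypothetical, not part of Dash):

```python
def column_options(rows):
    """Turn DataTable-style rows (a list of dicts) into dcc.Dropdown options.

    None and the [{}] placeholder both yield no columns, so the dropdown
    simply stays empty until real rows arrive.
    """
    columns = set()
    for row in rows or []:
        columns.update(row.keys())  # union of keys, like pd.DataFrame(rows).columns
    return [{"label": c, "value": c} for c in sorted(columns)]
```

Because an empty table produces an empty option list, this version does not strictly need the n_clicks check; whether to keep the Propagate button is a UI choice.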

@chriddyp
Many thanks for the quick reply and thanks a lot for this solution! I couldn’t think of updating the Output('table', 'rows') directly - very neat.
That saved the day :slight_smile:

Thanks again, great support in this forum!


@chriddyp I am kind of a newbie to this whole web app thing, so I’m sorry if the question is silly. I have a UFF file (“.uff”) that holds some scientific data. Python has a library called pyUFF that parses this data very well. Is there a way to upload a file, retrieve it as-is, and parse it with the pyuff library?

The relevant call takes only a file path as input:

unv_file = pyuff.UFF(fileName=file_path)

Thanks!
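Since pyuff.UFF(fileName=...) wants a path while dcc.Upload delivers the file as a base64 data URL in contents, one hedged approach is to decode the bytes, write them to a temporary file, and hand that path to pyuff. The helper upload_to_temp_file is a hypothetical name and uses only the standard library; the pyuff call is left in a comment and uses just the fileName argument quoted above.

```python
import base64
import os
import tempfile


def upload_to_temp_file(contents, suffix=".uff"):
    """Decode a dcc.Upload `contents` data URL and save it to a temp file.

    Returns the file path; the caller is responsible for deleting it.
    """
    # contents looks like "data:<mime type>;base64,<payload>"
    _header, payload = contents.split(",", 1)
    data = base64.b64decode(payload)
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    return path


# Inside the upload callback (pyuff usage assumed from the question):
#     path = upload_to_temp_file(contents)
#     try:
#         unv_file = pyuff.UFF(fileName=path)
#         ...  # read everything you need while the file still exists
#     finally:
#         os.remove(path)
```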


@chriddyp Thanks for responding to so many topics. I think my question is relevant here. I am trying to create a dropdown list based on the columns of the uploaded file, and then use both the uploaded file’s contents and the value the user selects from the dropdown to trigger other calculations. But the calculations are triggered right away when I upload the file, before I select a value from the dropdown list created from the uploaded file, and when I do select a value, no further calculation is triggered. I copied my code below (I also tried converting the uploaded file into ‘tablerows’ as in your answer, but kept getting ‘error uploading dependencies’). Can you please help? Thank you in advance.

import base64
import io

import dash
import dash_core_components as dcc
import dash_html_components as html
import pandas as pd
from dash.dependencies import Input, Output, State


def parse_contents(contents, filename):
    content_type, content_string = contents.split(',')

    decoded = base64.b64decode(content_string)
    try:
        if 'csv' in filename:
            df = pd.read_csv(io.StringIO(decoded.decode('utf-8')))
        elif 'xlsx' in filename:
            df = pd.read_excel(io.BytesIO(decoded))
        else:
            # Unsupported file type: nothing to parse
            return None
    except Exception as e:
        print(e)
        return None
    return df


def create_dropdown_list(df):
    # Offer every column whose name contains '_name'; the label drops that suffix
    metric_name_list = [col for col in df.columns if '_name' in col]
    metric_level_list = [w.replace('_name', '') for w in metric_name_list]
    dropdown_list = [{'label': key, 'value': value}
                     for (key, value) in zip(metric_level_list, metric_name_list)]

    return dropdown_list


app = dash.Dash()

app.layout = html.Div([
    html.Div([
        dcc.Upload(
            id='upload_data',
            multiple=False,
            style={'width': 105, 'display': 'inline-block', 'marginLeft': 7},
            children=html.Button('Select File'),
        ),
    ], style={'width': '20%', 'display': 'inline-block'}),

    html.Div(id='dimension-dropdown'),

    html.Hr(),

    html.Div(id='display-selected-values')
])

app.config.suppress_callback_exceptions = True


@app.callback(Output('dimension-dropdown', 'children'),
              [Input('upload_data', 'contents')],
              [State('upload_data', 'filename')])
def update_dropdown(contents, filename):
    if not contents or not filename:
        return
    df = parse_contents(contents, filename)
    if df is None:
        return html.Div('There was an error processing this file.')
    dropdown_op = create_dropdown_list(df)
    return dcc.Dropdown(
        # id='test',
        options=dropdown_op
    )


@app.callback(
    Output('display-selected-values', 'children'),
    [Input('dimension-dropdown', 'value'),
     Input('upload_data', 'contents')],
    [State('upload_data', 'filename')])
def set_test_msg(selected_dimension, contents, filename):
    # if not selected_dimension:
    #     return
    return u'filename is {} , selected dimensions: {}'.format(
        filename, selected_dimension
    )


if __name__ == '__main__':
    app.run_server()
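In the code above, the second callback listens to Input('dimension-dropdown', 'value'), but 'dimension-dropdown' is the id of the html.Div wrapper; the dcc.Dropdown created inside it has no id, so picking a value never reaches that callback, which therefore only fires when contents changes. A sketch of one fix, under assumptions: put a dcc.Dropdown with a fixed, hypothetical id ('dimension-select') in the layout from the start, let the upload callback fill in its options, and make the calculation take the dropdown value as Input and the upload as State, skipping work until a value is chosen. The wiring assumes Dash 2.x (for PreventUpdate and flat Input/State arguments) and is wrapped in a function so the helpers run without Dash.

```python
def dropdown_options(columns):
    """Same rule as create_dropdown_list above, applied to column names:
    keep columns containing '_name' and drop that suffix for the label."""
    return [{"label": c.replace("_name", ""), "value": c}
            for c in columns if "_name" in c]


def selection_message(selected_dimension, filename):
    """Return the text to display, or None while nothing is selected yet."""
    if not selected_dimension or not filename:
        return None
    return "filename is {}, selected dimension: {}".format(filename, selected_dimension)


def register_callbacks(app, parse_contents):
    # Assumes a layout with a static dcc.Dropdown(id='dimension-select')
    # instead of the empty Div; parse_contents is the function from above.
    from dash import Input, Output, State
    from dash.exceptions import PreventUpdate

    @app.callback(Output("dimension-select", "options"),
                  Input("upload_data", "contents"),
                  State("upload_data", "filename"))
    def update_dropdown(contents, filename):
        if not contents or not filename:
            raise PreventUpdate
        df = parse_contents(contents, filename)
        return [] if df is None else dropdown_options(df.columns)

    @app.callback(Output("display-selected-values", "children"),
                  Input("dimension-select", "value"),
                  State("upload_data", "filename"))
    def set_test_msg(selected_dimension, filename):
        message = selection_message(selected_dimension, filename)
        if message is None:
            raise PreventUpdate  # wait until the user picks a column
        return message
```

If the real calculation also needs the file itself, State('upload_data', 'contents') can be added to the second callback; as State it is read but does not trigger the callback on upload.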


I want to create a Dash app with an upload button, a second button that triggers a conversion, and a download button.
For example, if the uploaded file has two columns A and B, clicking the conversion button should add two columns C and D with C = A + B and D = A - C, so that I get a table with four columns A, B, C and D; the download button should then let me download the converted table. Is there any way I can do this? Kindly help.
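A minimal sketch of the conversion and download steps, assuming the uploaded rows are already available as a list of dicts with numeric A and B values (for example from a parse step like the ones earlier in this thread). convert_rows and rows_to_csv are hypothetical helpers using only the standard library. Note that, as specified, D = A - C = A - (A + B) = -B; if A - B was intended, change the one marked line. The download wiring assumes a Dash version that ships dcc.Download and dcc.send_string (Dash 2.x), and a dcc.Store(id='converted') filled by the conversion button’s callback.

```python
import csv
import io


def convert_rows(rows):
    """Add C = A + B and D = A - C to each row (numeric values assumed)."""
    converted = []
    for row in rows:
        a, b = float(row["A"]), float(row["B"])
        c = a + b
        d = a - c  # as specified; equals -B. Use a - b if A - B was meant.
        converted.append({"A": a, "B": b, "C": c, "D": d})
    return converted


def rows_to_csv(rows, fieldnames=("A", "B", "C", "D")):
    """Serialize rows to CSV text for the download."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fieldnames), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def register_callbacks(app):
    # Assumes a layout with dcc.Store(id='converted'),
    # html.Button(id='download-button') and dcc.Download(id='download').
    from dash import Input, Output, State, dcc
    from dash.exceptions import PreventUpdate

    @app.callback(Output("download", "data"),
                  Input("download-button", "n_clicks"),
                  State("converted", "data"),
                  prevent_initial_call=True)
    def download(n_clicks, rows):
        if not rows:
            raise PreventUpdate  # nothing converted yet
        return dcc.send_string(rows_to_csv(rows), "converted.csv")
```

The conversion button’s own callback would take its n_clicks as Input, read the uploaded data as State, and return convert_rows(...) both to the visible table and to the 'converted' store.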

Is parse_contents a reserved function name for file uploads? Does the function have to have exactly that name?
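As far as the code in this thread shows, parse_contents is an ordinary helper that the callbacks call explicitly; Dash only wires up functions registered with @app.callback, through their Input/Output/State declarations rather than their names. To illustrate, here is the same decoding step under a different, hypothetical name, decode_csv_upload, using the standard-library csv module instead of pandas to produce DataTable-style rows (UTF-8 CSV assumed).

```python
import base64
import csv
import io


def decode_csv_upload(contents):
    """Decode a dcc.Upload data URL holding UTF-8 CSV into a list of row dicts."""
    # contents has the form "data:<mime type>;base64,<payload>"
    _content_type, content_string = contents.split(",", 1)
    text = base64.b64decode(content_string).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Any name works as long as the callback calls it, e.g.:
#     @app.callback(Output('table', 'rows'), [Input('upload-data', 'contents')])
#     def update_output(contents):
#         return decode_csv_upload(contents) if contents else [{}]
```

Note that csv.DictReader yields every value as a string, whereas pandas infers numeric types; convert values explicitly if the table needs numbers.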