Dash Upload Component - Decoding Large Files

Thanks for all the attention here. It’s very helpful.

My use case: I’m doing something which may not be ideal for Dash… I’m making a small application which ingests a dataset from the user and then performs some fancy machine learning tasks to analyze the user’s submitted data. The program populates a number of interactive plots. These plots also have parameters that can change and require updating. There is a data flow hierarchy here, so the reactivity of the whole thing is a nice paradigm. And because my computational backend is Python, it seemed natural to try to use Dash.

In the future this program would likely be useful if deployed on a server in the cloud. In that case the user would be uploading their data to some persistent filesystem. However, I do not foresee a use case in which several users are being served the same content concurrently. Even in the cloud it would be an instance-per-user type of application.

For an MVP all I would really want is the ability for the user to go through a kind of ‘file selection’ button in order to search their local filesystem for a datafile and for that file (address or handle) be passed off to the backend for processing – not copied and encoded on the server. I’m not very knowledgeable about this stuff, but I think I’ve read that HTML5 exposes this kind of capability in the file API, but I think it hasn’t been translated through to the dash API. Correct me if I’m wrong. Thanks in advance.

Note that if the user is also running the app, then you can just display a dropdown with a list of the user’s files that they could select. You could populate this dropdown’s options by listing files using glob.glob or something like that. Pseudocode:

import glob

app.layout = html.Div([
    # List the CSV files in the directory the app is running from
    dcc.Dropdown(id='file-list', options=[{'label': i, 'value': i} for i in glob.glob('*.csv')])
])

This won’t work in a client-server pattern (where the app is running on a different machine than the user who is viewing it) but it would work fine for development.

To make @chriddyp’s idea for an MVP support a client server setup, you could also embed an iframe within your dash app that points to a page with a super simple vanilla html upload form, thereby escaping the Dash app just for the upload. Then you’d need a ‘refresh’ button or some such which triggers a callback that updates the dropdown with the values of the filesystem glob now that the file has been uploaded.

Although now you’re just bypassing the Upload component entirely, so you obviously lose all the nice features it supports.

See Show And Tell -- Dash Resumable Upload for a new approach to uploading large files :tada:

Did you have any success in building the intended app, @sophotrope? I am trying to build something similar, and it would be great to know whether your application was built successfully. If so, could you share the key learnings from your project?

Thanks in advance for all the help!

AP

The idea of using Flask instead of Dash for the upload works for me. I have tested it with a 1 GB file. Here is my code:

import os

import dash
import dash_html_components as html
from flask import request
from werkzeug.utils import secure_filename

# Save uploads to a "www" folder in the working directory (created if missing)
UPLOAD_FOLDER = os.path.join(os.getcwd(), 'www')
os.makedirs(UPLOAD_FOLDER, exist_ok=True)

external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
app = dash.Dash(__name__, external_stylesheets=external_stylesheets)
# Keep the setting on the underlying Flask server's config
app.server.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER

app.layout = html.Div(
    children=[
        # The iframe shows the plain Flask upload form served below
        html.Iframe(id='iframe-upload', src='/upload'),
        html.Div(id='output'),
    ]
)


@app.server.route('/upload', methods=['GET', 'POST'])
def upload_file():
    if request.method == 'POST':
        file = request.files['file']
        # Sanitize the client-supplied name before using it in a path
        filename = secure_filename(file.filename)
        file.save(os.path.join(app.server.config['UPLOAD_FOLDER'], filename))

    return '''
    <form method=post enctype=multipart/form-data>
      <input type=file name=file>
      <input type=submit value=Upload>
    </form>
    '''


if __name__ == '__main__':
    app.run_server(debug=True)

A remaining issue with this solution is how to make the Flask part interact with Dash components, for example how to show the name of the uploaded file in the ‘output’ block. I can do this by monitoring the filenames in the upload folder, but there should be a more straightforward way.
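
For the folder-monitoring route, a minimal sketch of a helper that a Dash callback could call (for example one triggered by a refresh button) might look like the following. The function names are hypothetical, and the folder is assumed to be the same upload folder the Flask route saves into.

```python
import os


def list_uploaded_files(folder):
    """Return the names of regular files in `folder`, newest first."""
    if not os.path.isdir(folder):
        return []
    paths = [os.path.join(folder, name) for name in os.listdir(folder)]
    files = [p for p in paths if os.path.isfile(p)]
    # Sort by modification time so the most recent upload comes first
    files.sort(key=os.path.getmtime, reverse=True)
    return [os.path.basename(p) for p in files]


def to_dropdown_options(names):
    """Convert file names to label/value dicts, the shape used for dropdown options."""
    return [{'label': name, 'value': name} for name in names]
```

A callback could return `list_uploaded_files(UPLOAD_FOLDER)[0]` as the children of the ‘output’ Div when the list is non-empty, or pass the whole list through `to_dropdown_options` to refresh a dropdown.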

Hello,

I’m currently writing my master’s thesis in the field of Topological Data Analysis in combination with the classification of observations. Part of it is implementing the interactive analytics process as a dashboard web application.

The user should be allowed to upload a CSV file containing the required data. As the whole process is implemented in Python, I have chosen Dash to implement the user interface. I use three datasets in my thesis, consisting of 1k, 280k, and 800k observations, respectively. The two large files are around 150 MB and 210 MB, respectively.

Problems occur when uploading the large files to show the first 10 records for feature selection. To make the dataset usable for the application, I use read_csv from the pandas module to read the dataset into a pandas dataframe. This dataframe is serialized as a JSON string and saved in a hidden Div, so the JSON serialization of the dataframe is stored in the browser’s cache. When I upload one of the larger files, an Out of Memory error occurs in my browser and the app doesn’t change. This behavior seems to be independent of the web browser used.

My first approach was to replace the upload component from the dash-core-components module with the upload component from the dash-resumable-upload module, which didn’t solve the problem: the Out of Memory error no longer occurs, but the page is not updated with a preview of the first ten records. The next approach was to use a Flask cache on the filesystem, but the problem persists and the browser shows an Out of Memory error. The last approach was to replace the hidden Div with a Store component from the dash-core-components module, which didn’t solve my problem either. If I use just a subset of 50k observations, everything works fine.
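
One direction that might be worth trying is to keep the full dataset on the server and send only the preview rows to the browser. The sketch below assumes the file has already been saved on the server at some path; `preview_csv` is a hypothetical helper name, and it uses only the standard library so that it stops reading after the requested number of rows.

```python
import csv
from itertools import islice


def preview_csv(path, n_rows=10):
    """Return (header, first n_rows data rows) without reading the whole file."""
    with open(path, newline='') as f:
        reader = csv.reader(f)
        # The first line is treated as the header; empty files give an empty header
        header = next(reader, [])
        # islice stops after n_rows, so the rest of the file is never loaded
        rows = list(islice(reader, n_rows))
    return header, rows
```

With pandas, `pd.read_csv(path, nrows=10)` plays a similar role. Only the small preview would then need to be serialized into the page, while the full file stays on disk for the later analysis steps.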

What can I do to make Dash work with a larger amount of data? Thank you very much in advance for your responses.

Has the problem of handling large datasets been solved by now? We have used SQL for data handling and it’s working fine, but with datasets larger than 50 MB we face the same issues: the web browser crashes. Could you give an update on this, please?

I have done the same thing,
but the data I am using comes from a local SQL database server and not from the cloud.
Like @sophotrope said, we need to use the cloud instead of a local SQL server.
Will it be possible to do that?

Hey, just wanted a clarification.

So utilizing Flask to save the file… does that change the type of file that is being uploaded? Will ‘Sample.xlsx’ still be ‘Sample.xlsx’? And how do you access it for later use?

Thanks,

I am also wondering if there has been any update on uploading large files with Dash without them being fully in memory in both the browser and server. I did see some of the other threads about dash-resumable-upload, but it appears that is no longer being maintained.

Thanks!

Is there any update on uploading large files (roughly > 150 MB) using dcc.Upload?
Is there any way to overcome this problem?

@jauerb Any luck finding a solution?

@maulberto3 I have made an upload component that can handle large files without problems. It is a fork of dash-resumable-upload, which utilizes Resumable.js. You can check the documentation on the GitHub page: dash-uploader

Installing

pip install dash-uploader

Simple Example

import dash
import dash_html_components as html
import dash_uploader as du

app = dash.Dash(__name__)

# 1) configure the upload folder
du.configure_upload(app, r"C:\tmp\Uploads")

# 2) Use the Upload component
app.layout = html.Div([
    du.Upload(),
])

if __name__ == '__main__':
    app.run_server(debug=True)

Hi @fohrloop Thanks, will definitely look at it. Thanks again.

Quick question @fohrloop , does it use the hidden div method?

@maulberto3 No, dash-uploader does not load the data into the browser at all, but uses HTTP POST requests to send the file to the web server. This way you can send files of even several gigabytes without worrying that your browser will run out of memory.
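
To illustrate the general idea on the server side (this is not dash-uploader’s actual code, just a minimal sketch with a hypothetical function name), incoming data can be written to disk piece by piece so that only one chunk is held in memory at a time:

```python
def save_stream_in_chunks(stream, dest_path, chunk_size=1024 * 1024):
    """Copy a binary stream to dest_path, reading at most chunk_size bytes at a time."""
    total = 0
    with open(dest_path, 'wb') as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                # An empty read means the stream is exhausted
                break
            out.write(chunk)
            total += len(chunk)
    # Number of bytes written, useful for a sanity check against the expected size
    return total
```

Any file-like object with a `read` method works as `stream`, so memory use depends on `chunk_size` rather than on the size of the uploaded file.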

Hi @fohrloop Would that upload be visible to all other users accessing the app? Or just the uploading user?

@maulberto3 The uploaded files are saved to the hard disk of the computer that is running your dash application. So, you may decide yourself who can access the files through your app :slight_smile:

@fohrloop Thank you for your patience. Will do some tests soon. Thanks again.