Quick question @fohrloop , does it use the hidden div method?
@maulberto3 no, dash-uploader does not load the data into the browser at all; it uses HTTP POST requests to send the file to the webserver. This way you can send files of even several gigabytes without worrying that your browser would run out of memory.
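To illustrate the principle (this is just a sketch of server-side streaming, not dash-uploader's actual implementation), the key idea is that the server writes the incoming stream to disk in small chunks, so at most one chunk is ever in memory:

```python
import io
import os
import tempfile

def stream_to_disk(stream, dest_path, chunk_size=64 * 1024):
    """Copy a file-like stream to dest_path in fixed-size chunks.

    Only one chunk is held in memory at a time, so an arbitrarily large
    upload never exhausts RAM on the server.
    """
    written = 0
    with open(dest_path, "wb") as f:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            f.write(chunk)
            written += len(chunk)
    return written

# Example: "upload" 1 MiB from an in-memory stream.
dest = os.path.join(tempfile.mkdtemp(), "received.bin")
n = stream_to_disk(io.BytesIO(b"x" * (1024 * 1024)), dest)
print(n)  # 1048576
```

The browser side works analogously: the file is sliced into chunks and POSTed one at a time, so neither end holds the whole file in memory.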
Hi @fohrloop Would that upload be visible to all other users accessing the app? Or just the uploading user?
@maulberto3 The uploaded files are saved to the hard disk of the computer that is running your dash application. So, you may decide yourself who can access the files through your app.
@fohrloop Thank you for your patience. Will do some tests soon. Thanks again.
@fohrloop thanks for the excellent work in your dash-uploader component.
I’m new to Dash and am trying to implement dash-uploader into a project, and am hoping someone more experienced can help.
I’d like to upload multiple (a few to 100+) large (7MB) csv files to the dashboard, then concatenate the contents of all csv files into one large Pandas dataframe, and finally create various visualizations (which is well-covered by Dash documentation).
dcc.Upload looks like it works for small files, but won't be too useful for this application. dash-uploader looks like a good path forward.
Here is my code, which works reasonably well. It uploads csv files, concatenates them vertically, and then outputs a data table. There are three problems/questions with it:
1. I can't figure out how to get the dataframe `df` outside of the `get_a_list()` function. I would like to have this dataframe defined outside of a function so that I can use it in dropdowns and other dcc components. I think I need to upload the files first in `get_a_list()` and, when that is done running, set `df` equal to the result of another function that concatenates the csv files. But I can't get it to work.
2. I'm not sure how this dash-uploader component will work if I deploy via Heroku. It uploads the files to a folder on the host computer, and hopefully these files will upload to the Heroku server as well. Then I use `os.getcwd()` to obtain the current path where the files were presumably just uploaded.
3. I found that the `time.sleep(2)` call in `get_a_list()` was needed to give the system some extra time to finish uploading the files. Not sure if there is a more elegant solution. This will probably be solved if I can separate a `make_dataframe()` function from the `get_a_list()` function.
```python
# uploadconcat.py
from pathlib import Path
import uuid
import time
import dash_uploader as du
import dash
import dash_html_components as html
import dash_core_components as dcc
from dash.dependencies import Input, Output, State
import pandas as pd
import os
from glob import glob
import dash_table
import numpy as np
import plotly.graph_objs as go

app = dash.Dash(__name__)

UPLOAD_FOLDER_ROOT = 'uploads'
du.configure_upload(app, UPLOAD_FOLDER_ROOT)


def get_upload_component(id):
    return du.Upload(
        id=id,
        max_file_size=1800,  # 1800 MB
        filetypes=['csv'],
        # text_completed='',
        upload_id='user1',
        # upload_id=uuid.uuid1(),  # Unique session id
    )


def get_app_layout():
    return html.Div(
        [
            html.H1('Page Title'),
            html.Div(
                [
                    html.Div(id='callback-output'),
                    get_upload_component(id='dash-uploader'),
                ],
                style={  # wrapper div style
                    'textAlign': 'center',
                    'width': '1000px',
                    'padding': '10px',
                    'display': 'inline-block',
                },
            ),
        ],
        style={
            'textAlign': 'center',
        },
    )


# get_app_layout is a function.
# This way we can use unique session ids as upload_ids.
app.layout = get_app_layout


@du.callback(
    output=Output('callback-output', 'children'),
    id='dash-uploader',
)
def get_a_list(filenames):
    time.sleep(2)  # This is needed to let the files finish uploading
    path = os.getcwd()
    # Glob all csvs in the upload folder; the [:-9] slice strips the
    # uploaded file's name (assumes names like 'file1.csv', 9 characters).
    stock_files = sorted(glob((path + '/' + filenames[0])[:-9] + '*.csv'))
    # create dataframe from uploaded csvs:
    df = pd.concat((pd.read_csv(file) for file in stock_files),
                   ignore_index=True)
    return dash_table.DataTable(
        data=df.to_dict('records'),
        columns=[{'name': i, 'id': i} for i in df.columns])


if __name__ == '__main__':
    app.run_server(debug=True)
```
(screenshots of file1.csv, file2.csv, and the resulting dashboard output omitted)
Again, this works but has the stated limitations which I would like to solve.
Please let me know if you have any suggestions on a path forward for any of these questions, or suggestions for additional resources. Thanks!
Maybe @fohrloop could shed some light. Interested in Heroku deployment capabilities as well.
Hi @skinner4dinner, @maulberto3, and apologies for the late response. I check this forum only occasionally, but have now turned on email notifications.
1. Getting dataframe out of get_a_list()
I’m not exactly sure what you want, but maybe something like this:
```python
def get_a_list(user_id):
    stock_files = []
    folder = Path(UPLOAD_FOLDER_ROOT) / user_id
    for csv_file in folder.glob('*.csv'):
        if csv_file.is_file():
            stock_files.append(csv_file)
    # create dataframe from uploaded csvs:
    df = pd.concat((pd.read_csv(file) for file in stock_files),
                   ignore_index=True)
    return dash_table.DataTable(
        data=df.to_dict('records'),
        columns=[{'name': i, 'id': i} for i in df.columns])


@du.callback(
    output=Output('callback-output', 'children'),
    id='dash-uploader',
)
def call_after_file_upload(filenames):
    return get_a_list(user_id=Path(filenames[0]).parent.name)
```
- Now you could also just `pass` in `call_after_file_upload` and call `get_a_list` with the user_id from a callback of some other component. Note that the HTTP requests are not "checked" in any way in dash-uploader, so anyone can see anyone else's tables (by guessing the user id), unless you build some sort of login and user-authentication system into your app yourself.
- If you want to know the user_id in the callback for `du.Upload`, you may want to use the `@app.callback` method described here. It is more verbose but gives you a bit more control (access to the user_id in the callback).
2. dash-uploader on Heroku
I have never tried Heroku myself. dash-uploader (currently) needs write permissions to the hard disk of the server it is running on. Your current code uploads into an `uploads` directory next to the code, since you have set `UPLOAD_FOLDER_ROOT = 'uploads'`. Maybe test it out first and see how it works; I do not know how much disk space Heroku gives.
If you want to try to improve dash-uploader and add some other way to save the data, you can check it out. It is not too complicated to set up the development environment for dash-uploader (see here). If I wanted to change the way the files are saved, I would start at `configure_upload.py`.
3. About time.sleep
I did not encounter this myself, but if it can be reproduced, it might be a good idea to create an issue for it. It could be possible to add some sort of "while file is not ready, wait" logic there, to ensure that the file is on the filesystem before the callback is fired.
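On the app side, such a guard could look like the following sketch (`wait_for_stable_files` is a hypothetical helper, not part of dash-uploader): poll the upload folder until the matching files' sizes stop changing between two consecutive checks.

```python
import time
from pathlib import Path

def wait_for_stable_files(folder, pattern='*.csv', interval=0.2, timeout=10.0):
    """Block until the sizes of matching files stop changing, then return them.

    A crude stand-in for a real 'upload finished' signal: two consecutive
    polls seeing an identical, non-empty listing of sizes count as 'stable'.
    """
    folder = Path(folder)
    deadline = time.monotonic() + timeout
    previous = None
    while time.monotonic() < deadline:
        current = {p: p.stat().st_size for p in folder.glob(pattern)}
        if current and current == previous:
            return sorted(current)  # stable: sizes unchanged since last poll
        previous = current
        time.sleep(interval)
    raise TimeoutError('files in %s did not stabilize within %s s' % (folder, timeout))
```

This could replace the fixed `time.sleep(2)` in the callback, at the cost of a small polling delay and a timeout to tune.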
I hope this helps!
-Niko
FYI: there is some discussion on using dash-uploader on Heroku in this GitHub issue. If you're interested in taking part in debugging and finding a solution, you are welcome!
@fohrloop Hi, thank you for the message, much appreciated. I will take a look at it and see where I can help.