Show and Tell - dash-uploader (Upload large files)

Hi all,

I had some problems uploading large data files with Dash, and then I came across the awesome package dash-resumable-upload (0.0.3) and its improved version (0.0.4) by GitHub user westonkjones. I decided to build my own package based on these, and published it on GitHub at

It is also pip installable (pip install dash-uploader). I tried to make the documentation clear, so that it would be easy for anyone using Dash to upload large data files. The size of the data file should be limited only by the available disk space.

A complete MWE (minimal working example) with a callback after upload would look something like this:


from pathlib import Path

import dash_uploader as du
import dash
import dash_html_components as html
from dash.dependencies import Input, Output

app = dash.Dash(__name__)

UPLOAD_FOLDER = r"C:\tmp\Uploads"
du.configure_upload(app, UPLOAD_FOLDER)

app.layout = html.Div(
    [
        html.H1('Demo'),
        html.Div(
            du.Upload(
                text='Drag and Drop files here',
                text_completed='Completed: ',
                pause_button=False,
                cancel_button=True,
                max_file_size=1800,  # 1800 MB
                filetypes=['zip', 'rar'],
                css_id='upload-files-div',
            ),

            style={
                'textAlign': 'center',
                'width': '600px',
                'padding': '10px',
                'display': 'inline-block'
            },
        ),
        html.Div(id='callback-output')
    ],
    style={
        'textAlign': 'center',
    },

)

@app.callback(Output('callback-output', 'children'),
              [Input('upload-files-div', 'fileNames')])
def display_files(fileNames):

    if fileNames is not None:
        out = []
        for filename in fileNames:
            file = Path(UPLOAD_FOLDER) / filename
            out.append(file)
        return html.Ul([html.Li(str(x)) for x in out])
    return html.Ul(html.Li("No Files Uploaded Yet!"))

if __name__ == '__main__':
    app.run_server(debug=True)

And the resulting page will look something like this:

I have tested it with Dash 1.11.0 and Python 3.7.2.

Hope that you like it!

v.0.1.1 Update

  • Bugfix: The callback is now called after every upload.
  • The callback syntax is slightly different now. There is an isCompleted boolean flag that tells when the upload process has completed, and a fileNames list (currently of maximum length one) that contains the name(s) of the uploaded files. It seems that fileNames itself cannot be used as an Input, since appending files to the list did not trigger the callback.

A complete MWE with a callback for v.0.1.1 would be:

from pathlib import Path

import dash_uploader as du
import dash
import dash_html_components as html
from dash.dependencies import Input, Output, State

app = dash.Dash(__name__)

UPLOAD_FOLDER = r"C:\tmp\Uploads"
du.configure_upload(app, UPLOAD_FOLDER)

app.layout = html.Div(
    [
        html.H1('Demo'),
        html.Div(
            du.Upload(
                text='Drag and Drop files here',
                text_completed='Completed: ',
                pause_button=False,
                cancel_button=True,
                max_file_size=1800,  # 1800 MB
                filetypes=['zip', 'rar'],
                css_id='upload-files-div',
            ),
            style={
                'textAlign': 'center',
                'width': '600px',
                'padding': '10px',
                'display': 'inline-block'
            },
        ),
        html.Div(id='callback-output')
    ],
    style={
        'textAlign': 'center',
    },
)


@app.callback(
    Output('callback-output', 'children'),
    [Input('upload-files-div', 'isCompleted')],
    [State('upload-files-div', 'fileNames')],
)
def display_files(isCompleted, fileNames):

    if not isCompleted:
        return
    if fileNames is not None:
        out = []
        for filename in fileNames:
            file = Path(UPLOAD_FOLDER) / filename
            out.append(file)
        return html.Ul([html.Li(str(x)) for x in out])
    return html.Ul(html.Li("No Files Uploaded Yet!"))


if __name__ == '__main__':
    app.run_server(debug=True)

Sounds awesome.
First observation while trying it out: you’ve been very restrictive with the requirements, which forces all my Dash packages to be downgraded. Is there a specific reason for this, or could you relax them, e.g. to dash>=1.11?

Thanks for trying it out and giving feedback! You’re right, the requirements in v0.1.1 were very restrictive. I changed the dash requirement to dash>=1.1.0, but it might work on older Dash versions, too. The relaxed requirements are in the v.0.1.2 update, which also includes a progress bar:

v.0.2.0 is now available, and it includes:

  • An upload folder for each file, defined by an upload id which may be set by the user. In the example below, a simple uuid.uuid1() is used, which can also be converted back to a timestamp. This way, uploads with the same filename from concurrent users end up in different folders.
  • Bugfix: Uploading a file with the same name now overwrites the old file (previously, the file chunks were uploaded but never merged).
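For illustration, here is a minimal stdlib sketch of the uuid1 round trip mentioned above: create a per-upload folder named after a uuid1 id, then recover the creation time from the id. The helper names and the `<root>/<upload_id>` layout are assumptions for this example, not dash-uploader's API.

```python
import uuid
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Hypothetical helpers, not part of dash-uploader.
# UUID version 1 timestamps count 100-nanosecond intervals since the
# start of the Gregorian calendar (1582-10-15 00:00 UTC).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def new_upload_folder(root):
    """Return (upload_id, folder) where folder is <root>/<upload_id>."""
    upload_id = str(uuid.uuid1())
    return upload_id, Path(root) / upload_id


def upload_id_to_datetime(upload_id):
    """Recover the creation time (UTC) encoded in a uuid1 string."""
    u = uuid.UUID(upload_id)
    if u.version != 1:
        raise ValueError("only version 1 UUIDs carry a timestamp")
    # u.time is the 60-bit timestamp in 100 ns units; // 10 gives microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


if __name__ == "__main__":
    upload_id, folder = new_upload_folder("/tmp/Uploads")
    print(folder)
    print(upload_id_to_datetime(upload_id))
```

Note that uuid4 ids are purely random, so only uuid1 ids can be mapped back to a time like this.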

🎉 dash-uploader v.0.3.0

I am happy to announce the v.0.3.0 release of dash-uploader!

What is new?

  • Added a proper documentation page for the package.
  • Added a new @du.callback decorator for simple callback creation.
  • Added (experimental) max_files support for the du.Upload component: it is now possible to drag multiple files onto the upload component.
  • Bugfix: Running behind a proxy is now possible. I.e., if the app is served at
    http://myhost/myproxy and the Dash instance is created with app = dash.Dash(__name__, requests_pathname_prefix='/myproxy/'),
    it is taken care of out of the box.
  • Bugfix: Uploading files with the same filename repeatedly is now possible.

The new @du.callback decorator greatly simplifies simple callbacks. Instead of

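from pathlib import Path

import dash_html_components as html
from dash.dependencies import Input, Output, State

# Assumes `app` and UPLOAD_FOLDER_ROOT are defined as in the earlier examples,
# with du.configure_upload(app, UPLOAD_FOLDER_ROOT).
@app.callback(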
    Output('callback-output', 'children'),
    [Input('dash-uploader', 'isCompleted')],
    [State('dash-uploader', 'fileNames'),
     State('dash-uploader', 'upload_id')],
)
def callback_on_completion(iscompleted, filenames, upload_id):
    if not iscompleted:
        return

    out = []
    if filenames is not None:
        if upload_id:
            root_folder = Path(UPLOAD_FOLDER_ROOT) / upload_id
        else:
            root_folder = Path(UPLOAD_FOLDER_ROOT)

        for filename in filenames:
            file = root_folder / filename
            out.append(file)
        return html.Ul([html.Li(str(x)) for x in out])

    return html.Div("No Files Uploaded Yet!")

you can simply use

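import dash_html_components as html
import dash_uploader as du
from dash.dependencies import Output

@du.callback(
    output=Output('callback-output', 'children'),
    id='dash-uploader',
)
def get_a_list(filenames):
    # filenames: list of paths to the uploaded files; one list item per file
    return html.Ul([html.Li(str(filename)) for filename in filenames])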
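A decorator like this mainly saves the boilerplate of the longer callback: waiting for isCompleted and joining the upload folder, upload_id and file names into full paths. A dash-free sketch of that idea, using a hypothetical on_upload_complete decorator and an assumed /tmp/Uploads root (this is not dash-uploader's actual implementation):

```python
from functools import wraps
from pathlib import Path


def on_upload_complete(upload_folder_root):
    """Hypothetical decorator factory mimicking the idea behind @du.callback."""

    def decorator(func):
        @wraps(func)
        def wrapper(is_completed, file_names, upload_id=None):
            # Do nothing until the upload has finished.
            if not is_completed or file_names is None:
                return None
            root = Path(upload_folder_root)
            if upload_id:
                # Each upload lives in its own <root>/<upload_id> folder.
                root = root / upload_id
            return func([str(root / name) for name in file_names])

        return wrapper

    return decorator


@on_upload_complete("/tmp/Uploads")
def get_a_list(filenames):
    # In a Dash app this would build e.g. html.Ul([...]); a plain list here.
    return filenames
```

Calling `get_a_list(True, ["data.zip"], "some-id")` would return the single full path of data.zip inside the some-id folder, while `get_a_list(False, ...)` returns None.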

Hi!
I’ve noticed that the uploader doesn’t work if I initialize the app with the url_base_pathname parameter. Is there a specific reason for that?
I just replaced the app initialization in the usage demo with this:

app = dash.Dash(__name__,
                url_base_pathname='/page/'
                )

…and it stopped working. Maybe the configure_upload function should be called differently, or…?

Hi @Anker,

Thanks for pointing this out! I created an issue for this here, and I think I managed to fix it in dash-uploader 0.4.0.

Hi @np8,
Are there any plans to implement renaming of uploaded files?

Hi @Shane20, thanks for asking! I think the best way to rename uploaded files would be to add the renaming functionality to the callback. What do you think?
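Renaming inside the callback could be as small as the sketch below, assuming the callback receives the full path of each uploaded file (as @du.callback does with its filenames list). rename_upload is a hypothetical helper, not part of dash-uploader:

```python
from pathlib import Path


def rename_upload(path, new_stem):
    """Rename an uploaded file in place, keeping its extension.

    Hypothetical helper. Returns the new path and raises FileExistsError
    instead of silently overwriting an existing file.
    """
    path = Path(path)
    target = path.with_name(new_stem + path.suffix)
    if target.exists():
        raise FileExistsError(target)
    path.rename(target)
    return target
```

Inside a callback you would call it for each path in filenames. The existence check is not atomic, so two users renaming to the same name at the same moment could still collide.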

Hi @np8

I have a feature in my app where I need to upload some files and also log the time taken to upload them.

From what I understand, the dash-uploader callback is called after the file has been uploaded, and I couldn’t find a property that gives the time taken for the upload. Any suggestions?

Hi @Aniqa295,

Interesting question. You are right: the callbacks are called just after the file has been uploaded, but there is no “callback” or “hook” for the upload-start event that would make it possible to calculate the time used for the upload. Unfortunately, this is not a feature that exists out of the box, but it would surely be possible with some modifications to the core.

If you really want to calculate the time used for the upload, you can fork dash-uploader and either

  • Create logic that somehow saves a timestamp of the first uploaded chunk of each file: for example to some centralized file, through a message queue (such as RabbitMQ), or in a separate timestamp file for each uploaded file.
  • Create logic that makes it possible to add a custom “hook function” for the start-of-upload action. This could probably also be merged back into the dash-uploader package.
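The first option, a timestamp file per uploaded file, could be sketched like this. Assumptions: the hypothetical record_upload_start would be called when chunk number 1 arrives, and upload_duration once the merged file exists; neither function exists in dash-uploader.

```python
import time
from pathlib import Path

# Hypothetical helpers, not part of dash-uploader.


def _stamp_path(folder, filename):
    # Sidecar file next to the upload, e.g. data.zip -> data.zip.start
    return Path(folder) / (filename + ".start")


def record_upload_start(folder, filename):
    """Write the current time to a sidecar file when the first chunk arrives."""
    stamp = _stamp_path(folder, filename)
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.write_text(repr(time.time()))


def upload_duration(folder, filename):
    """Return seconds elapsed since record_upload_start and remove the sidecar."""
    stamp = _stamp_path(folder, filename)
    started = float(stamp.read_text())
    stamp.unlink()
    return time.time() - started
```

A file on disk survives across Flask worker processes, which is the main reason to prefer it over an in-memory dict here.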

To get started, fork dash-uploader and follow the instructions in CONTRIBUTING.md to set up the development environment. Then, modify the logic in configure_upload.py. There, the decorate_server() function configures the Flask server so it knows what to do when an HTTP POST (or GET) request is directed to the dash-uploader upload URL. You can see that the chunks are written one by one to the filesystem and then combined later to form the uploaded file. You could then add your own logic for the case when chunk_number is 1, or for when the file has been completely uploaded.
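The chunk-then-combine flow can be illustrated with the standard library alone. This is a simplified sketch, not the code in configure_upload.py: the function names and the `<filename>_part_<n>` naming scheme are assumptions, and chunks are numbered from 1 as in the chunk_number == 1 case above. The comments mark where start-of-upload and end-of-upload logic could go.

```python
from pathlib import Path

# Hypothetical helpers illustrating chunked uploads; not dash-uploader code.


def save_chunk(folder, filename, chunk_number, data):
    """Write one chunk to <folder>/<filename>_part_<chunk_number>."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    # A start-of-upload hook could run here when chunk_number == 1,
    # e.g. to record a timestamp.
    (folder / f"{filename}_part_{chunk_number}").write_bytes(data)


def all_chunks_present(folder, filename, total_chunks):
    """True once chunks 1..total_chunks have all been written."""
    folder = Path(folder)
    return all(
        (folder / f"{filename}_part_{n}").exists()
        for n in range(1, total_chunks + 1)
    )


def merge_chunks(folder, filename, total_chunks):
    """Concatenate the chunks in order into the final file, then delete them."""
    folder = Path(folder)
    target = folder / filename
    with target.open("wb") as out:
        for n in range(1, total_chunks + 1):
            part = folder / f"{filename}_part_{n}"
            out.write(part.read_bytes())
            part.unlink()
    # An end-of-upload hook could run here, e.g. to compute the upload duration.
    return target
```

A request handler would call save_chunk for every POSTed chunk and merge_chunks once all_chunks_present returns True. Chunks may arrive out of order, which is why the merge waits for the full set.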

If I had more time or a real need myself, I would refactor dash-uploader a bit to work with classes, so that users of the package could just subclass some DashUploader class and add their own logic, for example for handling GET or POST requests. Something like how the Django framework works.
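The class-based idea could look roughly like the sketch below. Everything here is hypothetical: dash-uploader does not ship an UploadHandler class, and chunks are kept in memory only to keep the example short. The point is the design: the base class owns the chunk bookkeeping, and subclasses override only the hook methods.

```python
import time


class UploadHandler:
    """Hypothetical base class; subclasses override the hook methods."""

    def __init__(self):
        self._chunks = {}  # filename -> {chunk_number: bytes}

    def on_upload_start(self, filename):
        """Called when chunk number 1 arrives. Default: do nothing."""

    def on_upload_complete(self, filename, data):
        """Called once all chunks are present. Default: do nothing."""

    def handle_chunk(self, filename, chunk_number, total_chunks, data):
        """Store one chunk (numbered from 1); return the full bytes when complete."""
        if chunk_number == 1:
            self.on_upload_start(filename)
        parts = self._chunks.setdefault(filename, {})
        parts[chunk_number] = data
        if len(parts) < total_chunks:
            return None
        del self._chunks[filename]
        full = b"".join(parts[n] for n in range(1, total_chunks + 1))
        self.on_upload_complete(filename, full)
        return full


class TimedUploadHandler(UploadHandler):
    """Example subclass that measures how long each upload took."""

    def __init__(self):
        super().__init__()
        self._started = {}
        self.durations = {}

    def on_upload_start(self, filename):
        self._started[filename] = time.monotonic()

    def on_upload_complete(self, filename, data):
        self.durations[filename] = time.monotonic() - self._started.pop(filename)
```

With this shape, the upload-time question above becomes a small subclass instead of a fork of the request-handling code.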

I hope this helps you forward!

Hey

I recently installed this cool package! One question: how can I upload a file without it being placed in that folder with a random name made of letters and numbers?
Thanks in advance.

You can do that by using use_upload_id=False, as mentioned in the docs:

du.configure_upload(app, folder, use_upload_id=False)

This works well only if your application never has multiple simultaneous users, since all files will be uploaded to the same folder, and if two users upload a file with the same name, the result is undefined. If you need something like this in a multi-user app, I would instead move the uploaded files into a common folder inside a callback.
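Moving files into a common folder inside a callback could be sketched as follows, assuming the callback knows the upload_id and the path of the uploaded file. The helper name and the shared-folder location are hypothetical; prefixing with the upload id on a name clash keeps one user's file from overwriting another's.

```python
import shutil
from pathlib import Path


def move_to_shared_folder(uploaded_path, shared_folder, upload_id):
    """Move a file out of its per-upload folder into one shared folder.

    Hypothetical helper. If a file with the same name already exists in the
    shared folder, the name is prefixed with the upload id.
    """
    src = Path(uploaded_path)
    dest_dir = Path(shared_folder)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    if dest.exists():
        dest = dest_dir / f"{upload_id}_{src.name}"
    shutil.move(str(src), str(dest))
    return dest
```

The exists-then-move check is not atomic, so for truly simultaneous uploads you may still want a lock or always-prefixed names.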

Thank you very much! I read the docs but I did not understand the explanation.

Did you try passing the argument use_upload_id=False to the du.configure_upload call?

Hi,
first of all, thanks a lot for your uploader. It looks awesome. It’s exactly the thing I needed, because my file size was too big for dcc.Upload.

But I have one question: is there a way to upload a folder structure with subfolders, for example as a zip archive, and then interact with this folder structure through a table in the dashboard? To clarify, I want to upload a lot of files and then manually select which files should be displayed in a graph. In your examples, you only refer to the uploaded files as a whole, e.g. when you use fileNames.

I hope my problem is clear and that you can give me a hint on how to solve it. Thanks a lot!

Currently, the best way to do this kind of thing is, as you said, to upload a .zip archive. That is, upload your folder structure as a single file. Then, inside the callback, do the required processing for your file. In your case, that would include unzipping the file and maybe updating some GUI components 🙂
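The unzip step inside the callback could look like this minimal stdlib sketch. The function name and the choice of extracting into a sibling folder named after the archive are assumptions for illustration.

```python
import zipfile
from pathlib import Path


def extract_and_list(zip_path):
    """Extract an uploaded .zip and return the contained file paths.

    Hypothetical helper: data.zip is extracted into a sibling folder data/,
    and the returned paths are relative to that folder (POSIX style, sorted).
    """
    zip_path = Path(zip_path)
    target = zip_path.with_suffix("")
    with zipfile.ZipFile(zip_path) as zf:
        # extractall strips absolute paths and ".." components from member names.
        zf.extractall(target)
    return sorted(
        p.relative_to(target).as_posix() for p in target.rglob("*") if p.is_file()
    )
```

Inside the upload callback you would call this on the uploaded .zip path and use the returned list to populate whatever component lets the user pick files.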

Ah, OK, that sounds reasonable. Do you know of any documentation I can read on how to handle zip files inside a callback? I didn’t find anything like that in the official Dash documentation. At minimum, I need to unzip the file and connect the file names to a table with checkboxes or something similar.

@sidneywilke I haven’t been using Dash for a while, but for callbacks I would start from here. Unzipping files can be done with the zipfile standard library module. Is there some particular part of the callbacks that is problematic for you?
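For the “table with checkboxes” part, one option is to list the archive members with zipfile.ZipFile.namelist() and turn them into the list of label/value dicts that dcc.Checklist accepts as its options. A dash-free sketch; the helper name and the .csv filter are assumptions:

```python
import zipfile


def checklist_options_from_zip(zip_path, suffix=".csv"):
    """Build [{'label': ..., 'value': ...}] entries for files inside a zip.

    Hypothetical helper; suffix filters members case-insensitively.
    """
    with zipfile.ZipFile(zip_path) as zf:
        names = [
            name
            for name in zf.namelist()
            # Directory entries end with "/"; keep only matching files.
            if not name.endswith("/") and name.lower().endswith(suffix)
        ]
    return [{"label": name, "value": name} for name in sorted(names)]
```

The result could be returned from the upload callback as the options of a checklist (e.g. a hypothetical dcc.Checklist with id='file-picker'), and a second callback with that checklist’s value as Input could then plot only the selected files.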