Uploading Relatively Small Files Is Taking Too Long in Dash Plotly

Hey,

I’m developing a dashboard with Plotly Dash for my company. On the first tab, the user has to upload three files. One of these files is a 6 MB .xls file. I have used the parse_content style, as discussed in another topic.

However, the upload takes too long, and I don’t know why. I’m storing the data using the hidden div method. Could that be slowing down my upload? As far as I understand, the hidden div would only slow down callbacks, since the data has to travel through the network at each interaction. But the callbacks aren’t taking long. I believe my main problem is the Upload component.
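One factor worth keeping in mind: in the parse_content approach, `dcc.Upload` hands the file to callbacks as a base64-encoded data URL in its `contents` property, so the request is roughly a third larger than the file itself, and if that string is stored in a hidden div it is sent again with every callback that reads it. The stdlib sketch below only illustrates the size arithmetic; it assumes a 6 MB .xls file, uses random bytes as a stand-in for the real spreadsheet, and the MIME type is an assumption.

```python
import base64
import os

# Stand-in for a 6 MB spreadsheet: random bytes (assumption, not real data).
raw = os.urandom(6 * 1024 * 1024)

# dcc.Upload exposes the file as a data URL: "data:<mime>;base64,<payload>".
# The MIME type here is assumed for an .xls file.
mime = "application/vnd.ms-excel"
contents = "data:{};base64,{}".format(mime, base64.b64encode(raw).decode("ascii"))

# Ratio of characters sent to bytes on disk.
overhead = len(contents) / len(raw)
print(f"raw bytes: {len(raw)}")
print(f"contents chars: {len(contents)}")
print(f"overhead: {overhead:.3f}x")
```

For 6,291,456 bytes the base64 payload is 8,388,608 characters, i.e. 4/3 of the file size, before any JSON or HTTP framing is added.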

I have tried the dash-resumable-upload package, but when I add its upload component the dashboard doesn’t appear on screen at all; the app just renders a blank page.

From what I’ve read, uploading files is a real limitation of Dash, even for small files. Any ideas on how I can make the upload faster?

PS: I’m not interested in linking my app to an Amazon S3 link, since the app is only a temporary solution while we develop a fully functional web server.

My code is pretty similar to the one found here.

I believe uploading large amounts of data should be a priority for Plotly Dash, since it expands the range of possible applications. I don’t have the skills needed to develop such tools, but I’m interested in helping in whatever way I can.

Thanks in advance, y’all. I’m new to this community and I’m having a blast reading the topics.

Hope I can contribute to future discussions.

Could it be an option to save the data to a file on disk on the server?

I don’t think so. The problem is that every month the user will upload a new table of the same size (or close to it). The idea is to make the app as user-friendly as possible, so the user won’t need to worry about the data format or anything else. They will just upload their company-format Excel file and the dashboard will do the rest.

I know it seems a bit much for Dash alone, but since the data isn’t that big and the analysis is quite simple, I’m focusing on the best experience for the user.

Hi @pharaujo, welcome to the forum! Could you please give a few more details:

  • how long does the upload take?
  • where is the app hosted? On a web server of your company?
  • did you try uploading a file somewhere else (e.g. to Google Drive or, preferably, to the same server where the app is hosted) to make sure it’s not just the network that is limiting?

Hi @Emmanuelle. First of all, thanks for your reply and assistance.

We are hosting the app on an EC2 server. However, our main plan is to host it on Heroku, since we can do that for free. The problem is that Heroku’s request timeout is 30 seconds, so we can’t tell how long the upload would take on Heroku.

On the EC2 server it takes around 1 minute with a good internet connection. I also uploaded the file to Google Drive, and it took less than 10 seconds (with the same connection).

@pharaujo I am aware that Dash must take care of the upload. I was merely suggesting that you write the file contents to a file on the server (via Dash) and access it directly on the server, rather than using the contents as input to a callback. As I understand it, this approach results in the least amount of web traffic. Here is a small example to show what I mean:

```python
import base64
import os

# Dash >= 2.0 ships dcc/html inside the dash package itself.
from dash import Dash, dcc, html, Input, Output, State
from dash.exceptions import PreventUpdate

# Bind some folder on the server for storing the uploaded data.
folder_on_server = "app_data"
os.makedirs(folder_on_server, exist_ok=True)

# Create the app.
app = Dash(__name__)
app.layout = html.Div(children=[
    dcc.Upload(children="Drop file here", id="upload"),
    html.Div(children="", id="upload-info"),
    html.Button(children="Click to read next line from file", id="action"),
    html.Div(children="", id="action-info"),
])


@app.callback(
    Output("upload-info", "children"),
    [Input("upload", "filename")],
    [State("upload", "contents")],
)
def upload_action(fn, contents):
    if fn is None or contents is None:
        raise PreventUpdate
    # contents is a data URL: "data:<mime type>;base64,<payload>".
    content_type, content_string = contents.split(',', 1)
    # Save the decoded bytes to disk (on the server). basename() drops any
    # directory components from the client-supplied filename.
    with open(os.path.join(folder_on_server, os.path.basename(fn)), 'wb') as f:
        f.write(base64.b64decode(content_string))
    # Update label.
    return "Latest upload was {}".format(fn)


@app.callback(
    Output("action-info", "children"),
    [Input("action", "n_clicks")],
    [State("upload", "filename")],
)
def button_action(n_clicks, fn):
    if fn is None or n_clicks is None:
        raise PreventUpdate
    # Read the file from disk (on the server) instead of sending its contents
    # through the browser again. This demo assumes a text file.
    with open(os.path.join(folder_on_server, os.path.basename(fn)), 'r') as f:
        content = f.readlines()
    if not content:
        return "File is empty"
    # Cycle through the lines, one per click.
    return content[n_clicks % len(content)]


if __name__ == '__main__':
    # app.run replaces app.run_server, which was removed in Dash 3.
    app.run(debug=True)
```
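The save-to-disk step in the upload callback can also be pulled out into a plain function, so it can be tested without starting the app. A minimal sketch: `save_upload` is a hypothetical helper name (not part of Dash), and it assumes the same base64 data-URL format that `dcc.Upload` puts in `contents`.

```python
import base64
import os


def save_upload(contents, filename, folder):
    """Decode a dcc.Upload-style data URL and write it to folder/filename.

    Hypothetical helper, not a Dash API. Returns the path that was written.
    """
    # Split "data:<mime>;base64,<payload>" at the first comma only.
    header, payload = contents.split(",", 1)
    if not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    os.makedirs(folder, exist_ok=True)
    # Keep only the final path component of the client-supplied name.
    path = os.path.join(folder, os.path.basename(filename))
    with open(path, "wb") as f:
        f.write(base64.b64decode(payload))
    return path
```

Inside `upload_action`, the `with open(...)` lines could then be replaced by `save_upload(contents, fn, folder_on_server)`.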


Hey @Emil! I just updated my code based on your suggestion and it works. The upload is much faster now. Thank you for your reply and help. Greetings!