Thanks for all the attention here. It’s very helpful.
My use case: I’m doing something which may not be ideal for Dash. I’m building a small application that ingests a dataset from the user and then performs some fancy machine learning tasks to analyze the submitted data. The program populates a number of interactive plots, and those plots have parameters that can change and require updating. There is a data-flow hierarchy here, so the reactivity of the whole thing is a nice paradigm. And because my computational backend is Python, it seemed natural to try Dash.
In the future this program would likely be useful deployed on a server in the cloud. In that case the user would be uploading their data to some persistent filesystem. However, I do not foresee a use case in which several users are served the same content concurrently; even in the cloud it would be an instance-per-user type of application.
For an MVP, all I really want is for the user to click a kind of ‘file selection’ button, browse their local filesystem for a data file, and have that file (an address or handle) passed off to the backend for processing, not copied and encoded on the server. I’m not very knowledgeable about this stuff, but I believe HTML5 exposes this kind of capability in the File API; I don’t think it has been carried through to the Dash API, though. Correct me if I’m wrong. Thanks in advance.
Note that if the user is also running the app, then you can just display a dropdown listing the user’s files for them to select. You could populate the dropdown’s options by listing files with glob.glob or something like that. Pseudocode:
import glob
app.layout = html.Div([
    dcc.Dropdown(id='file-list',
                 options=[{'label': i, 'value': i} for i in glob.glob('*.csv')])
])
This won’t work in a client-server pattern (where the app is running on a different machine than the user who is viewing it) but it would work fine for development.
To make @chriddyp’s MVP idea support a client-server setup, you could embed an iframe within your Dash app that points to a page with a very simple vanilla HTML upload form, thereby escaping the Dash app just for the upload. You’d then need a ‘refresh’ button or some such that triggers a callback to update the dropdown with the result of the filesystem glob now that the file has been uploaded.
Although now you’re just bypassing the Upload component entirely, so you obviously lose all the nice features it supports.
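The iframe idea above can be sketched as a minimal Flask route serving a vanilla HTML upload form; this is a sketch under my own assumptions (the route name `/upload` and the `UPLOAD_DIR` folder are illustrative, not from the thread):

```python
import io
import os
import tempfile

from flask import Flask, request
from werkzeug.utils import secure_filename

# Hypothetical upload folder -- swap in a persistent directory in practice.
UPLOAD_DIR = tempfile.mkdtemp()
server = Flask(__name__)

# Bare-bones HTML form; the Dash app would show this page in an iframe.
FORM = """
<form method="post" enctype="multipart/form-data">
  <input type="file" name="datafile">
  <input type="submit" value="Upload">
</form>
"""

@server.route("/upload", methods=["GET", "POST"])
def upload():
    if request.method == "POST":
        f = request.files["datafile"]
        # Stream the raw bytes straight to disk on the server
        f.save(os.path.join(UPLOAD_DIR, secure_filename(f.filename)))
        return "Uploaded"
    return FORM
```

The ‘refresh’ button’s callback would then simply re-run the glob over `UPLOAD_DIR` to repopulate the dropdown options.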
Did you have any success building the intended app, @sophotrope? I am trying to build something similar, and it would be great to know whether your application was successfully built. If so, could you share the key learnings from your project?
A remaining issue with this solution is how to make the Flask part interact with Dash components. For example, how do you show the name of the uploaded file in the ‘output’ block? I can do this by monitoring the filenames in the upload folder, but there should be a more straightforward way.
I’m currently writing my master’s thesis in the field of Topological Data Analysis combined with the classification of observations. Part of the work is implementing the interactive analytics process as a dashboard web application.
The user should be able to upload a CSV file containing the required data. As the whole process is implemented in Python, I have chosen Dash to implement the user interface. I use three datasets in my thesis, consisting of 1k, 280k, and 800k observations respectively. The two large files are around 150 MB and 210 MB in size.
Problems occur when uploading the large files to show the first 10 records for feature selection. To make the dataset usable in the application, I use pandas’ read_csv to read it into a DataFrame. This DataFrame is serialized as a JSON string and saved in a hidden Div, so the serialized DataFrame ends up in the browser’s memory. When I upload one of the larger files, the browser throws an out-of-memory error and the app doesn’t update. This behavior seems to be independent of the web browser used.
My first approach was to replace the Upload component from dash-core-components with the one from dash-resumable-upload, which didn’t solve the problem: the out-of-memory error no longer occurs, but the page is never updated with a preview of the first ten records. The next approach was a Flask cache on the filesystem, but the problem persists and the browser still runs out of memory. The last approach was to replace the hidden Div with a Store component from dash-core-components, which didn’t solve the problem either. With a subset of only 50k observations, everything works fine.
What can I do to make Dash work with a larger amount of data? Thank you very much in advance for your responses.
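One commonly suggested workaround, sketched here under my own assumptions (helper names and the cache folder are illustrative, not an official Dash recipe): keep the large DataFrame on the server’s disk and put only a small token, such as the file path, into the hidden Div or dcc.Store, so the browser never has to hold the serialized data:

```python
import os
import tempfile

import pandas as pd

CACHE_DIR = tempfile.mkdtemp()  # hypothetical server-side cache folder

def cache_dataframe(df, name):
    """Pickle df on the server; return the path, the only thing sent to the browser."""
    path = os.path.join(CACHE_DIR, f"{name}.pkl")
    df.to_pickle(path)
    return path

def preview(path, n=10):
    """Reload the cached frame and return only the first n records for the UI."""
    return pd.read_pickle(path).head(n)
```

The upload callback would call `cache_dataframe` and return the path into dcc.Store; downstream callbacks would take that path as State and call `preview`, so only ten rows ever travel to the browser.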
Has the problem of large dataset handling been solved by now? We have used SQL for data handling and it works fine, but with datasets larger than 50 MB the same issues appear: the web browser crashes. Could you give an update on this, please?
I have done the same thing, but my data comes from a local SQL database server rather than from the cloud. Like @sophotrope said, we need to use the cloud instead of a local SQL server. Will that be possible?
So, using Flask to save the file: does that change the type of the file that is being uploaded? Will ‘Sample.xlsx’ still be ‘Sample.xlsx’? And how do you access it for use later?
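To the question above: saving through Flask writes the raw upload bytes to disk unchanged, so ‘Sample.xlsx’ stays ‘Sample.xlsx’, and you use it later simply by path. A small sketch under my own assumptions (the folder and file contents are stand-ins):

```python
import os
import tempfile

upload_dir = tempfile.mkdtemp()            # stand-in for the real upload folder
saved_path = os.path.join(upload_dir, "Sample.xlsx")

content = b"bytes exactly as the browser sent them"
with open(saved_path, "wb") as fh:         # what request.files[...].save() does
    fh.write(content)

# The bytes on disk are identical to the upload, so the file type is unchanged.
# Later, in a callback, load it by path, e.g.:
# df = pd.read_excel(saved_path)           # hypothetical later use with pandas
```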
I am also wondering whether there has been any update on uploading large files with Dash without holding them fully in memory in both the browser and the server. I did see some of the other threads about dash-resumable-upload, but it appears it is no longer maintained.
import dash
import dash_html_components as html
import dash_uploader as du

app = dash.Dash(__name__)

# 1) Configure the folder where uploaded files are saved
du.configure_upload(app, r"C:\tmp\Uploads")

# 2) Use the Upload component in the layout
app.layout = html.Div([
    du.Upload(),
])

if __name__ == '__main__':
    app.run_server(debug=True)
@maulberto3 No, dash-uploader does not load the data into the browser at all; it sends the file to the web server with HTTP POST requests. This way you can upload files of even several gigabytes without worrying that your browser will run out of memory.
@maulberto3 The uploaded files are saved to the hard disk of the computer that is running your Dash application, so you can decide yourself who can access the files through your app.