Yeah, this is a limitation in Dash right now. In Dash, all of the data (the store) is stored in the web browser. This data is accessed whenever any of the input elements change and then sent to the server.
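For concreteness, the current pattern looks roughly like this (a minimal sketch; the component IDs are made up, but `contents` is the real `dcc.Upload` property that holds the uploaded file as a base64-encoded string):

```python
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output

app = dash.Dash(__name__)

app.layout = html.Div([
    # the uploaded file lives in the browser as a base64-encoded string
    dcc.Upload(id='upload-data', children=html.Div('Drag and drop a file')),
    dcc.Dropdown(id='dropdown'),
    html.Div(id='output')
])

@app.callback(Output('output', 'children'),
              [Input('upload-data', 'contents'), Input('dropdown', 'value')])
def update_output(contents, value):
    # every time this callback fires, the *entire* contents string is sent
    # from the browser to the server in the callback's HTTP request
    ...
```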
In this case, ~150MB ends up being a lot for the web browser to handle. It’s hard to say why the browser is crashing / freezing. It could be one of many things:
1 - The process of “uploading” the data and converting it into memory on the client is CPU intensive and causes things to freeze
2 - The process of making a 150MB HTTP request is perhaps CPU intensive and causes things to freeze
3 - Converting the file into memory on the client (the browser) takes up a lot more memory than 150MB and causes the machine to run out of memory (see the rough arithmetic after this list)
4 - Converting the file to memory on the Dash server (in a dev environment the client and server will be on the same machine) causes the machine to run out of memory
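To put rough numbers on points 3 and 4 (these are back-of-the-envelope estimates, not measurements):

```python
# rough memory estimate for a 150 MB upload
file_mb = 150

# base64 encoding inflates the raw bytes by about 4/3
base64_mb = file_mb * 4 / 3   # ~200 MB held as a string in the browser

# the server receives that string, decodes it back into bytes, and
# typically also parses it (e.g. into a DataFrame), so several copies
# of the data can be alive at once
decoded_mb = file_mb          # ~150 MB of raw bytes
parsed_mb = file_mb           # at least ~150 MB once parsed

print(base64_mb + decoded_mb + parsed_mb)  # ~500 MB, before any browser overhead
```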
We could solve 2 and 4 by supporting streamed requests. In Flask, this would look like “Streaming input and output in Flask”. This would require some updates in dash-renderer, which is Dash’s JS front-end.
If the issue is 3, then we need to work on memory management in Dash’s front-end. I’m sure there is lots of low hanging fruit here. However, due to Dash’s architecture, we need to keep the file contents around in memory in the client as they need to be accessible whenever a Dash callback might need them. This is unlike, say, Google Drive where uploading a file just passes it through the browser from your computer to their servers - it doesn’t persist on your browser.
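Today, a callback that wants the uploaded file has to pull it out of that base64 contents string on every invocation, roughly like this (the CSV parsing is just one common approach):

```python
import base64
import io

import pandas as pd

def parse_contents(contents):
    # dcc.Upload delivers the file as 'data:<content_type>;base64,<data>'
    content_type, content_string = contents.split(',')
    decoded = base64.b64decode(content_string)  # another full copy in memory
    # and yet another copy once parsed into a DataFrame
    return pd.read_csv(io.StringIO(decoded.decode('utf-8')))
```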
So, this brings us to a question about what the underlying use case is. Perhaps “uploading a very large file” is slightly outside of the reactive Dash paradigm. The Dash upload component could simply stream the data to the Dash Flask server (without keeping it in memory) and the Dash developer could refer to that data on the disk by its filename and the user’s session if they needed to access it.
In pseudocode:
```python
import uuid
import flask
import pandas as pd
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output

app = dash.Dash(__name__)

def serve_layout():
    # generate a fresh session ID for each page load
    session_id = str(uuid.uuid4())
    return html.Div([
        html.Div(session_id, id='session-id', style={'display': 'none'}),
        dcc.Dropdown(id='dropdown'),
        # hypothetical API: stream the file to this endpoint instead of
        # keeping it in the browser
        dcc.Upload(endpoint='/streaming-upload', session=session_id)
    ])

# save the file on the underlying Flask server, bypassing Dash
@app.server.route('/streaming-upload', methods=['POST'])
def save_streaming_upload():
    # the (hypothetical) component would pass the session ID as a query parameter
    user_session = flask.request.args['session']
    chunk_size = 4096
    with open(user_session, 'wb') as f:
        while True:
            chunk = flask.request.stream.read(chunk_size)
            if len(chunk) == 0:
                return ''
            f.write(chunk)

@app.callback(Output('...'),
              [Input('dropdown', 'value'), Input('session-id', 'children')])
def filter_data(value, session):
    # read the uploaded file back off of disk by its session-scoped filename
    df = pd.read_csv(session)
    ...
```
In this case, we’re identifying the user’s particular upload by setting some type of session ID, perhaps as part of the dcc.Upload component. The dcc.Upload component would be responsible for making a streaming request to the endpoint parameter.
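To make that session ID unique per visitor, the layout function above would be assigned directly to app.layout (Dash calls a layout function on every page load), and the streamed file could be written somewhere session-scoped rather than to the bare session ID in the working directory. A sketch, with a hypothetical helper:

```python
import os
import tempfile

# assigning the function itself (not its result) means Dash calls it on
# every page load, so each visitor gets a fresh session ID
app.layout = serve_layout

# hypothetical helper: keep each session's upload in its own temp file
def session_filepath(session_id):
    upload_dir = os.path.join(tempfile.gettempdir(), 'dash-uploads')
    os.makedirs(upload_dir, exist_ok=True)
    return os.path.join(upload_dir, session_id)
```

Both save_streaming_upload and filter_data above would then use session_filepath(session) instead of the raw session ID as the filename.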
Perhaps something like this would work.
In any case, I would be curious to hear more about your use case @sophotrope. What would you like to do with these huge files? Are you just saving them? Or are you creating dynamic UIs based off of them? How do you expect your app to work with multiple concurrent users?