Updating a graph based on a dropdown and intermediate storage is too slow

noideaman · October 19, 2018, 5:27pm

I am using a hidden div to store computationally expensive intermediate data that is calculated when a new file is uploaded. That seems to work OK. My problem is that I need to update graphs based on drop-down values that select subsets of my Pandas dataframe. When I choose a dropdown value, the entire dataset has to be loaded and subsetted before the graph is updated. I have two problems with this setup:

1. The time from choosing a value in the drop-down to the update_graph function running is on the order of 5-10 seconds (i.e. I don’t see the Updating graph! statement in my console until 5-10 seconds after I choose a drop-down value).
2. It takes a LONG time to do json.loads(dataset) and then subset the data.

There must be a better way to do this. This is how it is set up now:

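import json

import pandas as pd
from dash.dependencies import Input, Output, State
from dash.exceptions import PreventUpdate

# `app` and `update_graph_function` are defined elsewhere in the application.
@app.callback(Output('My Graph', 'figure'),
              [Input('Dropdown', 'value')],
              [State('intermediate-data', 'children')])
def update_graph(dropdown, datasets):
    print("Updating graph!")
    if datasets is not None and dropdown is not None:
        # The hidden div holds one JSON string; decode it before indexing.
        datasets = json.loads(datasets)
        data = pd.read_json(datasets['dataA'], orient='records')
        # isin() expects a list, so the dropdown must be multi-select.
        data = data[data['subset_column'].isin(dropdown)]
        graph, f = update_graph_function(data)
        return f
    else:
        raise PreventUpdate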

Thanks everyone!

rnarren1 · October 19, 2018, 6:08pm

How large are the intermediate data?

Every time you click the dropdown, the following happens: ALL the intermediate data is serialized to JSON, sent over the network, and then deserialized. If there is a lot of data, this process will be very slow.

The hidden div is only one way to save expensive-to-compute data. If the data are large enough to cause these long wait times, I think it would make sense to store them on the server instead. There are some examples of how to do this here: https://dash.plot.ly/sharing-data-between-callbacks, specifically Examples 3 and 4.
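
As a rough illustration of the server-side idea (not the Flask-Caching setup from the linked examples), here is a minimal sketch using only the standard library. The names `compute_expensive`, `store_dataset`, `get_subset`, and the in-process `_CACHE` dict are hypothetical, and a list of dicts stands in for the DataFrame. The hidden div would hold only the short key returned by `store_dataset`, so each dropdown change sends a few bytes rather than the whole dataset.

```python
import uuid
from collections import defaultdict

# In-process store: dataset key -> {subset value -> rows}.
# Assumption: a single server process. With several workers, a shared
# store (filesystem, Redis, ...) would be needed instead.
_CACHE = {}


def compute_expensive(raw_rows):
    """Hypothetical stand-in for the expensive step run on upload."""
    return [dict(row, value=row["value"] * 2) for row in raw_rows]


def store_dataset(raw_rows, subset_column="subset_column"):
    """Process the data once, index it by subset value, return a small key."""
    by_subset = defaultdict(list)
    for row in compute_expensive(raw_rows):
        by_subset[row[subset_column]].append(row)
    key = uuid.uuid4().hex
    _CACHE[key] = dict(by_subset)
    return key


def get_subset(key, selected):
    """Return the rows whose subset value is in `selected`."""
    by_subset = _CACHE[key]
    rows = []
    for value in selected:
        rows.extend(by_subset.get(value, []))
    return rows


rows = [
    {"subset_column": "a", "value": 1},
    {"subset_column": "b", "value": 2},
    {"subset_column": "a", "value": 3},
]
key = store_dataset(rows)        # done once, in the upload callback
subset = get_subset(key, ["a"])  # done on every dropdown change
```

In the graph callback, the State value would then be the key, and the per-click work reduces to a dictionary lookup instead of a JSON round trip.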

noideaman · October 19, 2018, 6:18pm

It’s in the range of 1-2 GB… so fairly large.

Are you suggesting breaking up each possible drop-down subset into a separate intermediate and then using that?

EDIT: I’m going to try the caching method on the filesystem now.

rnarren1 · October 19, 2018, 6:37pm

With that amount of data I would definitely recommend using some solution to store the data on or near the Flask server rather than in the user’s browser. Depending on the use case, you can even just store the file to disk.
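
The "store the file to disk" option could look roughly like the sketch below. This is not a Dash API: `save_intermediate`, `load_intermediate`, and the temporary-directory location are hypothetical, and a list of dicts again stands in for the DataFrame (with pandas, `DataFrame.to_pickle` and `pd.read_pickle` would play the same role). Only the returned file name needs to go into the hidden div.

```python
import os
import pickle
import tempfile
import uuid

# Assumption: a directory on the server that every worker process can read.
CACHE_DIR = tempfile.mkdtemp(prefix="dash-intermediate-")


def save_intermediate(data):
    """Write the processed data to disk and return a short file name."""
    name = uuid.uuid4().hex + ".pkl"
    with open(os.path.join(CACHE_DIR, name), "wb") as fh:
        pickle.dump(data, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return name


def load_intermediate(name):
    """Read the data back; only ever load files this server wrote itself."""
    # basename() keeps a tampered value from pointing outside CACHE_DIR.
    path = os.path.join(CACHE_DIR, os.path.basename(name))
    with open(path, "rb") as fh:
        return pickle.load(fh)


processed = [{"subset_column": "a", "value": 2}]
name = save_intermediate(processed)  # in the upload callback
restored = load_intermediate(name)   # in the graph callback
```

Reading from local disk still costs time for 1-2 GB, so combining this with an in-memory cache or pre-split subsets may be worth testing.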
