
Updating a graph based on a dropdown and intermediate storage is too slow

I am using a hidden div to store computationally expensive intermediate data that is calculated when a new file is uploaded. That seems to work OK. My problem is that I need to update graphs based on drop-down values that select subsets of my pandas DataFrame. When I choose a drop-down value, the entire dataset has to be loaded and subsetted, and only then is the graph updated. I have two problems with this setup:

  1. The time between choosing a value in the drop-down and the update_graph function starting to run is on the order of 5-10 seconds (i.e. I don’t see the Updating graph! statement in my console until 5-10 seconds after I choose a drop-down value).
  2. It takes a LONG time to do json.loads(dataset) and then subset the data.

There must be a better way to do this? This is how it is set up now:

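import json

import pandas as pd
from dash.dependencies import Input, Output, State
from dash.exceptions import PreventUpdate

@app.callback(Output('My Graph', 'figure'),
              [Input('Dropdown', 'value')],
              [State('intermediate-data', 'children')])
def update_graph(dropdown, datasets):
    print("Updating graph!")
    if datasets is not None and dropdown is not None:
        # Assumes the hidden div holds one JSON string; parse it before indexing by key.
        datasets = json.loads(datasets)
        data = pd.read_json(datasets['dataA'], orient='records')
        # Keep only the rows whose subset_column is among the selected drop-down values.
        data = data[data['subset_column'].isin(dropdown)]
        graph, f = update_graph_function(data)
        return f
    else:
        raise PreventUpdate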

Thanks everyone!

How large are the intermediate data?

Every time you click the dropdown, the following happens: ALL the intermediate data is serialized to JSON, sent over the network from the browser to the server, and then deserialized. If there is a lot of data, this process will be very slow.
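To get a rough feel for that round trip, here is a minimal sketch that times only the serialize and deserialize steps on made-up data. The row layout and row count are assumptions for illustration, not your actual dataset, and the network transfer is not included, so the real delay would be larger.

```python
import json
import time

# Hypothetical stand-in for the intermediate data: 200,000 small records.
rows = [{"subset_column": i % 10, "value": float(i)} for i in range(200_000)]

start = time.perf_counter()
payload = json.dumps(rows)      # what gets stored in the hidden div
restored = json.loads(payload)  # what the callback does on every dropdown change
elapsed = time.perf_counter() - start

print(f"payload size: {len(payload) / 1e6:.1f} MB")
print(f"serialize + deserialize: {elapsed:.2f} s")
```

Both numbers grow roughly in proportion to the amount of data, and the same payload is sent again on every dropdown change.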

The hidden div is only one way to save data that is expensive to compute. If the data are large enough to cause these long wait times, I think it would make sense to store them on the server instead. There are some examples of how to do this here: https://dash.plot.ly/sharing-data-between-callbacks, specifically Examples 3 and 4.

It’s in the range of 1-2 GBs… so fairly large.

Are you suggesting breaking up each possible drop-down subset into a separate intermediate and then using that?

EDIT: I’m going to try the caching method on the filesystem now.

With that amount of data I would definitely recommend using some solution to store the data on or near the Flask server rather than in the user’s browser. Depending on the use case, you can even just store the file to disk.
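As a rough illustration of storing the file on disk, the sketch below keeps the data on the server and hands the browser only a short key, which is what would go into the hidden div instead of the full JSON. It uses only the standard library; the names CACHE_DIR, save_dataset and load_dataset are hypothetical helpers, not part of Dash, and the list of dicts stands in for your DataFrame (pickle can usually store a DataFrame the same way).

```python
import os
import pickle
import tempfile
import uuid
from functools import lru_cache

# Hypothetical cache location on the server; any writable directory would do.
CACHE_DIR = tempfile.mkdtemp(prefix="dash_cache_")


def save_dataset(obj):
    """Write obj to disk and return a short key to keep in the hidden div."""
    key = uuid.uuid4().hex
    with open(os.path.join(CACHE_DIR, key + ".pkl"), "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return key


@lru_cache(maxsize=4)
def load_dataset(key):
    """Read the dataset for key from disk; repeat calls are served from memory."""
    with open(os.path.join(CACHE_DIR, key + ".pkl"), "rb") as fh:
        return pickle.load(fh)


# In the upload callback: compute once, store on the server, return only the key.
records = [
    {"subset_column": "a", "value": 1},
    {"subset_column": "b", "value": 2},
    {"subset_column": "a", "value": 3},
]
key = save_dataset(records)

# In the dropdown callback: look the data up by key and subset it.
selected = {"a"}
subset = [row for row in load_dataset(key) if row["subset_column"] in selected]
print(len(subset))
```

Note that lru_cache lives in a single process and returns the same object on each call, so the callback should not modify it in place. With several server workers, each worker reads the file once and then keeps its own in-memory copy.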