Slow Load Time on Dataset

Hey All!

I recently published a Dash App (https://showstats.herokuapp.com/) I’ve been working on for a while. Most everything is going well, but the loading page (multi-page app) takes roughly 6-7 seconds to load every time it’s selected. The data set is a Parquet file (read in from S3) that is 2.3 MB in parquet form and roughly 13.8 MB for the entire df when read into pandas.

I have 5 callbacks that all use the same dataset each time a callback is activated (5 different dropdowns). I’m assuming this is the issue, but I’ve had a difficult time fixing it. I’ve tried using dcc.Store, which unsurprisingly did not work. Additionally, I’ve tried ServerSide caching (ShowStats/showstats_app.py at main · bkmurph/ShowStats · GitHub), but the load time is still quite slow and operations take 1-2 seconds once the page is loaded.

Finally, I’ve tried reading in the data globally within a script, then passing that data to each individual callback. This is also slow to load, but operations are immediate. It feels like the clunkiest solution, but it does work best right now.
add persistence · bkmurph/ShowStats@0e27cb0 · GitHub

Can anyone provide guidance on the best way to handle this slowness? I feel that there has to be a solution since the dataset really isn’t very large. I’m just not quite grasping it.

P.S - The data load is slow on my computer’s local server as well, so I don’t think it’s a Heroku problem or anything.

Hello @clevercolt,

You should try using Patch() instead of re-sending all of each figure's info on every update. This should speed up the rendering of the figures in your callbacks.

Just send the empty figures without data to the dcc.Graphs and then use the Patch to update based upon your data.

It also seems like you will be looping everything twice, when you just need to have the figures respond to the data being updated:

This shouldn't have the input from store-data; that would cause it to trigger twice.

So, to summarize, you should:
  1. Remove the store-data input from the callback that updates your store
  2. Remove the submit button inputs from your other callbacks and trigger them from the store instead
  3. Change the figures to update using Patch() instead of passing the whole figure again

Is the data static? If so, I would consider reading it in on load as a global variable as a fine solution. The initial load will be slow, yes, but all subsequent operations will be fast.

@jinnyzor Thanks for the reply! I removed the store-data input and the submit button inputs from all of the callbacks. Thanks for helping me clean that up!

As far as using Patch to update the dataset, I’m having a slightly harder time following. I think what you are suggesting is to update the data being passed to each figure using Patch, however I’m a little confused on how to implement this. Are you suggesting to use the dropdowns (dead_drop, billy_drop, etc.) as an input (or State) for each of the individual callbacks? Then from there use something like…

my_uuids = dead_uuids + phish_uuids + goose_uuids + billy_uuids + panic_uuids
patched_fig = Patch()
patched_fig["data"] = df['uuid'].isin(my_uuids)
return patched_fig

The above is just an illustrative example, but should hopefully demonstrate how I’m interpreting your suggestion.

Hey @Emil - The dataset is somewhat static, but different rows are selected based on the inputs from 5 different dropdown lists that the user can interact with. This interaction is decently speedy when using the app, but loading the main page (and navigating back to the main page from other pages) is quite slow, which is bothersome to me.

In that case, I believe the most efficient approach would be to:

  1. Load all the complete (static) dataset into memory as a global variable df (i.e. the data will be read only once, on app load)
  2. Create a function that maps (filters) df based on drop down selections into the selected data df_selected
  3. For each callback that needs df_selected, add the selections as State and construct df_selected from within the callback

If the operation of transforming df into df_selected is very expensive (which would typically not be the case), you would benefit by combining multiple callbacks that need df_selected into one where possible.

Thanks for your help @Emil !!

I loaded the dataset as a global var, created a function to filter the global df (takes the drop down selections as input), then passed the 5 drop downs to each of the callbacks as “State”. This makes it so that the operations on the main page are relatively instant once I press the submit button. Thanks for helping me with this!

As far as the slow loading time… is there anything I can do to avoid this? It’s not the worst thing in the world, but if possible I’d prefer to not make people wait 5-6 seconds for the main page to load every time they navigate to it.

If you aren’t expecting the variable to change much, you could always load it with the app, outside of the layout.

This way the load time will not affect the user; the df just stays in memory.