
App displaying slices of a large dataset (best practice)

Dear all,

I have been playing around with ideas from https://dash.plot.ly/sharing-data-between-callbacks for a while now.

I have an application made up of ~10 charts, all of which depend on one dropdown. The data is essentially one very large pandas DataFrame. All the charts are slices of the same dataset along different dimensions, or other relatively fast computations on it, so the computation cost is low. The DataFrame is updated with new data over time and persisted in an HDF5 store so that a separate ‘data update process’ can read it quickly.
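A minimal sketch of that structure, assuming a small list of row dicts stands in for the DataFrame and that `region`, `product`, `month` and `sales` are hypothetical column names: the dropdown value filters the data once, and every chart is a cheap aggregation of that same subset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical stand-in for the large DataFrame (column names are made up).
DATA = [
    {"region": "north", "product": "a", "month": 1, "sales": 10.0},
    {"region": "north", "product": "b", "month": 1, "sales": 4.0},
    {"region": "north", "product": "a", "month": 2, "sales": 12.0},
    {"region": "south", "product": "a", "month": 1, "sales": 7.0},
    {"region": "south", "product": "b", "month": 2, "sales": 3.0},
]


def select(rows, dropdown_value):
    """Filter once on the dropdown value; every chart starts from this subset."""
    return [r for r in rows if r["region"] == dropdown_value]


def total_by(rows, key):
    """Sum sales along one dimension (one chart = one view of the subset)."""
    out = defaultdict(float)
    for r in rows:
        out[r[key]] += r["sales"]
    return dict(out)


def build_figures(rows, dropdown_value):
    """Compute the data behind all charts from the same filtered subset."""
    subset = select(rows, dropdown_value)
    return {
        "sales_by_product": total_by(subset, "product"),
        "sales_by_month": total_by(subset, "month"),
        "mean_sales": mean(r["sales"] for r in subset) if subset else None,
    }


print(build_figures(DATA, "north"))
```

With pandas the same idea would be one boolean filter followed by a `groupby(...).sum()` per chart. In Dash, a single callback can also declare several `Output`s, so the filter would run once per dropdown change rather than once per chart.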

The data from the persisted HDF5 file is pushed to the application on a regular basis (every few hours) using a layout that is regenerated on each page load and an interval updater (`dcc.Interval`), along the lines of https://dash.plot.ly/live-updates.
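The periodic refresh described above could be sketched roughly as one process-global snapshot that is reloaded once it is older than a chosen age. `SnapshotCache` and `loader` are hypothetical names: `loader` stands in for reading the HDF5 store (for example with `pandas.read_hdf`), and the injectable clock only exists so the sketch can be exercised without waiting hours.

```python
import threading
import time


class SnapshotCache:
    """Process-global copy of the dataset, reloaded when older than max_age seconds."""

    def __init__(self, loader, max_age, clock=time.monotonic):
        self._loader = loader      # hypothetical: returns the complete dataset
        self._max_age = max_age
        self._clock = clock
        self._lock = threading.Lock()
        self._data = None
        self._loaded_at = None

    def get(self):
        """Return the current snapshot, reloading it first if missing or stale."""
        with self._lock:  # other callers wait while a reload is in progress
            now = self._clock()
            if self._data is None or now - self._loaded_at >= self._max_age:
                # The new dataset replaces the old one in a single assignment;
                # callers that already hold the old object keep a consistent copy.
                self._data = self._loader()
                self._loaded_at = now
            return self._data


# Usage with a fake loader and a manual clock (three-hour refresh period).
calls = []


def fake_loader():
    calls.append(1)
    return {"version": len(calls)}


now = [0.0]
cache = SnapshotCache(fake_loader, max_age=3 * 3600, clock=lambda: now[0])
first = cache.get()           # initial load
now[0] = 100.0
second = cache.get()          # still fresh, no reload
now[0] = 3 * 3600 + 1.0
third = cache.get()           # stale, reloaded
print(first, second, third, len(calls))
```

In a Dash app, `cache.get()` could be called from a function assigned to `app.layout` (Dash then calls it on every page load) or from the `dcc.Interval` callback. Each worker process holds its own copy, so this only behaves like one shared DataFrame while the app runs with a single worker.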

I now see two ways to organize the data flow:

  1. Run only one worker process and update the df as a global variable. I understand that this goes against the fundamental principles of Dash. But as long as I do not run the app with multiple workers, and all users should see the same (possibly updated) data, is this really an issue?

  2. Pull the up-to-date version of the df from a global store in each graph callback. I am using a Redis server to keep the data in memory, so that I do not have to read from HDF5 in every callback. However, it seems to me that the cost of loading the dataset on every callback far outweighs the performance gain of having several worker processes.
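Option 2 can be sketched under the assumption that each chart has its own callback and that the dataset is stored pickled under a single key (`"dataset"` is a hypothetical name). `FakeRedis` is an in-memory stand-in that mimics only the `set`/`get` calls of a redis-py client, so the example runs without a server. The counters make the cost visible: every callback deserializes the full dataset, even though it only needs one slice.

```python
import pickle


class FakeRedis:
    """In-memory stand-in for a Redis client: only set/get with bytes values."""

    def __init__(self):
        self._d = {}
        self.get_calls = 0

    def set(self, key, value):
        self._d[key] = value

    def get(self, key):
        self.get_calls += 1
        return self._d.get(key)


def publish(store, rows):
    """Data-update side: serialize the whole dataset under one key."""
    payload = pickle.dumps(rows)
    store.set("dataset", payload)
    return len(payload)


def chart_callback(store, dropdown_value):
    """Option 2: fetch and deserialize everything, then keep only one slice."""
    payload = store.get("dataset")
    rows = pickle.loads(payload)
    return [r for r in rows if r["region"] == dropdown_value], len(payload)


rows = [{"region": "north" if i % 2 else "south", "sales": float(i)} for i in range(4)]
store = FakeRedis()
payload_size = publish(store, rows)

# One dropdown change triggers ~10 chart callbacks; each one reloads everything.
results = [chart_callback(store, "north") for _ in range(10)]
slices = [r[0] for r in results]
bytes_deserialized = sum(r[1] for r in results)
print(f"{store.get_calls} loads, {bytes_deserialized} bytes deserialized "
      f"(dataset = {payload_size} bytes)")
```

With ~10 charts, one dropdown change therefore deserializes the whole dataset about ten times, and that cost grows with the size of the full dataset rather than with the slice each chart actually draws.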

Are there other problems with option 1 that I am not seeing right now? Are there other approaches you would recommend?

Thanks!
