I don’t understand what the advantage of dcc.Store is. In my working mental model, the server (app.py) loads a dataframe into memory, and then when the user clicks something, some python function creates a plotly plot, which is serialized on the server and sent to the client. If that’s the case, what is the purpose of sending the dataset to the client as json? I can only imagine this being useful if you use clientside callbacks. Is there any use to dcc.Store without clientside callbacks? Do some of the callbacks I’m writing in python run on the client?
The purpose of the Store component is to enable storing state on the client. It could be anything really: a session id, user identifiers, some data that are expensive (i.e. slow) to query, or …
But how do the python functions retrieve that state on the client? I imagine the python functions as executing on the server.
Yes. A request containing the data is sent from the client to the server.
(Via callbacks! That’s what the callback does: inputs define which data to pull from the components in the browser, and outputs define what data is sent from the Python function back to the client.)
Okay, so I’m loading a dataframe into memory on the server. Then I use a dcc.Store to place that data (as JSON) on the client. Then I retrieve that data from the client back to the server to generate a figure in a callback, and send the figure back? That doesn’t make sense to me. Why would you ever store a dataframe on the client if the figure is generated from the data on the server?
A key design principle of Dash is to keep the server stateless. Unless the data you are loading is static (in which case you could just load it into a global variable on app initialization), you would therefore have to store it on the client, since it represents state.
It is possible to introduce state on the server, e.g. using a ServersideOutput, but depending on your application, it might complicate your infrastructure configuration and/or scaling.
Maybe it’ll help for me to share a little about my use case.
- When the server starts, it loads a large matrix (dataframe) into ram.
- When the user navigates to the page, they see a heatmap of that matrix.
- The user can then select groups of columns from dropdowns to investigate. Right now, the callback on dropdown change goes back to the server to subset the dataframe and return a new heatmap.
I’d like to find a way to subset the matrix, and re-render the heatmap on the client, rather than doing a roundtrip to the server and sending back a new matrix.
For that kind of use case, I think the best approach would be to store the large matrix in a global variable, which is then accessed in callbacks to generate the heatmaps. This way, each callback only needs to send back the data embedded in the new heatmap.
In principle, you could also transmit the data to the client once, put it in a …
Personally, I use dcc.Store for small metadata-type things that need to be passed around between callbacks, e.g. the resulting metadata for a set of user selections. It is not good for storing large datasets, for a couple of reasons. First, it is recommended that the amount of data in a dcc.Store be kept under 15 MB. I have tested Chrome, and it easily handles 100 MB but completely freezes at around 150 MB for a single Store. Second, it does not make much sense to store large DataFrames, since converting back and forth between DataFrames and JSON is slower than just retrieving the data from the backend SQL database. I have not done a formal benchmark, but pulling a 50 MB JSON string from the Store, converting it to a DataFrame, adding 50 MB from a subsequent SQL query to the DataFrame, and saving it back to the Store as JSON took around 3.5 seconds, which is a long time for the app to be unresponsive. By comparison, loading both the old and new data with a single new SQL query takes 0.3 seconds.
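The conversion cost described above can be measured with a rough sketch like the following. It uses the `DataFrame.to_json(orient="split")` / `pd.read_json` pattern commonly used to put a DataFrame into a Store; the 50,000 × 10 random frame is made up, and the printed size and timing will depend on the data and the machine.

```python
import io
import time

import numpy as np
import pandas as pd

# Guideline quoted in the post above; Dash itself does not enforce it
STORE_LIMIT_MB = 15


def to_store(df: pd.DataFrame) -> str:
    # What a callback would return as a Store's `data`: a JSON string
    return df.to_json(orient="split")


def from_store(payload: str) -> pd.DataFrame:
    # Wrapping in StringIO avoids the pandas deprecation of literal JSON strings
    return pd.read_json(io.StringIO(payload), orient="split")


def payload_mb(payload: str) -> float:
    # Size of the JSON text in megabytes (10**6 bytes)
    return len(payload.encode("utf-8")) / 1e6


if __name__ == "__main__":
    df = pd.DataFrame(np.random.default_rng(0).random((50_000, 10)))
    start = time.perf_counter()
    payload = to_store(df)
    restored = from_store(payload)
    elapsed = time.perf_counter() - start
    size = payload_mb(payload)
    print(f"{size:.1f} MB, round trip {elapsed:.2f} s, "
          f"under guideline: {size < STORE_LIMIT_MB}")
```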
For my application, I have data that is separated by “session”. If the user is exploring/plotting data from one session and then wants to add data from five other sessions, it is cheaper to load all six sessions directly from SQL than to read the first session’s data from the Store, convert it to a DataFrame, read the five new sessions from SQL, concatenate the new data to the DataFrame, convert it to JSON, and pass that back through the Output to the Store. Instead, I just save a list of the selected session_ids to a Store and pass that to any callbacks that need it. Granted, this requires some organization of the data on the SQL side.
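A sketch of the “store only the ids” pattern, using an in-memory SQLite database so it runs on its own; the `measurements` table, its columns, and the session ids are hypothetical. In the app, the Store would hold just a list such as `["s1", "s3"]`, and any callback that takes that Store's `data` as an `Input` or `State` would call `load_sessions` with it.

```python
import sqlite3

import pandas as pd


def load_sessions(conn: sqlite3.Connection, session_ids: list[str]) -> pd.DataFrame:
    """Fetch only the rows for the session ids held in the Store."""
    if not session_ids:
        return pd.DataFrame()
    # One "?" placeholder per id keeps the query parameterized,
    # so the ids are never interpolated into the SQL text
    placeholders = ", ".join("?" for _ in session_ids)
    query = f"SELECT * FROM measurements WHERE session_id IN ({placeholders})"
    return pd.read_sql_query(query, conn, params=list(session_ids))


def demo_connection() -> sqlite3.Connection:
    # Hypothetical schema: one row per measurement, tagged with its session
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (session_id TEXT, t REAL, value REAL)")
    conn.executemany(
        "INSERT INTO measurements VALUES (?, ?, ?)",
        [("s1", 0.0, 1.5), ("s1", 1.0, 1.7), ("s2", 0.0, 3.2), ("s3", 0.0, 0.4)],
    )
    return conn
```

Adding sessions then means appending ids to the stored list and re-running one query, rather than round-tripping the already loaded data through JSON.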
In my opinion, dcc.Store is a good fit for applications that are simple enough not to require a database, and when the largest JSON string would stay under the recommended 15 MB.