The Store component in Dash makes it easy to share state between callbacks. Under the hood, the data are stored as JSON in the browser. This approach is chosen to keep the server stateless (I guess), but it has a few drawbacks:
- As the data are stored as JSON, you must convert objects from/to JSON at the beginning/end of each callback
- Since the callbacks are executed server side while the data are stored client side, the data will be sent across the wire every time a callback is invoked
- The maximum storage size is limited by the browser (more than ~ 10 MB will probably cause trouble)
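To make the first drawback concrete, here is roughly what the JSON round trip looks like in a plain Dash setup (a sketch with a small stand-in data frame; the callback bodies are illustrative only):

```python
import io
import pandas as pd

# Stand-in for a data frame loaded from a database or file.
df = pd.DataFrame({"year": [1952, 1952, 1957], "pop": [1.0, 2.0, 3.0]})

# Writing to the Store: the data frame must be serialized to JSON.
def query_data(n_clicks):
    return df.to_json(orient="split")  # a string, stored in the browser

# Reading from the Store: the JSON must be parsed back into a data frame.
def update_dd(data):
    parsed = pd.read_json(io.StringIO(data), orient="split")
    return [{"label": y, "value": y} for y in parsed["year"].unique()]
```

Every callback that touches the Store pays this serialization tax, on top of shipping the JSON string between browser and server.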
For small amounts of data, none of these issues are significant. For large amounts of data, they can be deal breakers in terms of application performance. The solution to the problem is (yes, you guessed it) server side caching. While it’s already documented, I have always felt that the syntax was more complicated than it needs to be.
The CallbackCache is an attempt to address this challenge. To enable its magic, callbacks must be registered on this object rather than on the Dash app itself. At the end (just before run_server), the object itself is registered on the app:
import dash
from dash_extensions.callback import CallbackCache

app = dash.Dash()
cc = CallbackCache()  # create callback cache
...

@cc.callback(...)  # register callback on cc instead of app
...

cc.register(app)  # this call registers the callbacks on the application

if __name__ == '__main__':
    app.run_server()
In addition to the normal callback decorator, it has a special cached_callback decorator, which saves the data in a server side cache. The cache takes care of serialization (typically via pickle), so you don’t need to convert from/to JSON at the beginning/end of each callback. Hence, you could do this:
@cc.cached_callback(Output("store", "data"), [Trigger("btn", "n_clicks")])  # Trigger is like Input, but excluded from args
def query_data():
    time.sleep(1)  # sleep to emulate a database call / a long calculation
    return px.data.gapminder()  # no conversion, just return the data frame

@cc.callback(Output("dd", "options"), [Input("store", "data")])
def update_dd(df):
    return [{"label": year, "value": year} for year in df["year"].unique()]  # no conversion, just use the data frame
And since the cache is server side, there is no data transfer (apart from the cache key, which is a short string). The maximum storage size is limited only by the underlying cache, i.e. it can be on the order of GBs depending on the server hardware. Enough talk, here is a small self-contained example:
import dash
import dash_core_components as dcc
import dash_html_components as html
import time
import plotly.express as px
from dash.dependencies import Output, Input
from flask_caching.backends import FileSystemCache
from dash_extensions.callback import CallbackCache, Trigger

# Create app.
app = dash.Dash(prevent_initial_callbacks=True)
app.layout = html.Div([
    html.Button("Query data", id="btn"), dcc.Dropdown(id="dd"), dcc.Graph(id="graph"),
    dcc.Loading(dcc.Store(id="store"), fullscreen=True, type="dot")
])

# Create (server side) cache. Works with any flask caching backend.
cc = CallbackCache(cache=FileSystemCache(cache_dir="cache"))

@cc.cached_callback(Output("store", "data"), [Trigger("btn", "n_clicks")])  # Trigger is like Input, but excluded from args
def query_data():
    time.sleep(1)  # sleep to emulate a database call / a long calculation
    return px.data.gapminder()

@cc.callback(Output("dd", "options"), [Input("store", "data")])
def update_dd(df):
    return [{"label": year, "value": year} for year in df["year"].unique()]

@cc.callback(Output("graph", "figure"), [Input("store", "data"), Input("dd", "value")])
def update_graph(df, value):
    df = df.query("year == {}".format(value))
    return px.sunburst(df, path=['continent', 'country'], values='pop', color='lifeExp', hover_data=['iso_alpha'])

# This call registers the callbacks on the application.
cc.register(app)

if __name__ == '__main__':
    app.run_server()
To run the example, you’ll need the latest version of dash-extensions:
pip install dash-extensions==0.0.23
The server cache (passed via the cache argument) can be any flask_caching backend, so there are lots of options to choose from. For most users, I guess the (default) FileSystemCache will do. If you would like to reuse the cached result when the inputs are unchanged, you can pass instant_refresh=False. If you would furthermore like to reuse cached results between sessions, also pass session_check=False.
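Putting these options together, the cache construction would look something like this (a sketch using the argument names described above; check the dash-extensions docs for your version):

```python
from flask_caching.backends import FileSystemCache
from dash_extensions.callback import CallbackCache

# Reuse cached results for unchanged inputs, and share them between sessions.
cc = CallbackCache(
    cache=FileSystemCache(cache_dir="cache"),
    instant_refresh=False,  # reuse the cached result when inputs are unchanged
    session_check=False,    # reuse cached results between sessions
)
```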
If you have any questions and/or suggestions for improving the syntax, please let me know.
EDIT: I have now done some simple benchmarks. Assuming that your application is bottlenecked by serialization and/or data transfer of pandas data frames, the cached_callback tends to yield a performance improvement in the range of 10-100 times.
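If you want a rough feel for where the speedup comes from, you can time the two serialization paths yourself (a sketch with a synthetic stand-in data frame; absolute numbers depend on hardware and data shape):

```python
import pickle
import time
import numpy as np
import pandas as pd

# Synthetic stand-in data frame, large enough for measurable timings.
n = 200_000
df = pd.DataFrame({"year": np.random.randint(1950, 2010, n),
                   "pop": np.random.rand(n)})

t0 = time.perf_counter()
s = df.to_json(orient="split")  # the conversion a browser Store requires
json_time = time.perf_counter() - t0

t0 = time.perf_counter()
b = pickle.dumps(df)  # the serialization the server side cache uses
pickle_time = time.perf_counter() - t0

print(f"JSON: {json_time:.4f} s, pickle: {pickle_time:.4f} s")
```

Note that this only measures serialization; the server side cache also eliminates the transfer of the data across the wire, which is often the larger cost.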