📣 Dash Labs 0.4.0: @long_callback caching and Windows support

Hi All,

We just pushed a new version (0.4.0) of Dash Labs that includes several enhancements to the @long_callback support introduced in version 0.3.0 (see 📣 Dash Labs 0.3.0: @app.long_callback support). These enhancements include:

  1. Windows support
  2. Support for caching long_callback results with a flexible system for configuring how cache hits are calculated.
  3. Support for updating arbitrary components while the @long_callback is running using the set_progress function. As the example below demonstrates, this makes it possible to do things like update a graph while a long_callback is executing.

Changes:

To facilitate Windows support, we’ve replaced the FlaskCaching backend with a backend based on the diskcache library. The Celery backend remains unchanged.

On Windows, the multiprocess library is also required.

Here are some excerpts of the updated documentation.


Enabling long-callback support

In Dash Labs, the @long_callback decorator is enabled using the LongCallback plugin. To support multiple backends, the LongCallback plugin is itself configured with either a DiskcacheCachingCallbackManager or a CeleryCallbackManager object. In addition to the LongCallback plugin, the FlexibleCallbacks and HiddenComponents plugins must be enabled as well. Here is an example of configuring an app to enable the @long_callback decorator using the diskcache backend.

import dash
import dash_labs as dl

## Diskcache
import diskcache
cache = diskcache.Cache("./cache")
long_callback_manager = dl.plugins.DiskcacheCachingCallbackManager(cache)

app = dash.Dash(__name__, plugins=[
    dl.plugins.FlexibleCallbacks(),
    dl.plugins.HiddenComponents(),
    dl.plugins.LongCallback(long_callback_manager)
])

This configuration requires the diskcache package, which can be installed with:

$ pip install diskcache

Additionally, on Windows the multiprocess library is required:

$ pip install multiprocess

<snip>

Example 5: Progress bar chart graph

The progress argument to the @long_callback decorator can be used to update arbitrary component properties. This example creates and updates a Plotly bar graph to display the current calculation status. It also uses the progress_default argument to long_callback to specify a grouping of values that should be assigned to the components specified by the progress argument when the callback is not in progress. If progress_default is not provided, all of the dependency properties specified in progress are set to None when the callback is not running. In this case, progress_default is set to a figure with a zero-width bar.

import time
import dash
import dash_html_components as html
import dash_core_components as dcc
import dash_labs as dl
from dash_labs.plugins import DiskcacheCachingCallbackManager
import plotly.graph_objects as go

## Diskcache
import diskcache
cache = diskcache.Cache("./cache")
long_callback_manager = DiskcacheCachingCallbackManager(cache)

def make_progress_graph(progress, total):
    progress_graph = (
        go.Figure(data=[go.Bar(x=[progress])])
        .update_xaxes(range=[0, total])
        .update_yaxes(
            showticklabels=False,
        )
        .update_layout(height=100, margin=dict(t=20, b=40))
    )
    return progress_graph

app = dash.Dash(
    __name__,
    plugins=[
        dl.plugins.FlexibleCallbacks(),
        dl.plugins.HiddenComponents(),
        dl.plugins.LongCallback(long_callback_manager),
    ],
)

app.layout = html.Div(
    [
        html.Div(
            [
                html.P(id="paragraph_id", children=["Button not clicked"]),
                dcc.Graph(id="progress_bar_graph", figure=make_progress_graph(0, 10)),
            ]
        ),
        html.Button(id="button_id", children="Run Job!"),
        html.Button(id="cancel_button_id", children="Cancel Running Job!"),
    ]
)

@app.long_callback(
    output=dl.Output("paragraph_id", "children"),
    args=dl.Input("button_id", "n_clicks"),
    running=[
        (dl.Output("button_id", "disabled"), True, False),
        (dl.Output("cancel_button_id", "disabled"), False, True),
        (
            dl.Output("paragraph_id", "style"),
            {"visibility": "hidden"},
            {"visibility": "visible"},
        ),
        (
            dl.Output("progress_bar_graph", "style"),
            {"visibility": "visible"},
            {"visibility": "hidden"},
        ),
    ],
    cancel=[dl.Input("cancel_button_id", "n_clicks")],
    progress=dl.Output("progress_bar_graph", "figure"),
    progress_default=make_progress_graph(0, 10),
    interval=1000,
)
def callback(set_progress, n_clicks):
    total = 10
    for i in range(total):
        time.sleep(0.5)
        set_progress(make_progress_graph(i, 10))

    return [f"Clicked {n_clicks} times"]


if __name__ == "__main__":
    app.run_server(debug=True)

Caching results with long_callback

The long_callback decorator can optionally memoize callback function results through caching, and it provides a flexible API for configuring when cached results may be reused.

Note: The current caching configuration API is fairly low-level, and in the future we expect that it will be useful to provide several preconfigured caching profiles.

How it works

Here is a high-level description of how caching works in long_callback. Conceptually, you can imagine a dictionary is associated with each decorated callback function. Each time the decorated function is called, the input arguments to the function (and potentially other information about the environment) are hashed to generate a key. The long_callback decorator then checks the dictionary to see if there is already a value stored using this key. If so, the decorated function is not called, and the cached result is returned. If not, the function is called and the result is stored in the dictionary using the associated key.
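
For intuition, here is a rough sketch of that conceptual dictionary in plain Python. The names and the hashing scheme are made up for illustration and are not the actual Dash Labs internals:

import hashlib
import pickle

results_cache = {}  # the conceptual "dictionary" associated with one decorated function

def cached_call(func, *args):
    # Hash the input arguments to generate a cache key
    key = hashlib.sha256(pickle.dumps(args)).hexdigest()
    if key in results_cache:
        # Cache hit: skip calling the function and return the stored result
        return results_cache[key]
    # Cache miss: call the function and store the result under the key
    result = func(*args)
    results_cache[key] = result
    return result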

The built-in functools.lru_cache decorator uses a Python dict just like this. The situation is slightly more complicated with Dash for two reasons:

  1. We might want the cache to persist across server restarts.
  2. When an app is served using multiple processes (e.g. multiple gunicorn workers on a single server, or multiple servers behind a load balancer), we might want to share cached values across all of these processes.

For these reasons, a simple Python dict is not a suitable storage container for caching Dash callbacks. Instead, long_callback uses the configured diskcache or Celery callback manager to store cached results.
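
Because diskcache stores entries on disk, it behaves like a dictionary that survives server restarts and can be read by multiple processes pointing at the same cache directory. Here is a minimal illustration of diskcache on its own (the key and value are just placeholders):

import diskcache

cache = diskcache.Cache("./cache")                    # backed by files on disk, not an in-memory dict
cache.set("example-key", {"result": 42}, expire=60)   # survives a restart; expires 60 seconds after being set
print(cache.get("example-key"))                       # another process opening "./cache" sees the same entry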

Caching flexibility requirements

To support a variety of development and production use cases, long_callback caching can be configured with one or more zero-argument functions whose return values are combined with the callback’s input arguments when generating the cache key. Several common use cases are described below.
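
Conceptually, the key generation works along these lines (an illustrative sketch, not the actual Dash Labs implementation):

import hashlib
import pickle

def compute_cache_key(cache_by_functions, callback_args):
    # Combine the cache_by return values with the callback's input arguments,
    # so that a change to either produces a different cache key
    key_material = [fn() for fn in cache_by_functions] + list(callback_args)
    return hashlib.sha256(pickle.dumps(key_material)).hexdigest()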

Enabling caching

Caching is enabled by providing one or more zero-argument functions to the cache_by argument of the long_callback callback manager. These functions are called each time the status of a long_callback function is checked, and their return values are hashed as part of the cache key.

Here is an example using the diskcache callback manager. The clear_cache argument controls whether the cache is reset at startup. In this example, the cache_by argument is set to a lambda function that returns a fixed UUID that is randomly generated during app initialization. The implication of this cache_by function is that the cache is shared across all invocations of the callback, across all user sessions handled by a single server instance. Each time a server process is restarted, the cache is cleared and a new UUID is generated.

import time
from uuid import uuid4
import dash
import dash_html_components as html
import dash_labs as dl

## Diskcache
import diskcache
launch_uid = uuid4()
cache = diskcache.Cache("./cache")
long_callback_manager = dl.plugins.DiskcacheCachingCallbackManager(
    cache, cache_by=[lambda: launch_uid], expire=60,
)

app = dash.Dash(
    __name__,
    plugins=[
        dl.plugins.FlexibleCallbacks(),
        dl.plugins.HiddenComponents(),
        dl.plugins.LongCallback(long_callback_manager),
    ],
)

app.layout = html.Div(
    [
        html.Div([html.P(id="paragraph_id", children=["Button not clicked"])]),
        html.Button(id="button_id", children="Run Job!"),
        html.Button(id="cancel_button_id", children="Cancel Running Job!"),
    ]
)


@app.long_callback(
    output=(dl.Output("paragraph_id", "children"), dl.Output("button_id", "n_clicks")),
    args=dl.Input("button_id", "n_clicks"),
    running=[
        (dl.Output("button_id", "disabled"), True, False),
        (dl.Output("cancel_button_id", "disabled"), False, True),
    ],
    cancel=[dl.Input("cancel_button_id", "n_clicks")],
)
def callback(n_clicks):
    time.sleep(2.0)
    return [f"Clicked {n_clicks} times"], (n_clicks or 0) % 4


if __name__ == "__main__":
    app.run_server(debug=True)

Here you can see that it takes a few seconds to run the callback function, but the cached results are used after n_clicks cycles back around to 0. By interacting with the app in a separate tab, you can see that the cached results are shared across user sessions.

cache_by function workflows

Various cache_by functions can be used to implement different caching policies. Here are a few examples (rough sketches follow the list):

  • A cache_by function could return the file modification time of a dataset to automatically invalidate the cache when an input dataset changes.
  • In a Heroku or Dash Enterprise deployment setting, a cache_by function could return the git hash of the app, making it possible to persist the cache across redeploys, but invalidate it when the app’s source changes.
  • In a Dash Enterprise setting, the cache_by function could return user meta-data to prevent cached values from being shared across users.
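
Here are hedged sketches of what such cache_by functions might look like. The dataset path, environment variable name, and request header below are hypothetical placeholders for illustration, not part of any Dash or Dash Enterprise API:

import os
from flask import request

# Invalidate the cache whenever the input dataset file changes
def dataset_mtime():
    return os.path.getmtime("./data/dataset.csv")  # hypothetical dataset path

# Persist the cache across redeploys until the app's source changes, assuming
# the deployment exposes the app's git hash in an environment variable
def app_git_hash():
    return os.environ.get("GIT_HASH", "unknown")  # hypothetical variable name

# Prevent cached values from being shared across users, assuming your
# auth layer identifies the current user via a request header
def current_user():
    return request.headers.get("X-User", "anonymous")  # hypothetical header name

These functions would then be passed to the callback manager just like the UUID lambda above, e.g. cache_by=[dataset_mtime, app_git_hash, current_user].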

I tried both of the examples mentioned in the documentation at Long Callbacks | Dash for Python Documentation | Plotly.

Neither of the two (DiskCache and Celery) seems to be working on my system. It shows “Button not clicked” even after clicking a number of times. (Screenshot attached.)

I am using Redis in a Docker container for Celery.

Have you managed to get this to work since? I still can’t get it to work with Redis and Celery.