
Bypassing serialization of Dash graph objects for efficient server-side caching

I’m developing an application in which lots of graphs need to be rendered on every page. Some of these graphs are quite complex, and the callbacks that output them take 50-500 ms on my high-end local development machine, and even longer on the production server, which is a carefully sized VM.

Server-side caching of the graphs (via Redis or diskcache, for instance) is an excellent option in my case, but there is still an unnecessary cost: JSON-loading the serialized graph object (via plotly.io.from_json) only to have it serialized again by the callback that returns it (I bet the decorated callback uses plotly.io.to_json internally to do that). The code below is an example of how this could be done (never mind the session-uuid store; it’s just a generic placeholder for inputs of any sort):

import plotly.io
from dash import Dash, dcc, html
from dash.dependencies import Input, Output
from dash.exceptions import PreventUpdate
from redis import Redis

redis = Redis(host="redis", port=6379)

app = Dash(__name__)
app.layout = html.Div([
    dcc.Store("session-uuid", storage_type="session"),
    dcc.Graph("costly-graph")
])


@app.callback(
    Output("costly-graph", "figure"),
    Input("session-uuid", "data")
)
def update_graph(session_uuid: str):
    if session_uuid is None:
        raise PreventUpdate()
    
    figure_key = f"some-key-depending-on-{session_uuid}-or-any-other-input-for-instance"
    figure_bytes = redis.get(figure_key)

    if figure_bytes:
        figure = plotly.io.from_json(figure_bytes.decode())
    else:
        figure = produce_costly_figure(session_uuid)
        redis.set(figure_key, plotly.io.to_json(figure).encode())

    return figure


def produce_costly_figure(session_uuid):
    """Some function that takes some 500+ms to complete."""
    return {"data": [], "layout": {}}  # placeholder: replace with the real, expensive figure
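To put a number on the redundant step, the sketch below times it in isolation. It is only an approximation: the standard-library json module stands in for plotly.io, and the figure is a hypothetical dict with 20,000 points. In the real app the overhead is likely larger, since plotly.io.from_json also rebuilds a Figure object rather than a plain dict.

```python
import json
import timeit

# Hypothetical figure-like payload standing in for a cached Plotly figure.
N_POINTS = 20_000
FIGURE = {
    "data": [{
        "type": "scatter",
        "mode": "markers",
        "x": [i * 0.5 for i in range(N_POINTS)],
        "y": [(i % 97) / 97 for i in range(N_POINTS)],
    }],
    "layout": {"width": 300, "height": 250},
}
CACHED_JSON = json.dumps(FIGURE)  # what would sit in Redis or diskcache


def round_trip(cached: str) -> str:
    """What the callback effectively does today: parse the cached string, then serialize it again."""
    return json.dumps(json.loads(cached))


def passthrough(cached: str) -> str:
    """What the post asks for: hand the cached string over untouched."""
    return cached


if __name__ == "__main__":
    for fn in (round_trip, passthrough):
        per_call = timeit.timeit(lambda: fn(CACHED_JSON), number=20) / 20
        print(f"{fn.__name__:>11}: {per_call * 1000:.3f} ms per call")
```

Both functions return the same string; only the first pays for a full parse and dump on every cache hit.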

Therefore, I wonder if there’s a way (or any plans, or quick hacks) to “bypass” the serialization, so that I can return a JSON string (produced or retrieved in whatever way) as the content of the output graph object. Following the example above, something along the lines of this (note the hypothetical raw_json argument to Output):

@app.callback(
    Output("costly-graph", "figure", raw_json=True),
    Input("session-uuid", "data")
)
def update_graph(session_uuid: str):
    if session_uuid is None:
        raise PreventUpdate()

    figure_key = f"some-key-depending-on-{session_uuid}-or-any-other-input-for-instance"
    figure_bytes = redis.get(figure_key)

    if not figure_bytes:
        figure = produce_costly_figure(session_uuid)
        figure_bytes = plotly.io.to_json(figure).encode()
        redis.set(figure_key, figure_bytes)

    return figure_bytes

By the way: I know about orjson and it does improve performance by a small margin, but I’m really concerned with squeezing every drop of juice by preventing a useless round trip!

Thanks in advance!

Hi @pedro.asad,

I haven’t experimented long enough with server-side caching and not at all with the serialization part, but this is a nice question and I would like to give my two cents while we wait for better answers…

One thing I find extremely performant is to send the chart data to the client side via a Store component and then build the figure as a JS object in a clientside function. This could be an alternative for you if your charts all depend on the same dataset or on small datasets. In my recent case, I did this so I could cache the whole dataset on the client side and use sliders and filters to update the charts quickly, without sending the data back and forth to the server. Needless to say, this is not a great approach for large datasets, and I am now trying to wrap my head around combining it with some server-side caching.
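Here is a sketch of that idea, written in Python for readability. In an actual app, build_figure would be JavaScript registered with app.clientside_callback, reading the Store’s data and the slider value; the store layout ({"x": [...], "y": [...]}), the helper names and the range filter are hypothetical examples, not part of any Dash API.

```python
from typing import Any, Dict, List


def make_store_payload(x: List[float], y: List[float]) -> Dict[str, List[float]]:
    """Server side, run once: ship only the raw columns, not a full figure."""
    return {"x": list(x), "y": list(y)}


def build_figure(store_data: Dict[str, List[float]], x_min: float, x_max: float) -> Dict[str, Any]:
    """Client side, run on every slider/filter change: filter the cached
    columns and assemble a plain figure dict, with no server round trip."""
    points = [
        (xv, yv)
        for xv, yv in zip(store_data["x"], store_data["y"])
        if x_min <= xv <= x_max
    ]
    return {
        "data": [{
            "type": "scatter",
            "mode": "markers",
            "x": [p[0] for p in points],
            "y": [p[1] for p in points],
        }],
        "layout": {"xaxis": {"range": [x_min, x_max]}},
    }


if __name__ == "__main__":
    payload = make_store_payload([0, 1, 2, 3, 4], [10, 11, 12, 13, 14])
    print(build_figure(payload, 1, 3)["data"][0])
```

The data crosses the network once; every later interaction only rebuilds a small dict in the browser, which is why this shines for small datasets and struggles for large ones.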

Let’s hope you’ll find some other ideas more helpful than this one!


Thanks for the reply @jlfsjunior, you somehow got me almost there! The one thing I still don’t know is how to turn the synchronous XMLHttpRequest in the example below into an asynchronous one inside the client-side callback.

The full story: as it happens, the docs page on client-side callbacks shows precisely an example of returning raw JSON that is temporarily stored in a dcc.Store before the figure is updated by a client-side callback, just as you suggested. I’m firmly in the large-dataset case, so that would not be acceptable, but there’s a simple workaround: create a custom Flask endpoint that serves the JSON string and call it from a client-side callback! I modified my previous example to illustrate how to do it (remember to navigate to http://127.0.0.1:8050 in your browser, otherwise you’ll get a CORS error):

import http

import numpy as np
import plotly.io
import plotly.express as px
from dash import Dash, dcc, html
from dash.dependencies import Input, Output
from flask import make_response
from plotly.graph_objs import Figure
from redis import Redis

# docker run -p 6379:6379 redis
redis = Redis(host="localhost", port=6379)  # must match the port published by docker run

app = Dash(__name__)
app.layout = html.Div([
    dcc.Input("points", type="number"),
    dcc.Graph("costly-graph")
])


app.clientside_callback(
    """
    function update(n_points) {
        const req = new XMLHttpRequest();
        req.open("GET", `http://127.0.0.1:8050/data/${n_points}`, false);
        req.send(null);
        
        if (req.status === 200) {
            return JSON.parse(req.responseText);
        }
    }
    """,
    Output("costly-graph", "figure"),
    Input("points", "value")
)


@app.server.route("/data/<int:points>", methods=["GET"])
def data_endpoint(points: int):
    figure_json = get_costly_figure(points)
    return make_response(figure_json, http.HTTPStatus.OK)


def get_costly_figure(points: int) -> str:
    key = f"some-key-depending-on-{points}-or-any-other-input-for-instance"
    figure_json = redis.get(key)

    if not figure_json:
        figure = produce_costly_figure(points)
        figure_json = plotly.io.to_json(figure)
        redis.set(key, figure_json)

    return figure_json


def produce_costly_figure(points: int) -> Figure:
    """Some function that takes some 500+ms to complete."""
    return px.scatter(x=np.random.random(points), y=np.random.random(points))


app.run_server()

Remember to get the Redis service running (e.g. with the docker run command in the commented line near the imports). The plots are generated randomly with the desired number of points, but if you repeat input values, the plots will repeat, since they are cached when first generated.
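One detail worth noting in get_costly_figure: redis-py’s get() returns bytes by default, so the function returns bytes on a cache hit and str on a miss. Flask accepts either, but a dcc.Store or hidden input needs a str. Below is a minimal sketch of a get-or-compute helper that always returns str; FakeRedis is a hypothetical dict-backed stand-in that mimics redis-py’s default bytes behaviour so the example runs without a server. With a real client, passing decode_responses=True to Redis(...) is another way to get str back.

```python
from typing import Callable, Dict, Optional, Union


class FakeRedis:
    """Hypothetical stand-in for redis.Redis: like redis-py's default client,
    values are stored as bytes and get() returns bytes (or None on a miss)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def set(self, key: str, value: Union[str, bytes]) -> None:
        self._data[key] = value.encode() if isinstance(value, str) else value


def get_or_compute(cache, key: str, produce_json: Callable[[], str]) -> str:
    """Return the cached JSON string for `key`, computing and storing it on a miss.

    Always returns str, so callers see the same type whether or not the cache was hit.
    """
    cached = cache.get(key)
    if cached is not None:
        return cached.decode() if isinstance(cached, bytes) else cached
    figure_json = produce_json()
    cache.set(key, figure_json)
    return figure_json


if __name__ == "__main__":
    cache = FakeRedis()
    print(repr(get_or_compute(cache, "k", lambda: '{"data": []}')))  # miss: computed
    print(repr(get_or_compute(cache, "k", lambda: "unused")))        # hit: decoded str
```

The later examples in this thread do the same thing with an explicit .decode() on the hit branch.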


I checked the docs page on client-side callbacks and nope, asynchronous client-side callbacks are not yet supported 🙁

However, I think your solution might still be viable if the number of figures is small enough (read: bounded by a small constant) that we can insert exactly that many dcc.Store components into the layout and use them as client-side staging areas for a variable number of JSON strings, which a matching number of client-side callbacks then load into the figures. This would require a dynamic mapping from dcc.Stores to figures (most reasonably computed on page load) to be passed to the callbacks.
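The dynamic mapping mentioned above could be as simple as the following sketch, which assigns each figure on the page to one of a fixed pool of store slots. The slot ids (figure-store-0, figure-store-1, ...), the figure names and the pool size are hypothetical; the only real constraint is that, in this scheme, the stores and their client-side callbacks are created once, together with the layout.

```python
from typing import Dict, Sequence

NUM_STORES = 8  # hypothetical: number of dcc.Store slots placed in the layout up front


def map_figures_to_stores(figure_ids: Sequence[str], num_stores: int = NUM_STORES) -> Dict[str, str]:
    """Return {store_id: figure_id}, one pre-allocated store per figure.

    Raises ValueError when the page needs more figures than there are slots,
    because the pool of stores (and matching callbacks) is fixed with the layout.
    """
    if len(figure_ids) > num_stores:
        raise ValueError(f"{len(figure_ids)} figures but only {num_stores} store slots")
    return {f"figure-store-{slot}": fig_id for slot, fig_id in enumerate(figure_ids)}


if __name__ == "__main__":
    print(map_figures_to_stores(["revenue", "latency", "errors"]))
```

Unused slots simply stay empty; their client-side callbacks would then fall back to an empty figure.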

In the case of a single figure, it would be straightforward and look like this (the example, modified once more):

import numpy as np
import plotly.io
import plotly.express as px
from dash import Dash, dcc, html
from dash.dependencies import Input, Output
from dash.exceptions import PreventUpdate
from plotly.graph_objs import Figure
from redis import Redis

# docker run -p 6379:6379 redis
redis = Redis(host="localhost", port=6379)  # must match the port published by docker run

app = Dash(__name__)
app.layout = html.Div([
    dcc.Input("points", type="number"),
    dcc.Store("figure", storage_type="session"),
    dcc.Graph("graph")
])


@app.callback(
    Output("figure", "data"),
    Input("points", "value")
)
def update_figure(points: int) -> str:
    if points is None:
        raise PreventUpdate()

    return get_costly_figure(points)


app.clientside_callback(
    """
    function update(figure_json) {
        // The Store's data is null until the server-side callback first fills it
        if (figure_json)
            return JSON.parse(figure_json);
        else
            return {};
    }
    """,
    Output("graph", "figure"),
    Input("figure", "data"),
)


def get_costly_figure(points: int) -> str:
    key = f"some-key-depending-on-{points}-or-any-other-input-for-instance"
    figure_json = redis.get(key)

    if not figure_json:
        figure = produce_costly_figure(points)
        figure_json = plotly.io.to_json(figure)
        redis.set(key, figure_json)
    else:
        figure_json = figure_json.decode()

    return figure_json


def produce_costly_figure(points: int) -> Figure:
    """Some function that takes some 500+ms to complete."""
    return px.scatter(x=np.random.random(points), y=np.random.random(points))


app.run_server(debug=True)

I’ll try to produce an MWE for the multi-figure case and post it here later, because I feel this is a solid use case that currently seems to have no other solution under the given constraints (large datasets, caching and responsiveness required).

Thanks for sharing this! You are absolutely right, async callbacks are not supported and it would be tough to workaround it in your case.

I am saving this thread for later; I think it is a nice starting point for what I’ve got to do next year…

Alright, I’ve finally got a decent workaround that solves all issues:

  • For each graph in the page, insert three components into the layout that can be pattern-matched:
    • A dcc.Graph
    • A hidden-typed dcc.Input (for storing the figure’s JSON string)
    • Another hidden-typed dcc.Input (see below)
  • Fetch the figures’ JSON strings using server-side pattern-matching callbacks
    • Cache the JSON strings on the server side, using your preferred method, to improve callback performance
    • The callback outputs JSON strings to the hidden inputs
    • The additional hidden inputs are meant to break the figure updates into several requests via MATCH, because using an ALL pattern-matched callback would concentrate all JSON transfers in a single response
    • Since the pattern-matching between hidden inputs involves objects that are dynamically created, you need to set suppress_callback_exceptions = True, otherwise you’ll get flaky callback exceptions due to non-existent components
  • Use a simple client-side pattern-matched callback to parse the JSON and update each graph’s figure property.
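The component ids behind the steps above can be sketched as plain dicts. This only illustrates the id scheme the example below uses ({"type": ..., "index": i}); the helpers component_ids and triple are hypothetical. Because the three components for plot i share the same index, MATCH callbacks can chain signal i to figure i to graph i, so Dash runs one callback invocation, and sends one response, per index instead of a single large one.

```python
from typing import Dict, List, Union

ComponentId = Dict[str, Union[str, int]]

# The three pattern-matched component types created for every plot.
COMPONENT_TYPES = ("graph", "figure", "signal")


def component_ids(num_plots: int) -> Dict[str, List[ComponentId]]:
    """Return the pattern-matching ids for num_plots plots, grouped by type."""
    return {
        ctype: [{"type": ctype, "index": i} for i in range(num_plots)]
        for ctype in COMPONENT_TYPES
    }


def triple(ids: Dict[str, List[ComponentId]], index: int) -> List[ComponentId]:
    """The (signal, figure, graph) ids that one MATCH chain links together."""
    return [ids[ctype][index] for ctype in ("signal", "figure", "graph")]


if __name__ == "__main__":
    print(triple(component_ids(3), 1))
```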

The final example is shown below. It is a simple application in which the user chooses the number of plots and the number of points per plot. Once a plot with a given number of points at a certain index is generated, it is cached on server side using Redis. Therefore, if you play with the number of plots but keep the number of points constant, you’ll see that graphs at a certain position are cached and therefore updated faster. If the number of points is changed, the graphs will be updated one at a time due to different generation costs.

import json
import time
from typing import List

import numpy as np
import plotly.io
import plotly.express as px
from dash import Dash, dcc, html
from dash.dependencies import Input, Output, MATCH
from dash.exceptions import PreventUpdate
from plotly.graph_objs import Figure
from redis import Redis

FIGURE_LAYOUT = {"width": 300, "height": 250}

# docker run -p 6379:6379 redis
redis = Redis(host="localhost", port=6379)  # must match the port published by docker run

app = Dash(__name__)
app.layout = html.Div([
    dcc.Input("num_plots", type="number", placeholder="Num. plots:"),
    dcc.Input("num_points", type="number", placeholder="Num. points:"),
    html.Div([], id="graphs-container"),
    html.Div([], id="figures-container"),
    html.Div([], id="signals-container"),
])


@app.callback(
    Output("graphs-container", "children"),
    Input("num_plots", "value")
)
def update_graphs(num_plots: int) -> List[dcc.Graph]:
    """Generates a number of empty figures according to user selection."""
    if num_plots is None:
        raise PreventUpdate()

    return [
        dcc.Graph(
            id={"type": "graph", "index": i},
            figure=Figure(layout=FIGURE_LAYOUT)
        )
        for i in range(num_plots)
    ]


@app.callback(
    Output("figures-container", "children"),
    Input("num_plots", "value")
)
def update_figure_inputs(num_plots: int) -> List[dcc.Input]:
    """Generates hidden inputs to hold the figures that are retrieved as JSON strings from the server."""
    if num_plots is None:
        raise PreventUpdate()

    return [
        dcc.Input(id={"type": "figure", "index": i}, type="hidden")
        for i in range(num_plots)
    ]


@app.callback(
    Output("signals-container", "children"),
    Input("num_plots", "value")
)
def update_signal_inputs(num_plots: int) -> List[dcc.Input]:
    """These additional inputs allow to pattern match against the JSON hidden inputs."""
    if num_plots is None:
        raise PreventUpdate()

    return [
        dcc.Input(id={"type": "signal", "index": i}, type="hidden", value=i)
        for i in range(num_plots)
    ]


@app.callback(
    Output({"type": "figure", "index": MATCH}, "value"),
    Input({"type": "signal", "index": MATCH}, "value"),
    Input("num_points", "value"),
)
def update_figure(index: int, points: int) -> str:
    """Each call to this function returns the JSON string for one figure. Results are cached for (index, points) pairs."""
    if (index is None) or (points is None) or (points == 0):
        return json.dumps({"layout": FIGURE_LAYOUT})
    else:
        return get_costly_figure(points, index)


app.clientside_callback(
    # This simple pattern-matching client-side callback does the magic of updating the graphs'
    # figure property with the parsed JSON. This is synchronous code, but likely always very fast.
    # The truthiness check also covers the hidden input's initial null/undefined value.
    "(figure_json) => figure_json ? JSON.parse(figure_json) : {}",
    Output({"type": "graph", "index": MATCH}, "figure"),
    Input({"type": "figure", "index": MATCH}, "value"),
)


def get_costly_figure(points: int, index: int) -> str:
    key = f"some-key-depending-on-{points}-and-{index}"
    figure_json = redis.get(key)

    if not figure_json:
        figure = produce_costly_figure(points)
        figure_json = plotly.io.to_json(figure)
        redis.set(key, figure_json)
    else:
        figure_json = figure_json.decode()

    return figure_json


def produce_costly_figure(points: int) -> Figure:
    """Some function that takes 1-2 s to complete (simulated with time.sleep)."""
    time.sleep(np.random.randint(1, 3))
    return px.scatter(
        x=np.random.random(points),
        y=np.random.random(points)
    ).update_layout(**FIGURE_LAYOUT)


app.config.suppress_callback_exceptions = True  # components are created dynamically
app.run_server(debug=True)
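The caching behaviour described before this example can be checked without Redis or Dash. In this sketch, a plain dict stands in for Redis, keyed on the same (points, index) pair that get_costly_figure uses, and a hypothetical produce function records every expensive call instead of sleeping.

```python
from typing import Callable, Dict, List, Tuple

CacheKey = Tuple[int, int]  # (points, index), as in get_costly_figure


def render_page(
    num_plots: int,
    points: int,
    cache: Dict[CacheKey, str],
    produce: Callable[[int, int], str],
) -> List[str]:
    """Return one figure JSON string per plot, computing only the missing ones."""
    figures = []
    for index in range(num_plots):
        key = (points, index)
        if key not in cache:
            cache[key] = produce(points, index)
        figures.append(cache[key])
    return figures


if __name__ == "__main__":
    calls: List[CacheKey] = []

    def produce(points: int, index: int) -> str:
        calls.append((points, index))  # stands in for the slow figure generation
        return f'{{"points": {points}, "index": {index}}}'

    cache: Dict[CacheKey, str] = {}
    render_page(3, 100, cache, produce)  # 3 new figures
    render_page(5, 100, cache, produce)  # only indices 3 and 4 are new
    render_page(2, 100, cache, produce)  # everything already cached
    render_page(2, 200, cache, produce)  # new point count: 2 new figures
    print(len(calls))  # 7
```

Changing only the number of plots reuses every figure at an existing position, while changing the number of points misses the cache for every position, matching what the app shows.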

Thanks again for the feedback, @jlfsjunior, it was paramount to get this far!