Show and Tell - Server Side Caching

Great analysis! I love your diagram in particular :blush:. I’ll take it as the starting point of my answer. Let’s denote the arrows (1,2,3,4). To sum up what happens,

  1. Client sends button click (small).
  2. Client receives the full data, i.e. around 6.7 MB in this example.
  3. Client sends the full data, 6.7 MB.
  4. Client receives the figure, around 1.8 MB.

As you note, (1) is fast. Due to the large payload (6.7 MB), both (2) and (3) are slow. And since the data are not used for anything by the client itself, this data transfer is in fact unnecessary. The figure transfer in (4) is also rather slow (as the figure is large), but unlike (2) and (3) it is necessary. Since the figure is rendered client side, without sending the figure JSON to the client, the client would not know what to draw.

The caching mechanism targets the unnecessary transfers (2, 3), but it cannot do anything about (4). Hence based on your results, I would say that the cache works as intended.
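
To make the idea concrete, here is a minimal sketch of what keeping the data server side means for arrows (2) and (3). It is not the dash-extensions implementation; the in-memory dict and the helper names `store_server_side` and `load_server_side` are hypothetical stand-ins for a real cache backend.

```python
import json
import uuid

# Hypothetical in-memory stand-in for a server-side cache backend.
_server_cache = {}


def store_server_side(payload):
    """Keep the payload on the server and return a small key for the client."""
    key = uuid.uuid4().hex
    _server_cache[key] = payload
    return key


def load_server_side(key):
    """Look the payload up again when a later callback receives the key."""
    return _server_cache[key]


data = list(range(100_000))        # stands in for the large query result
key = store_server_side(data)      # arrow (2): only the key goes to the client
restored = load_server_side(key)   # arrow (3): the client sends the key back

full_size = len(json.dumps(data))  # bytes that would travel without the cache
key_size = len(json.dumps(key))    # bytes that travel with the cache
```

The figure in arrow (4) is unaffected by this: it still has to be serialized and sent in full, because the client needs it to draw.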

You can use flask cache directly to avoid re-evaluating the function multiple times, but it won’t save you the data round trip, which is the key point of the CallbackCache.

It would not make sense to use the cached callback by default. As noted in my previous post, it only makes sense to use the cache for callbacks that return data that is not used by the client.

It might be more intuitive to use the same callback decorator for all callbacks and add a cache keyword argument. However, I think this argument should take a cache object as input rather than a Boolean. This would make it possible to use different caches for different callbacks, e.g. a disk cache for large data blocks and a memory cache for smaller ones.
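
As a rough sketch of that design idea (not an existing Dash or dash-extensions API), a decorator that takes a cache object could look something like the following. The `callback` decorator and the `MemoryCache` and `DiskCache` classes are all hypothetical and only illustrate how different callbacks could be given different caches.

```python
import functools
import pickle
import tempfile
import uuid
from pathlib import Path


class MemoryCache:
    """Hypothetical cache that keeps values in a dict."""

    def __init__(self):
        self._items = {}

    def set(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items[key]


class DiskCache:
    """Hypothetical cache that pickles each value to its own file."""

    def __init__(self, cache_dir):
        self._dir = Path(cache_dir)
        self._dir.mkdir(parents=True, exist_ok=True)

    def set(self, key, value):
        (self._dir / key).write_bytes(pickle.dumps(value))

    def get(self, key):
        return pickle.loads((self._dir / key).read_bytes())


def callback(cache=None):
    """Hypothetical decorator: with a cache, return a key instead of the value."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            value = func(*args, **kwargs)
            if cache is None:
                return value          # normal behaviour, value goes to the client
            key = uuid.uuid4().hex
            cache.set(key, value)     # value stays on the server
            return key                # only the key goes to the client
        return wrapper
    return decorator


memory_cache = MemoryCache()
disk_cache = DiskCache(tempfile.mkdtemp())


@callback(cache=disk_cache)       # large data block: disk cache
def large_query():
    return list(range(1000))


@callback(cache=memory_cache)     # small data block: memory cache
def small_query():
    return {"answer": 42}


@callback()                       # no cache: behaves like a normal callback
def uncached():
    return "sent to the client as usual"
```

Passing an object rather than a Boolean is what allows `large_query` and `small_query` to end up in different backends.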

Regarding this, I posted my thoughts on the naming/default behaviour of a “cache” in a separate thread.

Based on input from @np8 and @chriddyp, I have come up with a new syntax (available in dash-extensions 0.0.28). The performance should be the same, but the syntax is simpler (at least that is the intention). Here is the benchmark example using the new syntax,

import datetime
import dash_core_components as dcc
import dash_html_components as html
import numpy as np
import pandas as pd

from dash_extensions.enrich import Dash, ServersideOutput, Output, Input, State, Trigger

# Drop down options.
options = [{"label": x, "value": x} for x in [1, 10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000]]
# Create app.
app = Dash(prevent_initial_callbacks=True)
server = app.server
app.layout = html.Div([
    html.Button("Run benchmark (with cache)", id="btn"), dcc.Dropdown(id="dd", options=options, value=1),
    dcc.Store(id="time"), dcc.Loading(dcc.Store(id="store"), fullscreen=True, type="dot"), html.Div(id="log")
])


@app.callback([ServersideOutput("store", "data"), ServersideOutput("time", "data")],
              Trigger("btn", "n_clicks"), State("dd", "value"))
def query(value):
    df = pd.DataFrame(data=np.random.rand(int(value)), columns=["rnd"])
    return df, datetime.datetime.now()


@app.callback(Output("log", "children"), Input("store", "data"), State("time", "data"))
def calc(df, time):
    toc = datetime.datetime.now()
    mean = df["rnd"].mean()
    return "ELAPSED = {}s (and mean is {:.3f})".format((toc - time).total_seconds(), mean)


if __name__ == '__main__':
    app.run_server()

So what has changed? Instead of having to register the callbacks on the Dash app object, you now just have to use the custom objects from dash_extensions.enrich. The cached_callback decorator has been abandoned. You now just use the normal callback decorator and indicate which outputs should stay server side by using ServersideOutput instead of Output.

Hi @Emil,

Thank you so much for all the work you’ve put into this and this thread - it’s been super helpful in getting my Dash app running MUCH faster (locally), but I can’t get it to work with a server. I’m new to Dash and brand new to deployment, and we’re running into some issues - I’d love some advice if you think you know what’s up.

Like in your example, I pass in a dataframe and use the ServersideOutput function to output a dataframe. My code looks pretty much identical in structure to the example you have above, except that prevent_initial_callbacks=False - nothing loads if I set it to True. After some initial errors, we figured out that we needed to change the file path for the file_system_store directory in enrich.py due to permissions issues with creating a new directory while the app is running.

After that, I’m stumped. It looks like the data is getting lost during the serverside callback - the callbacks that it inputs to are throwing NoneType errors, and the plots aren’t loading at all in the app. Did I break something by changing the cache directory path? Any tips for debugging this and figuring out how to get the serverside data to link up to these callbacks? I just don’t know how to troubleshoot the serverside component.

Here’s an example of the error message I’m getting:

Exception on /_dash-update-component [POST], referer: (server link here)
Traceback (most recent call last):, referer: (server link here)
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2447, in wsgi_app, referer: (server link here)
    response = self.full_dispatch_request(), referer: (server link here)
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1952, in full_dispatch_request, referer: (server link here)
    rv = self.handle_user_exception(e), referer: (server link here)
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1821, in handle_user_exception, referer: (server link here)
    reraise(exc_type, exc_value, tb), referer: (server link here)
  File "/usr/local/lib/python3.6/dist-packages/flask/_compat.py", line 39, in reraise, referer: (server link here)
    raise value, referer: (server link here)
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1950, in full_dispatch_request, referer: (server link here)
    rv = self.dispatch_request(), referer: (server link here)
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1936, in dispatch_request, referer: (server link here)
    return self.view_functions[rule.endpoint](**req.view_args), referer: (server link here)
  File "/usr/local/lib/python3.6/dist-packages/dash/dash.py", line 1032, in dispatch, referer: (server link here)
    response.set_data(func(*args, outputs_list=outputs_list)), referer: (server link here)
  File "/usr/local/lib/python3.6/dist-packages/dash/dash.py", line 967, in add_context, referer: (server link here)
     output_value = func(*args, **kwargs)  # %% callback invoked %%, referer: (server link here)
  File "/usr/local/lib/python3.6/dist-packages/dash_extensions/enrich.py", line 342, in decorated_function, referer: (server link here)
    return f(*args), referer: (server link here)
  File "/var/www/FlaskApp/FlaskApp/__init__.py", line 370, in fig2, referer: (server link here)
    df_pivot_max = pd.pivot_table(df, index='EXAMPLE INDEX', values=['VAL1', 'VAL2'], aggfunc={'VAL1':'count', 'VAL2':'max'}), referer: (server link here)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/reshape/pivot.py", line 76, in pivot_table, referer: (server link here)
    if i not in data:, referer: (server link here)
 TypeError: argument of type 'NoneType' is not iterable, referer: (server link here)

Thanks for any help you can give!!

To change the cache path, you should create a new FileSystemStore backend, i.e. something like this,

from dash_extensions.enrich import Dash, FileSystemStore

output_defaults=dict(backend=FileSystemStore(cache_dir="some_path"), session_check=True)
app = Dash(output_defaults=output_defaults)

As an initial debugging step, you could check if files are in fact being written to "some_path". If not, it would indicate that you are still having permission issues.
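
A small sketch of that check, using only the standard library. The helper names `list_cache_files` and `can_write_to` are made up for this example, and "some_path" is the placeholder path from the snippet above.

```python
from pathlib import Path


def list_cache_files(cache_dir):
    """Return the sorted names of the files currently in the cache directory."""
    path = Path(cache_dir)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.iterdir() if p.is_file())


def can_write_to(cache_dir):
    """Try to create and delete a small probe file to test write permission."""
    probe = Path(cache_dir) / ".write_probe"
    try:
        probe.write_text("ok")
        probe.unlink()
        return True
    except OSError:
        # Covers a missing directory as well as missing permissions.
        return False


if __name__ == "__main__":
    print("writable:", can_write_to("some_path"))
    print("files:", list_cache_files("some_path"))
```

Run it as the same user that runs the web server (not as your own login user), and trigger a callback in the app before listing the files; an empty list after a callback has fired points to the permission problem.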

Success! Thanks for the tip - and this very cool package!

Hi @Emil,

First, great job!!

I notice that in this version, you import Dash from dash_extensions.enrich, and you use app=Dash(…) instead of app=dash.Dash(…).

When I do the same, my app stops displaying / my callbacks don’t fire correctly anymore. Could you enlighten me on the reason for this change and if there are any alternatives? Thanks!

Thanks! Did you remember to import the Input, Output and State objects from enrich also? And what versions are you using?

@Emil Yes I have!

my versions:

dcc: 1.10.2
html: 1.0.3
dash_extensions: 0.0.31

Not even the app layout displays when I use app=Dash(...) instead of dash.Dash(...).

Also, is there a reason why you use prevent_initial_callbacks?

PS: When the callbacks don’t fire correctly (prevent_initial_callbacks=False), I actually get JSON serialization errors and the layout still doesn’t show

I tend to use prevent_initial_callbacks because initial callbacks with None values often need special handling (which seems unnecessary when you can just use the prevent_initial_callbacks flag). Hmm, I don’t see why the callbacks shouldn’t work. Do you get any error, or are the callbacks just not firing?

After inspecting in the browser, the callbacks do fire but nothing is displayed. It might only have to do with the interaction with the layout. I have tried without using a css template but the problem persists.

Is there any way I can still use your module while importing dash.Dash()? It seems like I’m able to use ServersideOutput without it, and callbacks still fire and the layout does show.

If you use the ServersideOutput object with a standard Dash object, it just does nothing. Just to be sure, you are using ServersideOutput for Store objects only, right?

EDIT: If you could create a small, self-contained example, I could take a look at what goes wrong.

Yes, only for Store objects. So… the output isn’t server side in this case? My app works perfectly well with ServersideOutput, but I haven’t benchmarked it in production. I guess I’ll have to troubleshoot the app from the start once I get a little time.

Also, one thing that might have an effect is that I am using a multipage app with a flat project layout. I’ll keep you updated on my progress, and you keep us all updated on yours!!

EDIT: I’ll definitely do that ASAP.
Thanks!

No, unless you use the Dash object from enrich, it will remain client side. It’s the Dash object from enrich that performs the “magic”. If you only need the ServersideOutput feature, you could try disabling the other features. The syntax would be something like,

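from dash_extensions.enrich import DashTransformer, FileSystemStore, ServersideOutputTransform

# Only the server-side output transform is enabled; the other enrich features are off.
fs = FileSystemStore(cache_dir="path_that_you_can_write_to")
sot = ServersideOutputTransform(backend=fs)
app = DashTransformer(transforms=[sot])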

where the app variable corresponds to the normal Dash object. Another thing to note is that you must have write permission to the directory to which the cache is written. By default, it’s a folder created next to the app, but you can change it as per the code above if needed.

I do have write permissions, but sadly when using this method the callbacks don’t even fire anymore and I get this error:

⛑️ A callback is missing Outputs
Please provide an output for this callback:
{
  "clientside_function": null,
  "inputs": [
    {
      "id": "url",
      "property": "pathname"
    }
  ],
  "output": "....",
  "prevent_initial_call": false,
  "state": [],
  "outputs": [
    {
      "id": "",
      "property": "",
      "out": true
    }
  ]
}

I’ll write a self-contained example next week. Thanks, I really appreciate it

Wow. This is great. Thank you, @Emil

I have some of the examples working well, but when trying to adapt a larger script I’m given the following error that I can’t quite understand:

Traceback (most recent call last):
  File "/Users/derrick/anaconda3/envs/dash/lib/python3.6/site-packages/flask/app.py", line 2464, in __call__
    return self.wsgi_app(environ, start_response)
  File "/Users/derricklewis/anaconda3/envs/dash/lib/python3.6/site-packages/flask/app.py", line 2450, in wsgi_app
    response = self.handle_exception(e)
  File "/Users/derrick/anaconda3/envs/dash/lib/python3.6/site-packages/flask/app.py", line 1867, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/derrick/anaconda3/envs/dash/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/Users/derrick/anaconda3/envs/dash/lib/python3.6/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/derrick/anaconda3/envs/dash/lib/python3.6/site-packages/flask/app.py", line 1945, in full_dispatch_request
    self.try_trigger_before_first_request_functions()
  File "/Users/derrick/anaconda3/envs/dash/lib/python3.6/site-packages/flask/app.py", line 1993, in try_trigger_before_first_request_functions
    func()
  File "/Users/derrick/anaconda3/envs/dash/lib/python3.6/site-packages/dash_extensions/enrich.py", line 82, in _setup_server
    super()._setup_server()
  File "/Users/derrick/anaconda3/envs/dash/lib/python3.6/site-packages/dash/dash.py", line 1089, in _setup_server
    _validate.validate_layout(self.layout, self._layout_value())
  File "/Users/derrick/anaconda3/envs/dash/lib/python3.6/site-packages/dash_extensions/enrich.py", line 69, in _layout_value
    layout = transform.layout(layout, self._layout_is_function)
  File "/Users/derrick/anaconda3/envs/dash/lib/python3.6/site-packages/dash_extensions/enrich.py", line 451, in layout
    children = layout.children + self.hidden_divs
TypeError: unsupported operand type(s) for +: 'Container' and 'list'

I’ll try to truncate the code I’m using:

from datetime import datetime, timedelta
import numpy as np
import pandas as pd
from plotly.subplots import make_subplots
import plotly.graph_objects as go
from flask import Flask
import dash_table
import dash_html_components as html
import dash_core_components as dcc
import dash_bootstrap_components as dbc
import dash_auth
from dash_extensions.enrich import Dash, ServersideOutput, Output, Input, State, Trigger, FileSystemStore

output_defaults = dict(backend=FileSystemStore(
    cache_dir="./some_path"), session_check=True)

df = pd.read_csv('posts.csv')

server = Flask(__name__)

app = Dash(name=__name__,
           prevent_initial_callbacks=True,
           server=server,
           output_defaults=output_defaults,
           external_stylesheets=[dbc.themes.GRID])


app.layout = html.Div(
    dbc.Container([
        dcc.Loading(id='loading_icon', children=[
            dbc.Row([
                dbc.Col([
                    dcc.Graph(
                        id='main_chart',

                    )
                ])
            ]),
            dcc.Store(id='filter_df'),
            dcc.Store(id='agg_df'),
            dcc.Store(id='user_df')
        ],
            type='default'
        ),
    ])
)

@app.callback(
              [ServersideOutput("filter_df", "data"),
               ServersideOutput("agg_df", "data"),
               ServersideOutput("user_df", "data")
               ],
              [Input('submit_button', 'n_clicks')],
              [State('region_dropdown', 'value'),
               State('category_dropdown', 'value'),
               State('type_dropdown', 'value'),
               State('sponsored', 'value'),
               State('follower_slider', 'value'),
               State('username_input_field', 'value')]
              )
def get_benchmark_data(clicks, region, category, type, sponsored, followers, user_target):
    #do some expensive calculations

    return filter_df, agg_df, user_df


@app.callback(
    Output('main_chart', 'figure'),
    [Input('type_dropdown', 'value'),
     Input('metric_dropdown', 'value'),
     Input('filter_df', 'data'),
     Input('agg_df', 'data'),
     Input('user_df', 'data')],
    [State('username_input_field', 'value')
     ]
)
def update_main_chart(type, metric, filter_df, agg_df, user_df, username):
    #build a figure

    return fig

if __name__ == '__main__':
    app.run_server()

Is there another way to use the dbc.Container component?

Using:
dash-extensions = 0.0.31
dash-html-components = 1.0.3
dash-core-components = 1.10.1

Thanks in advance if anyone has any thoughts to share.

I think I have a solution.

I can either wrap the children of the html.Div in a list:

app.layout = html.Div([
    dbc.Container([
        dcc.Loading(id='loading_icon', children=[

Or remove the div altogether.

app.layout = dbc.Container([
        dcc.Loading(id='loading_icon', children=[
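
For anyone hitting the same traceback, here is a small sketch of why both fixes work. The failing line in the traceback is `children = layout.children + self.hidden_divs`, which only works when the top-level `children` is a list. The `Component` class below is a hypothetical stand-in, not a real Dash class.

```python
class Component:
    """Hypothetical stand-in for a Dash component; it only stores its children."""

    def __init__(self, children=None):
        self.children = children


# Stand-in for the extra components that get appended to the layout.
hidden_divs = [Component()]

# Like html.Div(dbc.Container(...)): children is a single component.
single_child = Component(children=Component())

# Like html.Div([dbc.Container(...)]) or dbc.Container([...]): children is a list.
list_children = Component(children=[Component()])

try:
    single_child.children + hidden_divs
    single_failed = False
except TypeError:
    # Same kind of error as in the traceback: component + list is not defined.
    single_failed = True

# List + list concatenates as expected.
combined = list_children.children + hidden_divs
```

So the problem is not dbc.Container itself, only that the outermost component had a single component as its children instead of a list.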

One more question. Please let me know if this is the wrong topic.

I’m noticing the cache file getting big quickly. Is there a way to limit the cache collected with ServersideOutput?

I can see in the plotly docs on flask-caching something like this:

cache = Cache(app.server, config={
    'CACHE_TYPE': 'filesystem',
    'CACHE_DIR': 'cache-directory',
    # should be equal to maximum number of users on the app at a single time
    # higher numbers will store more data in the filesystem / redis cache
    'CACHE_THRESHOLD': 200}