@app.callback improvements? (Trigger component, same Output multiple times, callback without Output, serverside store / callback cache)

Hi all,

@Emil recently published an interesting post about server-side caching using a component from dash-extensions. The same package contains two other interesting classes: Trigger and CallbackGrouper. Here are some random thoughts that came to mind. Perhaps we could use these components to improve @app.callback() a bit?

  1. The Trigger component is nice. This could also be added to dash.dependencies, so the callback syntax could be (option A)
@app.callback(
   [*output],
   [*inputs, *triggers],
   [*states]
)

where Trigger-type inputs are simply not passed to the callback function. Or even (option B)

@app.callback(
   [*outputs],
   [*inputs],
   [*states],
   [*triggers]
)

And since it could be possible to just check the type of app.callback(*args) arguments, it could be even (option C)

@app.callback(
   *outputs,
   *inputs,
   *states,
   *triggers
)

The ordering of Output, Input, State and Trigger components could then be arbitrary, but Input and State values would be passed to the callback function in the same order as they were given in @app.callback. Similarly with outputs.
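
A minimal sketch of how option C could sort mixed arguments by type. The classes below are hypothetical stand-ins for the dash.dependencies objects (not the real ones), and partition_dependencies is an illustrative name that does not exist in Dash:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    component_id: str
    component_property: str


# Hypothetical stand-ins for Output, Input, State and Trigger.
class Output(Dependency):
    pass


class Input(Dependency):
    pass


class State(Dependency):
    pass


class Trigger(Dependency):
    pass


def partition_dependencies(*args):
    """Group callback arguments by type, preserving their relative order.

    Each argument may be a single dependency or a list/tuple of them.
    """
    groups = {Output: [], Input: [], State: [], Trigger: []}
    for arg in args:
        items = arg if isinstance(arg, (list, tuple)) else [arg]
        for item in items:
            if type(item) not in groups:
                raise TypeError(f"Unsupported callback argument: {item!r}")
            groups[type(item)].append(item)
    return groups
```

Because each group keeps the order in which its members were given, the Input and State values could still be passed to the function in declaration order.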

  2. There is also CallbackGrouper, which makes it possible to assign multiple callbacks to the same Output. I think this would be good to merge into the Dash core, if it works in all situations. @app.callback() could be made to accept the same Output multiple times, and it could function like CallbackGrouper under the hood.

    Gain 1: This would mean one less rule to remember about callbacks, which would make Dash easier for beginners.

    Gain 2: Much easier to define callbacks when you have the same Output used in multiple callbacks

    Gain 3: There is another neat thing that could be accomplished with this. You could (probably; I did not try this, but I think it should be possible) allow callbacks without an Output. How? Always have a hidden div in the Dash application with id='null-output' or something like that (or have it enabled with a kwarg given to Dash()). Then, if the user writes

    @app.callback(
        None,
        [*inputs],
    )
    def func():
        do_something()
    

    the None output would be directed to this null-output, or a PreventUpdate Exception would be raised.

    with the new syntax (option C) this could be even

    @app.callback(
        Input(...),
        Input(...),
        Trigger(...),
        State(...),
    )
    def func():
        do_something()
    

    (note the missing Output). This would be a really clear syntax for callback definition.

What do you think? (Not sure who to tag here, but @chriddyp, @Marc-Andre, others?)


Another idea for the Trigger component is to fold its functionality into dash.dependencies.Input as a keyword argument: Input('some-component', 'value', trigger=True), where the trigger argument would define whether the value is passed to the callback function at all. It could also be named is_trigger, pass_to_func, etc. This could help remove some unnecessary clutter from callback function definitions, maybe?
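
A rough sketch of what "not passed to the callback function" could mean in practice. The drop_triggers decorator and its is_trigger list are hypothetical names used only for illustration; nothing like this exists in Dash itself:

```python
from functools import wraps


def drop_triggers(is_trigger):
    """Wrap a callback so that values of trigger-only inputs are not passed on.

    `is_trigger` holds one boolean per input, in the order the inputs
    were declared.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*values):
            if len(values) != len(is_trigger):
                raise ValueError("Expected one value per declared input")
            kept = [v for v, skip in zip(values, is_trigger) if not skip]
            return func(*kept)
        return wrapper
    return decorator


# The second input is a trigger: it fires the callback but is not an argument.
@drop_triggers([False, True])
def update(value):
    return f"value={value}"
```

The framework would still call the wrapper with every input value; only the user-facing function signature gets shorter.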

I think this is a really interesting discussion. Here are my immediate thoughts,

  • In terms of inputs-not-included-in-function-signature, I think that a Trigger object would be nice. It's shorter than passing a flag to Input, and IMO more intuitive.

  • I would love the option to specify no outputs. I find the syntax where the output argument is simply omitted to be the most clear.

I could write a wrapper that fixed both of these points pretty easily, but a 'proper' fix in the actual code would be preferable :blush:

I have played around a little bit, and I've done a preliminary implementation of the following syntax,

  1. The Trigger component can be used for all callbacks
  2. The caching is applied per Output via keyword arguments to the Output object itself
  3. The order of the args (Input, Output, State, Trigger) does not matter and they can all be lists or single elements
  4. Callbacks with no Output are allowed

I chose to inject the cache as a keyword to the Output for two main reasons,

  • It makes it possible to mix cached and non-cached outputs in a callback
  • It enables the use of different caches (e.g. a file cache and an in-memory cache) for different callbacks, or even for different outputs of the same callback

Here is a piece of code that demonstrates the new syntax,

import time
import dash_core_components as dcc
import dash_html_components as html
import plotly.express as px

from flask_caching.backends import FileSystemCache
from dash_extensions.transformers import Dash, Output, Input, Trigger


# Create app.
app = Dash(prevent_initial_callbacks=True)
app.layout = html.Div([
    html.Button("Query data", id="btn"), dcc.Dropdown(id="dd"), dcc.Graph(id="graph"), html.Div(id="log"),
    dcc.Loading(dcc.Store(id="store"), fullscreen=True, type="dot")
])
# Create (server side) cache. Works with any flask caching backend.
fsc = FileSystemCache(cache_dir="cache")


@app.callback([Output("store", "data", cache=fsc), Output("log", "children")],  # Mix of cached/non-cached Outputs
              Trigger("btn", "n_clicks"))  # Trigger instead of input
def query_data():
    time.sleep(1)
    return px.data.gapminder(), "Query completed"


@app.callback(Input("store", "data"), Output("dd", "options"))   # Inverted Input/Output order, Input not as list
def update_dd(df):
    # List each year once; the raw column repeats every year once per country.
    return [{"label": year, "value": year} for year in df["year"].unique().tolist()]


@app.callback(Input("store", "data"))  # No output
def print_head(df):  # Distinct name, so it does not shadow update_dd above
    print(df.head())


@app.callback(Output("graph", "figure"), [Input("store", "data"), Input("dd", "value")])
def update_graph(df, value):
    df = df.query("year == {}".format(value))
    return px.sunburst(df, path=['continent', 'country'], values='pop', color='lifeExp', hover_data=['iso_alpha'])


if __name__ == '__main__':
    app.run_server()

It should run with dash-extensions==0.0.25rc3. Any suggestions for improvements, @fohrloop? I was thinking about changing the name of the "cache" keyword argument to make it more clear to the user that the data is kept server side. Other options could be "server_cache"? "server_side_cache"? "server_only"? "keep_on_server"? "store_on_server"?


Nice job! :tada:

Trigger & null Output

  • Really nice that Trigger can now be used for all callbacks! Callbacks without an Output are also asked about here so often that it makes perfect sense to enable that functionality!
  • I feel that Trigger and null Output are kind of a pair: Trigger is "input without input" and "null Output" is, well, output without output. I think both of them are very welcome additions to the Dash ecosystem!

More relaxed argument syntax for @app.callback

  • Big thumbs up for this! Less things to remember = :+1:t2: :+1:t2: :+1:t2:

imports

I see you created a new Dash class, too! That is handy! The imports from dash are currently

from dash import Dash
from dash.dependencies import Output, Input, State

so I was wondering, would similar imports be easier to memorize:

from dash_extensions import Dash
from dash_extensions.dependencies import Output, Input, Trigger, State

? Of course, also the Output etc. could be just importable from dash_extensions root, if they are always imported with Dash (one less row in imports). This is matter of taste.

Comments on the "cache"

This is something I have been thinking about. Caching in my head means that something computationally expensive is skipped if called with the same arguments. Now, if we think about this, it is actually more like a server side version of dcc.Store, if I understand this correctly, isn't it? So, maybe it could be named something like server_store. Or maybe I am wrong and it should have "cache" in the name, since it uses a FileSystemCache.

Or should it be actually implemented as a new component?

For example, ServerStore, which would work like this:

from dash_extensions import ServerStore

app.layout = html.Div([
    html.Button("Query data", id="btn"), dcc.Dropdown(id="dd"), dcc.Graph(id="graph"), html.Div(id="log"),
    ServerStore(id='mystorage'), # Add Store component, like dcc.Store. It could store the "ID"/key for the storage, for example.  
])

@app.callback([Output("mystorage", "data"), Output("log", "children")],  
              Trigger("btn", "n_clicks"))  
def query_data():
    time.sleep(1)
    return px.data.gapminder(), "Query completed"

Now the ServerStore could also take the cache as a keyword argument, but if the user does not pass anything, it would default to FileSystemCache. I don't know if there are technical difficulties implementing this, but at least the naming convention and usage would be similar to dcc.Store, and it would be pretty clear from the name what it actually does. Any thoughts on this?

Really really good stuff @fohrloop & @Emil! Yes indeed, I think there are a few great ideas here that could make their way over to dash. Experimenting with the API, discussing things here, and publishing these ideas as dash_extensions is the best way to push these ideas forward - I am watching this issue!

I don't want to get in the way of your exploration, but here are a few thoughts & ideas:


Re Server Cache / Store:

One tricky thing about placing the ServerStore / ServerCache in app.layout is that it implies that something is getting passed up into the front-end. It blurs the lines between server & client for users that understand Dash's architecture and data transport. Some ideas:

  • Could it be an alternative Output? Something like:
@app.callback(SessionStore('upload-store', 'data'), Input('upload_component', 'data'))
def ...

@app.callback(Output('graph', 'figure'), Input('upload-store', 'data'))
def ..

We might need to modify some things in Dash's backend to make that possible... but here I'm just considering what an optimal API might look like.


Re "Option C": app.callback(*outputs, *inputs, *state). This is a great idea, and we're actually going forward with it in Single input by mbegel · Pull Request #1180 · plotly/dash · GitHub


Re Trigger - We like it!


Re null Output - I like it and I like @fohrloop's description of how they are sort of a "pair". I haven't discussed this with the broader Dash engineering team yet.


Re multiple callbacks to the same component. We've discussed this before and we have a few issues with it:

  • It becomes ambiguous when there are overlapping inputs
  • It can make some larger apps more complex to reason about because we're no longer dealing with a DAG
  • It would make Dash's own code more complex because it would no longer be a DAG

Regarding DAG: Think "Excel" - each cell has a single formula. Multiple callbacks writing to the same output in Dash would be equivalent to an Excel where multiple formulas in different places could update a single cell.
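
To make the "single formula per cell" rule concrete, here is a toy sketch of a registry that rejects a second writer for the same output. The class and exception names are made up for illustration and say nothing about how Dash implements its own check:

```python
class DuplicateOutputError(Exception):
    """Raised when a second callback tries to write an already-claimed output."""


class CallbackRegistry:
    """Toy registry that allows each output to be written by one callback only."""

    def __init__(self):
        self._writers = {}  # "component.property" -> name of the writing function

    def register(self, output, func_name):
        if output in self._writers:
            raise DuplicateOutputError(
                f"{output} is already written by {self._writers[output]}"
            )
        self._writers[output] = func_name
```

With exactly one writer per output, the dependency graph stays easy to reason about, which is the Excel analogy in code form.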

That being said, perhaps we can make callback_context simpler to deal with for these types of control flows with multiple Trigger or button Inputs. Perhaps triggered_map would be easier to deal with:

@app.callback(Output(...), Input('left', 'n_clicks'), Trigger('right', 'n_clicks'))
def update_output(left_nclicks, right_nclicks):
    if 'left.n_clicks' in ctx.triggered_map:
        # do something
    elif 'right.n_clicks' in ctx.triggered_map:
        # do something else

Or maybe just Trigger:

@app.callback(Output(...), Trigger('left', 'n_clicks'), Trigger('right', 'n_clicks'))
def update_output(left, right):
    if left:
          # do something
    elif right:
          # do something else
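
For what it's worth, the triggered_map idea can be approximated today from the list that dash.callback_context.triggered provides (dicts with "prop_id" and "value" keys). A small sketch, where build_triggered_map is a hypothetical helper and not part of Dash:

```python
def build_triggered_map(triggered):
    """Map "component.property" strings to the values that fired the callback.

    `triggered` is expected to look like dash.callback_context.triggered:
    a list of dicts with "prop_id" and "value" keys. A placeholder entry
    with prop_id "." (reported by some Dash versions when nothing fired)
    is skipped.
    """
    return {
        item["prop_id"]: item["value"]
        for item in triggered
        if item["prop_id"] != "."
    }


# Example with hand-written data instead of a live callback context.
triggered_map = build_triggered_map([{"prop_id": "left.n_clicks", "value": 3}])
```

Inside a callback one would pass the real context list to the helper and then branch with 'left.n_clicks' in triggered_map.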

Re Server Cache /Store

I see, perhaps adding to app.layout could cause some confusion, even though there are some use cases for components in the layout that are not rendered. One possibility would be to introduce a concept of ServerOutput and ServerInput that contains data which is never sent to frontend.

For example something like this

from dash_extensions import Dash
from dash_extensions.dependencies import ServerInput, ServerOutput

app = Dash(__name__, serverstore='mystore')

@app.callback(ServerOutput('mystore'), Input('upload_component', 'data'))
def first_function(data):
     # do stuff
     return df # is saved to server cache called 'mystore'. Session ID or similar used as part of the "key"

@app.callback(Output('graph', 'figure'), ServerInput('mystore'))
def second_function(df):
    fig = create_fig(df)
    return fig

This would eventually make "server side callbacks" possible (well, partly). When using ServerInput and ServerOutput between callbacks, there would be no need for communication between browser and server. This would be a great performance boost!

Thanks for the comments, @chriddyp and @fohrloop!

Haha, I never thought about it like that. But I like the phrasing :slight_smile:

Yes, I ended up moving from the "blueprint approach" (constructing a block of callbacks and registering them on the app) to a tighter integration with the Dash object via a custom subclass (that's what you are importing). One immediate concern with the

from dash_extensions import Dash

syntax is that custom components (e.g. the Download component) are currently placed in dash_extensions root. Wouldn't it be weird to place the Dash object alongside the custom components?

Yes! And per default (instant_refresh=True) there is no actual caching (even though a cache is used for storing the data). I think that you are right that the naming convention should change completely. Maybe it could be

fsc = FileSystemCache(cache_dir="cache")
app.layout = html.Div([
    ...
    ServerStore(id='mystorage',  cache=False, backend=fsc), 
])

where the cache argument would control if the backend is actually used as a cache, i.e. in terms of my current syntax cache=False corresponds to instant_refresh=True and vice versa.

Strictly, something is passed to the front-end, with the something being the data key (currently an md5 hash). Since the key uniquely identifies the data on the server, one could argue that a ServerStore wouldn't compromise the Dash concept of keeping all state client side.

This thought is (at least from an implementation point of view) very close to my current approach (a keyword to Output). At the time of writing I am more or less split between this approach (creating a new type of output, i.e. a ServerOutput) and the previous one (creating a new type of store, i.e. a ServerStore).

While a real multiple-callbacks-to-the-same-component solution would be great, I like the idea of adding some syntax sugar for now. Another option could be a group keyword to the callback, i.e. something like

@app.callback(Output(...), Trigger('left', 'n_clicks'), group="agroup")
def left_click():
  # do something

@app.callback(Output(...), Trigger('right', 'n_clicks'), group="agroup")
def right_click():
  # do something

All callbacks with the same group argument would then be merged automatically into one, thereby avoiding the multiple output error. In terms of readability, I like this approach better. However, it might be harder to debug compared to your more explicit approach.
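
A simplified sketch of what "merged automatically into one" could look like: a single dispatcher that runs whichever grouped handler matches the trigger that fired. The merge_group helper is a hypothetical illustration, not the dash-extensions implementation:

```python
def merge_group(handlers):
    """Merge callbacks that target the same output into one dispatcher.

    `handlers` maps a trigger id such as "left.n_clicks" to a zero-argument
    function. The merged callback runs the handler whose trigger fired.
    """
    def merged(triggered_prop_id):
        try:
            handler = handlers[triggered_prop_id]
        except KeyError:
            raise LookupError(f"No handler for {triggered_prop_id!r}") from None
        return handler()
    return merged


def left_click():
    return "left"


def right_click():
    return "right"


# One callback for the shared output, dispatching on the trigger id.
update_log = merge_group({
    "left.n_clicks": left_click,
    "right.n_clicks": right_click,
})
```

From the framework's point of view only update_log writes to the output, so the one-writer-per-output rule still holds.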

I have a few concerns about adding a ServerInput (in addition to the ServerOutput). The user will need to remember when to use ServerInput versus Input, and ServerInput is longer than Input to write. Furthermore, what happens if you use ServerInput for a non-server output? Or Input for a server output? The key advantage of ServerInput, as I see it, is that you make it clear from the code that the input is read server side.


Re ServerInput vs ServerStore

What I have been thinking is that with the concept of ServerInput and ServerOutput, it would be possible not to send anything to the browser between cached callbacks. Currently, with a (client-side) dcc.Store, the situation looks something like this:

image

By using a ServerStore component, and storing the ID to browser, this is how it looks:

image

But what if it would be possible to short-circuit server-side callbacks; callbacks that have only server-side inputs or outputs (ServerInput, ServerOutput):

image

You can imagine what kind of speed increase this would give, when there are chained callbacks and / or slow internet connection. This kind of change might need also changes to dash core.

What ifs

  • When callback has Input with ServerOutput: Take the session ID from a cookie to store something to server. Callback has no output back to client
  • When callback has ServerInput and Output (but no Input): That kind of callback makes sense only when chained with something that has Input with ServerOutput. (Not possible to trigger otherwise)

Re group keyword for Output?

What would then be the difference between

@app.callback(Output('my-output', 'val'), Trigger('left', 'n_clicks'), group="agroup")
def left_click():
  # do something

@app.callback(Output('my-output', 'val'), Trigger('right', 'n_clicks'), group="agroup")
def right_click():
  # do something

and

@app.callback(Output('my-output', 'val'), Trigger('left', 'n_clicks'))
def left_click():
  # do something

@app.callback(Output('my-output', 'val'), Trigger('right', 'n_clicks'))
def right_click():
  # do something

? Could the first one be easier to implement? I assume that in many cases callbacks with the same Output would share a group anyway?

From a theoretical point of view, I can see where you are going with the ServerInput concept. However, I have two main concerns,

  • As you note, this short-circuit concept is restricted to callbacks that have only ServerInputs. How often will this actually happen? Do you have any examples? In most cases, I guess you will need inputs from the client.
  • I am not sure how large a speed increase you would actually get compared to just keeping the data on the server (as I do now). Yes, you would save a round trip of requests, but as you indicate in your drawings, the payload of these requests is small, so they don't take much time anyway. In the example you are quoting, almost all of the time is spent in the last callback with the large 1.8 MB payload. And this block will not be affected by the ServerInput concept.

The reason to introduce the group keyword is that I don't want to group callbacks targeting the same Output automatically (among other things, circular dependencies can arise in the grouping process, which would yield hard-to-debug bugs). When the Dash team figures out a way to enable this functionality "natively", the latter syntax that you propose (without the group keyword) would definitely be the way to go. However, until this happens, I figured the "group" keyword could be the next-best-thing.

The simple example is what people seem to want to do (for example this):

  • Press a button
  • Calculate dataframe df server side, based on some selection in frontend
  • Save that df into temporary memory my-memory
  • Have multiple callbacks listening to my-memory. Let's say there are 4 Graphs created and shown to the user from 4 different callbacks.

In this case, when multiple callbacks listen to the my-memory, there will be normally many unnecessary HTTP requests for the server and sending (data with dcc.Store, session ID with ServerStore) back and forth. This all could, in theory, be short-circuited in such way that the session ID is only sent to the server once.

On local machine, perhaps quite negligible speed increase, but if you put your app to a slow cheap server (free tier heroku), I guess the speed gain should be visible. But yeah, some benchmarking and testing could be :ok_hand:t2: before implementation, to make sure there is really notable potential performance increase.

If the graphs do not require any user inputs, why not just put them in the same callback as the calculation itself?

Benchmarking is the way forward :+1:. My gut feeling is that in most cases the speed gain will be negligible, but I would love to be proved wrong :slight_smile:

Something to keep in mind here - we still need to pass the cache UID around via the client, we can't just keep it in the server memory. This is because Dash is designed to be stateless and there may be many duplicate Dash backends serving requests and, in the case of Kubernetes, these backends might not even be in the same data center! So they can all speak to the same shared data store (like Redis), but passing the cache UID around via the client is likely the easiest way to provide access.

I have now (0.0.25rc4) implemented the grouping functionality also. Hence, you can now do this

import dash_html_components as html
from dash.dependencies import Output
from dash_extensions.callback import Dash, Trigger

app = Dash(prevent_initial_callbacks=True)
app.layout = html.Div([
    html.Button("Left", id="left"), html.Button("Right", id="right"), html.Div(id="log"),
])

@app.callback(Output("log", "children"), Trigger("left", "n_clicks"), group="lr")
def left():
    return "left"

@app.callback(Output("log", "children"), Trigger("right", "n_clicks"), group="lr")
def right():
    return "right"

if __name__ == '__main__':
    app.run_server()

I have also changed the syntax for the keep-data-on-server functionality based on the discussion above. It's now available via a new ServersideOutput. Hence the previous example will now look like this,

import time
import dash_core_components as dcc
import dash_html_components as html
import plotly.express as px
from dash.dependencies import Output, Input
from dash_extensions.callback import Dash, Trigger, ServersideOutput

app = Dash(prevent_initial_callbacks=True)
app.layout = html.Div([
    html.Button("Query data", id="btn"), dcc.Dropdown(id="dd"), dcc.Graph(id="graph"),
    dcc.Loading(dcc.Store(id="store"), fullscreen=True, type="dot")
])

@app.callback(ServersideOutput("store", "data"), Trigger("btn", "n_clicks"))
def query_data():
    time.sleep(1)
    return px.data.gapminder()

@app.callback(Input("store", "data"), Output("dd", "options"))
def update_dd(df):
    # List each year once; the raw column repeats every year once per country.
    return [{"label": year, "value": year} for year in df["year"].unique().tolist()]

@app.callback(Output("graph", "figure"), [Input("store", "data"), Input("dd", "value")])
def update_graph(df, value):
    df = df.query("year == {}".format(value))
    return px.sunburst(df, path=['continent', 'country'], values='pop', color='lifeExp', hover_data=['iso_alpha'])

if __name__ == '__main__':
    app.run_server()

I figured that the name ServersideOutput is more explanatory than just ServerOutput, and it's also coherent with the naming convention of the clientside_callback.

The ServersideOutput object has optional arguments cache (enables caching if set to true, defaults to false), backend (defaults to a FileSystemStore with cache_dir=./file_system_store ), and session_check (includes session key in cache key generation if true, defaults to true).


Wow, that's nice :tada: :clap:t2:

The syntax is now pretty clear and goes well with the Dash ecosystem!


Very nice!

I'm having a hard time wrapping my head around what the cache argument does. Does this skip running the callback if the inputs are the same? i.e. memoize? If so, I'd expect that to be an argument of the callback itself rather than of ServersideOutput

Yes, it's kind of like memoize (I should probably rename the argument to memoize). So, as you already know, the flow is like this (with cache=False),

  1. Based on (Inputs, State) and a few other things (e.g. session id, function name) a key is generated
  2. The callback is invoked, and the result is inserted into the backend (a cache) with the key from (1)
  3. The key is returned to the client

Now, if you put cache=True, step (2) is skipped if the key already exists in the backend cache. I guess you could achieve the same behavior using a memoize decorator, but in that case, you would end up storing the callback output twice (once in the backend cache of the ServersideOutput, and once in the memoize backend). I feel that this would be a waste of cache space, which is the incentive for creating an interface that lets the user decide whether the ServersideOutput backend should also act as a cache.
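
The three steps and the effect of the cache flag can be sketched with a plain dict standing in for the backend. Everything here (ServersideStore, make_key, run, and the memoize keyword used in place of cache) is a hypothetical illustration under those assumptions, not the dash-extensions code:

```python
import hashlib
import json


class ServersideStore:
    """Toy in-memory backend; a dict stands in for a real cache backend."""

    def __init__(self):
        self.data = {}
        self.calls = 0  # number of times a callback was actually invoked

    def make_key(self, func_name, session_id, args):
        # Step 1: derive a key from the function name, session id and arguments.
        raw = json.dumps([func_name, session_id, list(args)], sort_keys=True)
        return hashlib.md5(raw.encode("utf-8")).hexdigest()

    def run(self, func, session_id, *args, memoize=False):
        key = self.make_key(func.__name__, session_id, args)
        # Step 2: invoke the callback and store the result, unless memoize
        # is enabled and the key is already present in the backend.
        if not (memoize and key in self.data):
            self.data[key] = func(*args)
            self.calls += 1
        # Step 3: only the key is returned to the client.
        return key
```

Since the stored value doubles as the memoized result, the output is kept only once, which is the space saving described above.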

The reason I put the argument on ServersideOutput (and not the callback itself) is that I am currently only storing the ServersideOutput values in the cache. Hence if you have a callback with both a ServersideOutput and a normal Output, setting cache=True on the ServersideOutput would do nothing, as the normal Output would always require reevaluating the callback. Writing this, I can see that the behavior is not very intuitive, so I guess the design still needs some adjustment...

Oh, could it be that having two outputs made the callback run each time, regardless of the instant_refresh value in this test? Or did I somehow misunderstand how it should work?

Based on your feedback, it seems clear to me that the memoize argument should be at the callback level. However, this makes it a bit unclear what to expect, when you mix Outputs and ServersideOutputs. One solution to this problem would be simply to restrict the output of a callback to be either all Outputs or all ServersideOutput. Do you think this is a good approach?

If so, one option would be to raise an exception if a callback has both Outputs and ServersideOutputs. Alternatively, we could skip the ServersideOutput object altogether and indicate that a callback targets serverside outputs via a keyword argument,

@app.callback([Output("store", "data")], Trigger("btn", "n_clicks"), serverside_output=True) 
def query_data():

This syntax could also be extended to pass arguments (such as memoize) on to the underlying store. Hence, a memoized callback would be

@app.callback([Output("store", "data")], Trigger("btn", "n_clicks"), serverside_output=dict(memoize=True)) 
def query_data():

The other arguments should still be at the Output level, i.e. like this

fss = FileSystemStore()
...
@app.callback([Output("store", "data", backend=fss, session_check=False)], Trigger("btn", "n_clicks"), serverside_output=dict(memoize=True)) 
def query_data():
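
The "raise an exception" option mentioned above could look roughly like this. The two classes are hypothetical stand-ins (not the real Dash or dash-extensions objects), and check_outputs is an illustrative name:

```python
class Output:
    """Hypothetical stand-in for dash.dependencies.Output."""

    def __init__(self, component_id, component_property):
        self.component_id = component_id
        self.component_property = component_property


class ServersideOutput(Output):
    """Hypothetical stand-in for the ServersideOutput discussed above."""


def check_outputs(outputs):
    """Raise if a callback mixes plain Outputs with ServersideOutputs."""
    kinds = {isinstance(output, ServersideOutput) for output in outputs}
    if len(kinds) > 1:
        raise ValueError("Use either Outputs or ServersideOutputs, not both")
```

Such a check would run once when the callback is registered, so the error would show up at startup rather than on the first click.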

At first glance, it seems like we could perform caching at a callback level in a way that would apply to the Output as well. Basically, it'd be a shorthand for adding the @flask_caching.memoize decorator