Manipulation of containers (list, dict, ..) in Dash

The typical pattern used for manipulation of data containers in Dash, e.g. a list append operation, is to include the current container data via a State element, perform the operation(s), and return the resulting data structure. Hence, something like this,

@app.callback(Output("list", "data"), Input("button", "n_clicks"), State("item", "value"), State("list", "data"))
def append_to_list(n_clicks, item, current_list):
    new_list = current_list + [item]
    return new_list

There are two things I don’t like about this approach. First, it is extremely inefficient (especially for large amounts of data), as all of the data is passed from the client to the server, and back again. Second, I find the syntax rather cumbersome. I have therefore started working on an interface for performing container operations in Dash. The previous example using my work-in-progress syntax looks like this,

@app.callback(ListOutput("list", "data"), Input("button", "n_clicks"), State("item", "value"))
def append_to_list(n_clicks, item):
    return ListOutput.append(item)

Note that the complete data is not passed as State, so the round trip of the (potentially heavy) current data is avoided. Furthermore, I find the syntax clearer. What do you think? Can you come up with a better one? :slight_smile:
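To illustrate why the round trip can be avoided, here is a minimal sketch in plain Python (no Dash required). It assumes the server sends only a small operation descriptor and that the client applies it to the list it already holds. The payload format and the name `apply_list_operation` are hypothetical and are not the actual dash-extensions implementation; in a real app the client-side part would run in JavaScript.

```python
import json


def apply_list_operation(current, operation):
    """Apply a serialized list operation to `current` and return a new list."""
    name = operation["op"]
    args = operation.get("args", [])
    result = list(current)  # work on a copy, leave the input untouched
    if name == "append":
        result.append(*args)
    elif name == "pop":
        result.pop(*args)  # no args pops the last element
    elif name == "clear":
        result.clear()
    else:
        raise ValueError(f"Unsupported operation: {name}")
    return result


# What the server would send back: only the operation, not the whole list.
payload = json.dumps({"op": "append", "args": ["Cheese"]})

# What the client already holds, and how it would apply the operation.
client_list = ["Milk", "Bread"]
client_list = apply_list_operation(client_list, json.loads(payload))
print(client_list)
```

The payload size depends only on the operation and its arguments, not on the length of the existing list.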

@chriddyp @snehilvj @AnnMarieW

EDIT: Besides list and dict, are there other data structures that you think should be included?

Using the common demo case of a shopping list, here is a small (but complete) example app that demonstrates the new syntax in a bit more detail,

from dash_extensions.enrich import ContainerTransform, html, DashProxy, ListOutput, Trigger, State, TriggerTransform
import dash_mantine_components as dmc

app = DashProxy(transforms=[ContainerTransform(), TriggerTransform()], prevent_initial_callbacks=True)
app.layout = html.Div([
    dmc.TextInput(id="item", label="Item", value="Cheese"),
    dmc.NumberInput(id="index", label="Pop index", value=1),
    html.Button("Append", id="append"), html.Button("Pop", id="pop"), html.Button("Clear", id="clear"),
    dmc.Title("Shopping list"), dmc.List([], id="shopping_list")
])


@app.callback(ListOutput("shopping_list", "children"), Trigger("append", "n_clicks"), State("item", "value"))
def append_to_shopping_list(item):
    return ListOutput.append(dmc.ListItem(item, id=item))


@app.callback(ListOutput("shopping_list", "children"), Trigger("pop", "n_clicks"), State("index", "value"))
def pop_from_shopping_list(index):
    return ListOutput.pop(index)


@app.callback(ListOutput("shopping_list", "children"), Trigger("clear", "n_clicks"))
def clear_shopping_list():
    return ListOutput.clear()


if __name__ == '__main__':
    app.run_server()


Hey @Emil, this looks awesome :star_struck:

I especially like how it can improve performance with large datasets.

A couple questions:

  • Can this handle nested data structures, i.e. a list of lists, a list of dicts, etc.?
  • I see this example is for ListOutput(). Will there also be one for DictOutput()? Is it necessary to have two different types of Outputs?
  • Will it work with pattern matching callbacks?

As far as other types of outputs, I expect lists and dicts would be the most common use-case.

@AnnMarieW Thanks! :smiley:

  • The current implementation doesn’t handle nesting. But I think it should be possible to extend the implementation to handle arbitrary nesting without too much trouble
  • Yes! I have already implemented that one (using the exact names you suggested). I don’t see any technical issues with merging them into one type of output, so I think that’s more a matter of taste. Do you have a good name for the list-or-dict -output? :slight_smile:
  • The current implementation doesn’t support pattern matching callbacks. I think it should be possible to extend the implementation to the ALL wildcard, but I don’t think it will be possible to handle the MATCH wildcard (for the same reasons that the MultiplexerTransform doesn’t support MATCH)

Great comments/questions :+1:

Naming is the hardest part. If this functionality made its way into Dash, I think having only one new callback dependency type would be more beginner friendly. However, that’s less of an issue if it stays in dash-extensions.

I’ll keep thinking about a name, but nothing obvious comes to mind. OutputExtension() ? :woman_shrugging:

This is awesome, @emil! Really psyched you’re exploring this.

OK here are lots of scattered thoughts. Sorry for the length… if I had more time I would’ve written a shorter letter :slight_smile:


Functional Data Transformations

One API idea I was excited about at one point was having inputs & outputs be functional data transformations. So you could write stuff like:

@callback(Output('figure', extend('figure', 'data.0.y')), Input('interval', 'n_intervals'))
def update(_):
    df = get_latest_data()
    return df['sales']

which would extend the figure.data[0].y property of the figure.

So then the question becomes, what grammar do we support beyond extend?

With the nested property, you could also have this as inputs, so:

@callback(Output('figure', 'data.*.marker.color'),
          Input('darken', 'n_clicks'),
          State('figure', 'data.*.marker.color'))
def update(existing_colors):
    new_colors = []
    for color in existing_colors:
        new_colors.append(darken(color))
    return new_colors

* might not be generic enough. We might consider looking at the jq syntax for a more generic data accessing grammar.
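As a rough illustration of what such dotted accessors could mean, here is a small plain-Python sketch. The helper names `get_path` and `set_path` are hypothetical, and the rules (numeric segments index lists, `*` fans out over every list element) are assumptions made for this example rather than an agreed grammar.

```python
def get_path(obj, path):
    """Read the value(s) at a dotted path; '*' fans out over list elements."""
    return _get(obj, path.split("."))


def _get(obj, keys):
    if not keys:
        return obj
    head, rest = keys[0], keys[1:]
    if head == "*":
        return [_get(item, rest) for item in obj]
    key = int(head) if isinstance(obj, list) else head
    return _get(obj[key], rest)


def set_path(obj, path, value):
    """Write in place; with '*', `value` holds one entry per list element."""
    _set(obj, path.split("."), value)


def _set(obj, keys, value):
    head, rest = keys[0], keys[1:]
    if head == "*":
        for index, item_value in enumerate(value):
            if rest:
                _set(obj[index], rest, item_value)
            else:
                obj[index] = item_value
        return
    key = int(head) if isinstance(obj, list) else head
    if rest:
        _set(obj[key], rest, value)
    else:
        obj[key] = value


figure = {"data": [
    {"marker": {"color": "red"}, "y": [1, 2]},
    {"marker": {"color": "blue"}, "y": [3]},
]}
colors = get_path(figure, "data.*.marker.color")
set_path(figure, "data.*.marker.color", [c.upper() for c in colors])
print(get_path(figure, "data.*.marker.color"))
```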


Functional Grammar - Inspired by Ramda

At one point, I was thinking we could expose/adopt the Ramda API, which basically allows you to do any sort of functional transformations to data in a single expression: Ramda Documentation

The cool thing about ramda, and other functional paradigms, is that you construct the data transformation expression in a way that you can always “call”/“apply” with a value. So instead of having

my_list.append(5)

you write:

append(my_list)(5)

or in a more complex scenario:

extend(figure['data'][0]['x'], [1, 2, 3])

you write something like

concat(lensPath('data', 0, 'x'))(figure)([1, 2, 3])

(See in Ramda sandbox)

which lends itself well to what we’re doing, where you write the data transformation expression in the callback and then “call”/“apply” it with two things: (1) the component’s property and (2) the callback’s return value. So generically, Dash’s front-end “simply” runs:

new_component_value = expression(property_value)(callback_output)
expression = extend(lensPath('data', 0, 'x'))
property_value = figure
callback_output = [1, 2, 3]

new_component_value = expression(property_value)(callback_output)

And the simple case of Output('my-graph', 'figure') becomes “shorthand” for some simple operation like Output('my-graph', 'set(figure)')
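The curried "expression, then property value, then callback output" idea can be mimicked in plain Python. This is only a sketch: `lens_path`, `extend` and `set_value` here are hypothetical stand-ins written for this example, not Ramda's API and not an existing Dash feature.

```python
import copy


def lens_path(*keys):
    """Describe a location inside a nested structure as a tuple of keys."""
    return tuple(keys)


def extend(path):
    """Build an expression that extends the list found at `path`."""
    def with_property(property_value):
        def with_output(callback_output):
            new_value = copy.deepcopy(property_value)  # keep the input intact
            target = new_value
            for key in path:
                target = target[key]
            target.extend(callback_output)
            return new_value
        return with_output
    return with_property


def set_value():
    """The plain Output case: ignore the current value and use the output."""
    return lambda property_value: lambda callback_output: callback_output


expression = extend(lens_path("data", 0, "x"))
property_value = {"data": [{"x": [0], "y": [5]}]}
callback_output = [1, 2, 3]

new_component_value = expression(property_value)(callback_output)
print(new_component_value)
```

Because the expression is built before any data is seen, it could in principle be serialized once and evaluated wherever the property value lives.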

Someone even wrote a Python Ramda, which could be a nice tool for debugging these expressions.

Ramda, and other functional data transformation systems like this, are nice because you can easily serialize them (it’s just one big expression) and execute them in JS. And enough thought has been put into their grammar to basically allow for any possible data expression within a single, perhaps very nested, command.

(Here’s a great read on these functional paradigms: Mostly adequate guide to functional programming)


Which Transformations?

Ramda may be too abstract. And we probably want some shorthands especially for simple accessors like data.0.x (lensPath will probably scare people away!).

The main things I can think of folks needing would be:

  • Single value accessors: Being able to target any single part of a data structure
  • Wildcard accessors: Being able to target patterns of a data structure, like figure.data.*.marker.color.
  • Lists: append, extend, replace slice, access slice
  • Strings: set, regex replace, concatenate, suffix, prefix
  • Dictionaries: set single value, merge, replace dictionary
  • Numbers: set, math operations?
  • List of dictionaries (e.g. list of records in data in datatable): Extract a column from a list of dictionaries, aka [row['x'] for row in data]. This can be nicely expressed in Ramda via pluck: Ramda Documentation
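A few of the transformations listed above are easy to express as small pure functions. The sketch below uses hypothetical helper names (`pluck`, `merge`, `replace_slice`) chosen for illustration; it is not a proposed API.

```python
def pluck(key, records):
    """Extract one column from a list of dictionaries."""
    return [row[key] for row in records]


def merge(base, updates):
    """Return a new dictionary with `updates` layered over `base`."""
    return {**base, **updates}


def replace_slice(items, start, stop, replacement):
    """Return a new list with items[start:stop] swapped for `replacement`."""
    return items[:start] + list(replacement) + items[stop:]


rows = [{"x": 1, "y": 10}, {"x": 2, "y": 20}]
print(pluck("x", rows))
print(merge({"a": 1, "b": 2}, {"b": 3}))
print(replace_slice([0, 1, 2, 3, 4], 1, 3, ["a", "b", "c"]))
```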

Ah, the state machine

The beauty of most Dash apps is that they are completely defined by the current set of inputs on the page. If you opened up the DAG and fired all of the callbacks with the current set of inputs you’d get the same outputs every single time.

When we start introducing appending and extending transformations, the output can be defined based off of the number of times the transformation has been called - The current output is sort of implicitly the “State” that is being applied on. This isn’t a bad thing, but it’s just kinda an interesting framework to think about things.
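The difference can be shown with a toy simulation in plain Python. The functions `replay_replace` and `replay_extend` are hypothetical and only model the two behaviours: a regular output depends solely on the latest inputs, while an extending output depends on the whole history of calls.

```python
def replay_replace(clicks):
    """Regular output: the property is whatever the last call returned."""
    value = None
    for n_clicks in clicks:
        value = [n_clicks * 10]
    return value


def replay_extend(initial, clicks):
    """Extending output: every call adds to what is already there."""
    value = list(initial)
    for n_clicks in clicks:
        value.extend([n_clicks * 10])
    return value


# Same final input (3), different histories.
print(replay_replace([1, 2, 3]), replay_replace([3]))
print(replay_extend([], [1, 2, 3]), replay_extend([], [3]))
```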

Now we already have this model in Dash with State and applying the data transformations in a callback. All we’re doing is basically providing some formalism (and way better performance!) around things like this:

@callback(Output('mygraph', 'figure'), Input('button', 'n_clicks'), State('mygraph', 'figure'))
def update(_, figure):
    figure['data'][0]['y'].extend(get_data())
    return figure

which could be written now as e.g.

@callback(Output('mygraph', extend('figure.data.0.y')), Input('button', 'n_clicks'))
def update(_):
    return get_data()

Initial State

Related to above, one complexity is around initial state. If we have callbacks that are focussed on transforming things, then how does a user set the original state of the property? Seems like setting it as part of the layout is the way to go. Either in app.layout or whatever callback returned the component in the first place. So:

app.layout = html.Div([
    html.Button(id='button', n_clicks=0),
    dcc.Graph(id='mygraph', figure=px.line(df, x='time', y='price', color='stock'))
])

@callback(Output('mygraph', extend('figure.data.0.y')), Input('button', 'n_clicks'))
def update(_):
    return get_data()

Or even:

app.layout = html.Div([
    html.Button(id='display', n_clicks=0),
    html.Div(id='content')
])

@callback(Output('content', 'children'), Input('display', 'n_clicks'))
def update(_):
    return html.Div([
        html.Button(id='button', n_clicks=0),
        dcc.Graph(id='mygraph', figure=px.line(df, x='time', y='price', color='stock'))
    ])

@callback(Output('mygraph', extend('figure.data.0.y')), Input('button', 'n_clicks'))
def update(_):
    return get_data()

Resetting

Similar to above, what about resetting & transforming the property? Two callbacks?

app.layout = html.Div([
    html.Button('Refresh data', id='refresh-button', n_clicks=0),
    dcc.Dropdown(['a', 'b', 'c'], id='dropdown'),
    dcc.Graph(id='mygraph')
])

@callback(
    Output('mygraph', 'figure'),
    Input('dropdown', 'value')
)
def update(value):
    # Reset: rebuild the whole figure whenever the dropdown changes
    return px.scatter(get_latest_data(value), x='time', y='price')

@callback(
    Output('mygraph', extend('figure', 'data.0.x')),
    Output('mygraph', extend('figure', 'data.0.y')),
    Input('refresh-button', 'n_clicks'),
    State('dropdown', 'value')
)
def update(_, value):
    # Transform: extend the existing trace with the latest data
    df = get_latest_data(value)
    return [df['time'], df['price']]

If it’s multiple callbacks, then what’s the resolution order for the DAG? Can we do some rule like “first call the callback without the transformations, and then call the callback with the transformations?” Feels a little ugly.

An alternative syntax would be combining into a single callback, either with the data transformations in the Output or returning a transformation.

Transformations in the callback:

app.layout = html.Div([
    html.Button('Refresh data', id='refresh-button', n_clicks=0),
    dcc.Dropdown(['a', 'b', 'c'], id='dropdown'),
    dcc.Graph(id='mygraph')
])

@callback(
    Output('mygraph', 'figure'),
    Output('mygraph', extend('figure', 'data.0.x')),
    Output('mygraph', extend('figure', 'data.0.y')),
    Input('refresh-button', 'n_clicks'),
    Input('dropdown', 'value')
)
def update(_, value):
    df = get_latest_data(value)
    # ctx.triggered_id is the id of the component that fired the callback
    if ctx.triggered_id == 'dropdown':
        return [px.scatter(df, x='time', y='price'), no_update, no_update]
    else:
        return [no_update, df['time'], df['price']]

Or, returning transformations:

app.layout = html.Div([
    html.Button('Refresh data', id='refresh-button', n_clicks=0),
    dcc.Dropdown(['a', 'b', 'c'], id='dropdown'),
    dcc.Graph(id='mygraph')
])

@callback(
    Output('mygraph', 'figure'),
    Input('refresh-button', 'n_clicks'),
    Input('dropdown', 'value')
)
def update(_, value):
    df = get_latest_data(value)
    # ctx.triggered_id is the id of the component that fired the callback
    if ctx.triggered_id == 'dropdown':
        return px.scatter(df, x='time', y='price')
    else:
        return [
            extend('data.0.x', df['time']),
            extend('data.0.y', df['price'])
        ]

Transformations in layout

In the Dash model, anything that you can return in an output is something that you can set in the layout. So if we allow returning transformations in a callback, then you should be able to set them in the layout.

But this is a little weird because transformations depend on a value already being defined (you’re transforming something!). In the previous example, that order-dependent logic is defined within the callback. But I think this can be defined functionally too with like a “default” or “if is none” transformation.

app.layout = html.Div([
    dcc.Graph(figure=
        ifNone(
            default=px.scatter(df, x='time', y='price'),
            otherwise=extend('data.0.y', df['y'])  # `else` is a reserved word in Python
        )
    )
])
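One way to read the “if is none” idea is as a function that is resolved against the current property value. The sketch below is plain Python with hypothetical names (`if_none`, `extend_y`); it only models the semantics and says nothing about how Dash would serialize or evaluate such a value in the layout.

```python
def if_none(default, otherwise):
    """Use `default` when there is no value yet, else transform the value."""
    def resolve(current_value):
        if current_value is None:
            return default
        return otherwise(current_value)
    return resolve


def extend_y(values):
    """Return a transformation that appends `values` to the 'y' list."""
    def transform(figure):
        return {**figure, "y": figure["y"] + list(values)}
    return transform


resolver = if_none(default={"y": [1, 2]}, otherwise=extend_y([3]))

first = resolver(None)    # nothing rendered yet, so the default is used
second = resolver(first)  # a value exists, so the transformation is applied
print(first, second)
```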

This isn’t that useful on its own, but imagine if you could also reference other properties on the page within these transformations. Then you could define clientside data transformations between components without callbacks, which is pretty neat.

app.layout = html.Div([
    dcc.Store(id='store', data=df.to_dict('records')),
    dcc.Graph(figure={
        'data': [{
            'x': get('store.*.date'),
            'y': get('store.*.price'),
        }]
    })
])

This is similar to a prototype I wrote a few years ago in Dash Clientside Transformations by chriddyp · Pull Request #142 · plotly/dash-renderer · GitHub (note the ramda example!) and discussed recently in Improving on All-in-One (AIO) components - #9 by chriddyp



@chriddyp Thank you for some great input! I have a few comments,

  • While I like the simplicity of the proposed syntax,
@callback(Output('figure', extend('figure', 'data.0.y')), ...)

it seems rather limiting to me. As I read it, you will only be able to perform a particular operation on a particular sub-element (please correct me if I am wrong). What about something like,

@callback(Output('figure', 'data'), Input('interval', 'n_intervals'))
def update(_):
    lo = ListOperator()
    lo[0]['y'] = get_latest_data()['sales']
    return lo.apply()

this kind of syntax would make it possible to perform different operations on different parts of the data object, more similar to your Ramda example(s)

  • In principle, I really like the idea of enabling Ramda. It is obviously extremely powerful. However, I fear that it might scare off a lot of users, as the syntax is very different from what they are used to (in Python). That’s why I am (at least for now) targeting an interface that more closely mimics common Python data structures (i.e. dict and list)

I deliberately didn’t dive into your ideas about partial data accessors and in-layout transformation. I’ll have to think a bit more about these points :slight_smile:


I have made a new iteration to enable (1) access to nested properties using a one-line syntax and (2) to enable both list and dict operations through the same callback,

from dash_extensions.enrich import OperatorOutput, Input, callback, Operator

# single operation
@callback(OperatorOutput('figure', 'data'), Input('btn', 'n_clicks'))
def update(_):
    return Operator()['key'][3].list.append("stuff").apply()

# multiple operations (option 1)
@callback(OperatorOutput('figure', 'data'), Input('btn', 'n_clicks'))
def update(_):
    return Operator()['key'][3].list.append("stuff").sort().apply()

# multiple operations (option 2)
@callback(OperatorOutput('figure', 'data'), Input('btn', 'n_clicks'))
def update(_):
    op = Operator()
    op[0]['y'].list.append("item")
    op[0]['y'].list.sort()
    return op.apply()

You can use any kind of accessor (both list and dict, i.e. number or string) on the Operator object to access nested properties (i.e. you can drill down through any number of lists/dicts). Next, you specify which kind of data structure you are operating on (i.e. list or dict) via dot notation (that’s partially to enable auto completion), and finally the operation that you want to perform (again, via dot notation).
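For readers curious how such a chained syntax could work, here is a minimal plain-Python sketch of a recorder object. It is an assumption-laden illustration, not the dash-extensions implementation: the class internals, the recorded format and the helper `apply_operations` are all hypothetical, and only the list operations `append` and `sort` are modelled.

```python
import copy


class Operator:
    """Records a path via [] and list operations via .list.<op>(...)."""

    def __init__(self, _ops=None, _path=()):
        self._ops = [] if _ops is None else _ops  # shared by all children
        self._path = _path

    def __getitem__(self, key):
        return Operator(self._ops, self._path + (key,))

    @property
    def list(self):
        return _ListOps(self)

    def apply(self):
        """Return the recorded operations as plain, serializable dicts."""
        return [dict(op) for op in self._ops]


class _ListOps:
    def __init__(self, operator):
        self._operator = operator

    def _record(self, name, *args):
        self._operator._ops.append(
            {"path": [*self._operator._path], "op": name, "args": [*args]}
        )
        return self  # allow chaining, e.g. .append(x).sort()

    def append(self, item):
        return self._record("append", item)

    def sort(self):
        return self._record("sort")

    def apply(self):
        return self._operator.apply()


ALLOWED = {"append", "sort"}


def apply_operations(data, operations):
    """Stand-in for the client: replay recorded operations on a copy."""
    result = copy.deepcopy(data)
    for operation in operations:
        if operation["op"] not in ALLOWED:
            raise ValueError(f"Unsupported operation: {operation['op']}")
        target = result
        for key in operation["path"]:
            target = target[key]
        getattr(target, operation["op"])(*operation["args"])
    return result


op = Operator()
op[0]["y"].list.append(1)
op[0]["y"].list.sort()
recorded = op.apply()
chained = Operator()[0]["y"].list.append(1).sort().apply()
print(apply_operations([{"y": [3, 2]}], recorded))
```

Both styles record the same list of operations, so the receiving side needs only one code path to apply them.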
