Populating Dash AG Grid rowData with 10,00 rows is slow

I currently have an AG Grid and about 30,000 rows of data. The data is broken out into three years (about 10,000 rows per year). Rather than loading everything into the AG Grid I have a dropdown that lets users select the year they want to view. I am storing the pandas.DataFrame server side in a dcc.Store . Users can select the year they want from a dropdown and update the rowData of the AG Grid but doing df.to_dict('records') for the 10,000 rows from the server to client can be slow.

@app.callback(Output('datatable', 'rowData'),
              Input('year-dropdown', 'value'),
              Input('data-all', 'data'),
              prevent_inital_call=True)
def year_dropdown(year, df):
    df = df[df['Year'].eq(year)]
    return df.to_dict('records')

Any tips to increase performance? I am not using the enterprise version of AG Grid but should I consider reconfiguring everything to use the infinite row model? The downside of the infinite row model is sorting and filtering since the table does not store all the data. I do have graphs that are updated when the table is filtered.

I am not opposed to using the infinite row model if that is going to be the best option but I just wanted to ask from some advice here before I reconfigure my update_graphs callbacks for the infinite row model.

Hello @PyGuy,

If you use infinite you will lose the ability to use virtualRowData.

What happens if you cache the year_dropdown using a basic cache from flask, does that increase your response?


Also, you have to keep in mind, if the data is a dcc.Store, it also has to get transferred.

If this is all clientside, you could convert this to a clientside callback:

yearDropdown(year, df) {
  const filteredData = df.filter(row => row.Year === year);
  return filteredData.map(row => ({ ...row }));
}

@jinnyzor I did try that but because I am using DashProxy from dash extensions so I can save the DataFrame server-side, using flask-caching errors out with IndexError: list index out of range

Here is the exact example from Dash Extension just with the addition of flask-caching used on the update_dd callback

import plotly.express as px
from dash_extensions.enrich import DashProxy, Output, Input, State, Serverside, html, dcc, \
    ServersideOutputTransform
from flask_caching import Cache

app = DashProxy(transforms=[ServersideOutputTransform()])

cache = Cache(app.server, config={
    'CACHE_TYPE': 'FileSystemCache',
    'CACHE_DIR': 'file_system_backend'
})

app.layout = html.Div(
    [
        html.Button("Query data", id="btn"),
        dcc.Dropdown(id="dd"),
        dcc.Graph(id="graph"),
        dcc.Loading(dcc.Store(id="store"), fullscreen=True, type="dot"),
    ]
)


@app.callback(Output("store", "data"),
              Input("btn", "n_clicks"),
              prevent_initial_call=True)
def query_data(n_clicks):
    return Serverside(px.data.gapminder())  # no JSON serialization here


@app.callback(Output("dd", "options"),
              Output("dd", "value"),
              Input("store", "data"),
              prevent_initial_call=True)
@cache.memoize(timeout=60*60)  # FLASK CACHING
def update_dd(df):
    options = [{"label": column, "value": column} for column in df["year"]]  # no JSON de-serialization here
    return options, options[0]['value']


@app.callback(Output("graph", "figure"),
              [Input("dd", "value"),
               State("store", "data")],
              prevent_initial_call=True)
def update_graph(value, df):
    df = df.query("year == {}".format(value))  # no JSON de-serialization here
    return px.sunburst(df, path=["continent", "country"], values="pop", color="lifeExp", hover_data=["iso_alpha"])


if __name__ == "__main__":
    app.run(debug=True)

What about this:

@cache.memoize() # shouldnt change per df and year dropdown
def year_rowData(year, df):
    df = df[df['Year'].eq(year)]
    return df.to_dict('records')

@app.callback(Output('datatable', 'rowData'),
              Input('year-dropdown', 'value'),
              Input('data-all', 'data'),
              prevent_inital_call=True)
def year_dropdown(year, df):
    return year_rowData(year, df)
1 Like

@jinnyzor Thanks! I did not even think of creating a separate function that is cached to be use in the callback. I will give this a try but it looks promising!

@jinnyzor thanks, that did help! The first load is about the same (makes sense) but after the first load the callback is about 2x faster.

If you wanted, and the data doesnt change. You could preload the cache upon app spin up. :wink:


For even quicker processing, you could also introduce caching on the clientside, (if the data doesnt change) for the different year selections.

2 Likes

The data gets updated every night via a couple api calls, manipulated and stored on the server in a parquet file.

I played around with it and storing data client side and using a client-side callback is by far the fastest but storing the 30,000 rows client-side takes about 15 seconds. So I think I would rather have users wait the 5 seconds when using the year dropdown before it gets cached and then wait the 2 seconds for the cached data to load.

Thanks again for the help @jinnyzor

You should check out my page-caching and see if you can retrofit it for this, that should go down to microseconds once the data has been loaded. :slight_smile:

This may help someone in the future but I actually ended up compressing df.to_dict('records'), passing it to a dcc.Store and using a clientside_callback to update the AG Grid. The rowData now gets updated in 500ms.

@cache.memoize() 
def year_rowData(year, df):
    df = df[df['Year'].eq(year)]
    compressed_row_data = base64.b64encode(zlib.compress(simplejson.dumps(df.to_dict('records'), ignore_nan=True)
                                                     .encode('utf-8'))).decode('utf-8')

    return compressed_row_data


@app.callback(Output('rowDataCompressed', 'data'),  # dcc.Store(id='rowDataCompressed')
              Input('year-dropdown', 'value'),
              Input('data-all', 'data'),
              prevent_inital_call=True)
def year_dropdown(year, df):
    return year_rowData(year, df)


clientside_callback(
    """
        function(data) {
            let compressedData = Uint8Array.from(atob(data), (c) => c.charCodeAt(0));
            let decompressedData = pako.inflate(compressedData, { to: "string" });
            let jsonObject = JSON.parse(decompressedData);
            return jsonObject;
        }
    """,
    Output('datatable', 'rowData'),
    Input('rowDataCompressed', 'data'),
    prevent_initial_call=True
)
1 Like