Populating Dash AG Grid rowData with 10,00 rows is slow

PyGuy · April 16, 2024, 3:11pm

I currently have an AG Grid and about 30,000 rows of data. The data is broken out into three years (about 10,000 rows per year). Rather than loading everything into the AG Grid I have a dropdown that lets users select the year they want to view. I am storing the pandas.DataFrame server side in a dcc.Store . Users can select the year they want from a dropdown and update the rowData of the AG Grid but doing df.to_dict('records') for the 10,000 rows from the server to client can be slow.

@app.callback(Output('datatable', 'rowData'),
              Input('year-dropdown', 'value'),
              Input('data-all', 'data'),
              prevent_inital_call=True)
def year_dropdown(year, df):
    df = df[df['Year'].eq(year)]
    return df.to_dict('records')

Any tips to increase performance? I am not using the enterprise version of AG Grid but should I consider reconfiguring everything to use the infinite row model? The downside of the infinite row model is sorting and filtering since the table does not store all the data. I do have graphs that are updated when the table is filtered.

I am not opposed to using the infinite row model if that is going to be the best option but I just wanted to ask from some advice here before I reconfigure my update_graphs callbacks for the infinite row model.

jinnyzor · April 16, 2024, 4:02pm

Hello @PyGuy,

If you use infinite you will lose the ability to use virtualRowData.

What happens if you cache the year_dropdown using a basic cache from flask, does that increase your response?

Also, you have to keep in mind, if the data is a dcc.Store, it also has to get transferred.

If this is all clientside, you could convert this to a clientside callback:

yearDropdown(year, df) {
  const filteredData = df.filter(row => row.Year === year);
  return filteredData.map(row => ({ ...row }));
}

PyGuy · April 16, 2024, 4:13pm

@jinnyzor I did try that but because I am using DashProxy from dash extensions so I can save the DataFrame server-side, using flask-caching errors out with IndexError: list index out of range

Here is the exact example from Dash Extension just with the addition of flask-caching used on the update_dd callback

import plotly.express as px
from dash_extensions.enrich import DashProxy, Output, Input, State, Serverside, html, dcc, \
    ServersideOutputTransform
from flask_caching import Cache

app = DashProxy(transforms=[ServersideOutputTransform()])

cache = Cache(app.server, config={
    'CACHE_TYPE': 'FileSystemCache',
    'CACHE_DIR': 'file_system_backend'
})

app.layout = html.Div(
    [
        html.Button("Query data", id="btn"),
        dcc.Dropdown(id="dd"),
        dcc.Graph(id="graph"),
        dcc.Loading(dcc.Store(id="store"), fullscreen=True, type="dot"),
    ]
)


@app.callback(Output("store", "data"),
              Input("btn", "n_clicks"),
              prevent_initial_call=True)
def query_data(n_clicks):
    return Serverside(px.data.gapminder())  # no JSON serialization here


@app.callback(Output("dd", "options"),
              Output("dd", "value"),
              Input("store", "data"),
              prevent_initial_call=True)
@cache.memoize(timeout=60*60)  # FLASK CACHING
def update_dd(df):
    options = [{"label": column, "value": column} for column in df["year"]]  # no JSON de-serialization here
    return options, options[0]['value']


@app.callback(Output("graph", "figure"),
              [Input("dd", "value"),
               State("store", "data")],
              prevent_initial_call=True)
def update_graph(value, df):
    df = df.query("year == {}".format(value))  # no JSON de-serialization here
    return px.sunburst(df, path=["continent", "country"], values="pop", color="lifeExp", hover_data=["iso_alpha"])


if __name__ == "__main__":
    app.run(debug=True)

jinnyzor · April 16, 2024, 4:16pm

What about this:

@cache.memoize() # shouldnt change per df and year dropdown
def year_rowData(year, df):
    df = df[df['Year'].eq(year)]
    return df.to_dict('records')

@app.callback(Output('datatable', 'rowData'),
              Input('year-dropdown', 'value'),
              Input('data-all', 'data'),
              prevent_inital_call=True)
def year_dropdown(year, df):
    return year_rowData(year, df)

PyGuy · April 16, 2024, 4:19pm

@jinnyzor Thanks! I did not even think of creating a separate function that is cached to be use in the callback. I will give this a try but it looks promising!

PyGuy · April 16, 2024, 5:00pm

@jinnyzor thanks, that did help! The first load is about the same (makes sense) but after the first load the callback is about 2x faster.

jinnyzor · April 16, 2024, 5:01pm

If you wanted, and the data doesnt change. You could preload the cache upon app spin up.

For even quicker processing, you could also introduce caching on the clientside, (if the data doesnt change) for the different year selections.

PyGuy · April 16, 2024, 9:00pm

The data gets updated every night via a couple api calls, manipulated and stored on the server in a parquet file.

I played around with it and storing data client side and using a client-side callback is by far the fastest but storing the 30,000 rows client-side takes about 15 seconds. So I think I would rather have users wait the 5 seconds when using the year dropdown before it gets cached and then wait the 2 seconds for the cached data to load.

Thanks again for the help @jinnyzor

jinnyzor · April 16, 2024, 9:14pm

You should check out my page-caching and see if you can retrofit it for this, that should go down to microseconds once the data has been loaded.

PyGuy · April 18, 2024, 11:21pm

This may help someone in the future but I actually ended up compressing df.to_dict('records'), passing it to a dcc.Store and using a clientside_callback to update the AG Grid. The rowData now gets updated in 500ms.

@cache.memoize() 
def year_rowData(year, df):
    df = df[df['Year'].eq(year)]
    compressed_row_data = base64.b64encode(zlib.compress(simplejson.dumps(df.to_dict('records'), ignore_nan=True)
                                                     .encode('utf-8'))).decode('utf-8')

    return compressed_row_data


@app.callback(Output('rowDataCompressed', 'data'),  # dcc.Store(id='rowDataCompressed')
              Input('year-dropdown', 'value'),
              Input('data-all', 'data'),
              prevent_inital_call=True)
def year_dropdown(year, df):
    return year_rowData(year, df)


clientside_callback(
    """
        function(data) {
            let compressedData = Uint8Array.from(atob(data), (c) => c.charCodeAt(0));
            let decompressedData = pako.inflate(compressedData, { to: "string" });
            let jsonObject = JSON.parse(decompressedData);
            return jsonObject;
        }
    """,
    Output('datatable', 'rowData'),
    Input('rowDataCompressed', 'data'),
    prevent_initial_call=True
)

Topic		Replies	Views
Large file display in dash aggrid Dash Python	6	807	November 10, 2023
serverSide rowModelType in Dash AgGrid by using snowflake data Dash Python	4	304	January 20, 2024
Dash AG Grid - Infinite Row Model - Updating startRow and endRow Dash Python question	20	1860	June 29, 2023
AG Grid vs Datatable? Dash Python question	11	5844	August 7, 2023
Update data in infinite row model Dash Python question	1	106	May 15, 2024

Populating Dash AG Grid rowData with 10,00 rows is slow

Related topics