How to persist column datatype when passing serialized dataframe across callbacks

Is there a way to avoid manually re-specifying the column type of a serialized dataframe when it’s been passed between callbacks?

The code below is a simplified version of my actual use-case, but it gets the point across. My specific question: is there any way to avoid converting the 'timestamp' column back to datetime type in the resample_dataframe function?

import dash
from dash import dcc, html, dash_table
from dash.dependencies import Input, Output, State
import pandas as pd
import numpy as np

# Generate sample dataframe
np.random.seed(0)
date_rng = pd.date_range(start='2024-01-01', end='2024-01-02', freq='15min')
df = pd.DataFrame(date_rng, columns=['timestamp'])
df['a'] = np.random.randn(len(date_rng))
df['b'] = np.random.randn(len(date_rng))
df['c'] = np.random.randn(len(date_rng))
df['d'] = np.random.randn(len(date_rng))

# Create Dash app
app = dash.Dash(__name__)

app.layout = html.Div([
    html.Button('Run', id='run-button', n_clicks=0),
    dcc.Store(id='filtered-data', storage_type='memory'),
    html.Div(id='output-table')
])

# Callback to filter the dataframe
@app.callback(
    Output('filtered-data', 'data'),
    Input('run-button', 'n_clicks')
)
def filter_dataframe(n_clicks):
    if n_clicks > 0:
        filtered_df = df[df['a'] > 2]
        return filtered_df.to_dict('records')
    return None

# Callback to resample the filtered dataframe and display it
@app.callback(
    Output('output-table', 'children'),
    Input('filtered-data', 'data')
)
def resample_dataframe(data):
    if data:
        filtered_df = pd.DataFrame(data)
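        # the JSON round-trip through dcc.Store turned 'timestamp' into strings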
        filtered_df['timestamp'] = pd.to_datetime(filtered_df['timestamp'])
        resampled_df = filtered_df.resample('h', on='timestamp').mean().reset_index()
        return dash_table.DataTable(
            data=resampled_df.to_dict('records'),
            columns=[{'name': col, 'id': col} for col in resampled_df.columns]
        )
    return 'No data to display.'

if __name__ == '__main__':
    app.run(debug=True)

JSON (the format your Store component uses) doesn't preserve Pandas dtypes, so unfortunately you need to re-specify the types every time you load data from a Store component back into Pandas.

What I usually do, instead of using pd.DataFrame(data), is to explicitly use pd.read_json() and manually specify the types, something like this:

pd.read_json(locations, orient='split', dtype={"id": str})
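Spelled out, the write/read pairing would look something like this (a sketch; the "id" column is from my own app, and note that recent pandas versions want raw JSON strings wrapped in StringIO):

from io import StringIO

# callback that writes to the Store: serialize the whole frame to JSON
stored = df.to_json(orient='split')

# callback that reads from the Store: re-specify dtypes on load
df = pd.read_json(StringIO(stored), orient='split', dtype={"id": str})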

Another solution would be to perform the resampling operation (it seems this is the only reason you have to read the data back into pandas… right?) before saving into JSON, and then pass the result directly to DataTable, which I think should handle it without issues.
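For reference, here's a sketch of that restructuring against your example (the callback names are mine):

# first callback: filter AND resample while 'timestamp' is still datetime
@app.callback(
    Output('filtered-data', 'data'),
    Input('run-button', 'n_clicks')
)
def filter_and_resample(n_clicks):
    if n_clicks > 0:
        filtered_df = df[df['a'] > 2]
        resampled_df = filtered_df.resample('h', on='timestamp').mean().reset_index()
        return resampled_df.to_dict('records')
    return None

# second callback: just render; timestamps arrive as ISO strings,
# which DataTable displays without any conversion
@app.callback(
    Output('output-table', 'children'),
    Input('filtered-data', 'data')
)
def display_table(data):
    if data:
        return dash_table.DataTable(
            data=data,
            columns=[{'name': col, 'id': col} for col in data[0]]
        )
    return 'No data to display.'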

Thanks for the guidance. Your proposed way is more elegant.

If I use pd.read_json(locations, orient='split', dtype={"id": str}), do I need to serialize the dataframe with df.to_json() as well, or can I leave it as df.to_dict('records') in the callback that outputs it?

Both ways are possible.
Whenever you're interacting with a Store component, Dash reads and writes JSON-serializable data (a list of dicts, if you use to_dict('records')), so it depends on what you want to do with it. If you have a simple structure you don't even need pandas to access it.
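For instance, a quick sketch against the example above (show_latest is a made-up callback name):

@app.callback(
    Output('output-table', 'children'),
    Input('filtered-data', 'data')
)
def show_latest(data):
    if data:
        # data is already a plain list of dicts - no pandas required
        last = data[-1]
        return f"latest timestamp: {last['timestamp']}, a: {last['a']:.2f}"
    return 'No data to display.'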

I’m revisiting your answer here. Would this work for specifying datetime types though? The whole point is to specify a datetime type (as in my example) in one line - is that possible?

Hi @matsuobasho

When you save data in dcc.Store it needs to be in JSON format - which doesn’t support datetime.

You could try the server side store from Dash Extensions:
https://www.dash-extensions.com/transforms/serverside_output_transform
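A minimal sketch of that approach, reusing the layout and df from your example (based on the docs above; the exact import names can differ between dash-extensions versions):

from dash_extensions.enrich import (DashProxy, Input, Output,
                                    Serverside, ServersideOutputTransform)

app = DashProxy(transforms=[ServersideOutputTransform()])

@app.callback(
    Output('filtered-data', 'data'),
    Input('run-button', 'n_clicks')
)
def filter_dataframe(n_clicks):
    if n_clicks > 0:
        # Serverside caches the object on the server, so the DataFrame
        # never goes through JSON and all dtypes survive intact
        return Serverside(df[df['a'] > 2])
    return None

@app.callback(
    Output('output-table', 'children'),
    Input('filtered-data', 'data')
)
def resample_dataframe(filtered_df):
    if filtered_df is not None:
        # 'timestamp' is still a real datetime column here
        resampled_df = filtered_df.resample('h', on='timestamp').mean().reset_index()
        return dash_table.DataTable(
            data=resampled_df.to_dict('records'),
            columns=[{'name': col, 'id': col} for col in resampled_df.columns]
        )
    return 'No data to display.'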