How to persist column datatype when passing serialized dataframe across callbacks

Is there a way to avoid manually re-specifying the column type of a serialized dataframe when it’s been passed between callbacks?

The code below is a simplified version of my actual use case, but it gets the point across. Specifically: is there any way to avoid converting the 'timestamp' column back to datetime type inside the resample_dataframe function?

import dash
from dash import dcc, html, dash_table
from dash.dependencies import Input, Output, State
import pandas as pd
import numpy as np

# Generate sample dataframe
np.random.seed(0)
date_rng = pd.date_range(start='2024-01-01', end='2024-01-02', freq='15min')
df = pd.DataFrame(date_rng, columns=['timestamp'])
df['a'] = np.random.randn(len(date_rng))
df['b'] = np.random.randn(len(date_rng))
df['c'] = np.random.randn(len(date_rng))
df['d'] = np.random.randn(len(date_rng))

# Create Dash app
app = dash.Dash(__name__)

app.layout = html.Div([
    html.Button('Run', id='run-button', n_clicks=0),
    dcc.Store(id='filtered-data', storage_type='memory'),
    html.Div(id='output-table')
])

# Callback to filter the dataframe
@app.callback(
    Output('filtered-data', 'data'),
    Input('run-button', 'n_clicks')
)
def filter_dataframe(n_clicks):
    if n_clicks > 0:
        filtered_df = df[df['a'] > 2]
        return filtered_df.to_dict('records')
    return None

# Callback to resample the filtered dataframe and display it
@app.callback(
    Output('output-table', 'children'),
    Input('filtered-data', 'data')
)
def resample_dataframe(data):
    if data:
        filtered_df = pd.DataFrame(data)
        filtered_df['timestamp'] = pd.to_datetime(filtered_df['timestamp'])
        resampled_df = filtered_df.resample('h', on='timestamp').mean().reset_index()
        return dash_table.DataTable(
            data=resampled_df.to_dict('records'),
            columns=[{'name': col, 'id': col} for col in resampled_df.columns]
        )
    return 'No data to display.'

if __name__ == '__main__':
    app.run(debug=True)

JSON (which is what your Store component holds) doesn't carry Pandas dtype information, so unfortunately you need to re-specify the types every time you load data from a Store component back into Pandas.
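To see why, here is a minimal illustration (outside Dash) of the round trip a Store performs: a records-style JSON round trip keeps only JSON types, so the datetime column comes back as plain strings.

```python
import json

import pandas as pd

df = pd.DataFrame({'timestamp': pd.date_range('2024-01-01', periods=3, freq='h'),
                   'a': [1.0, 2.0, 3.0]})
print(df['timestamp'].dtype)  # datetime64[ns]

# Simulate what dcc.Store does: serialize to JSON and back.
records = json.loads(df.to_json(orient='records', date_format='iso'))
restored = pd.DataFrame(records)
print(restored['timestamp'].dtype)  # object (ISO strings), not datetime64
```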

What I usually do, instead of using pd.DataFrame(data), is to explicitly use pd.read_json() and manually specify the types,
something like this:

pd.read_json(locations, orient='split', dtype={"id": str})
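Applied to the dataframe in this thread, the round trip could look like the sketch below (the column names are taken from your example; storing the `to_json()` string and using `convert_dates` is one way to get the datetime dtype back without a manual `pd.to_datetime` call):

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({'timestamp': pd.date_range('2024-01-01', periods=4, freq='15min'),
                   'a': [0.1, 2.5, 3.0, -1.0]})

# In filter_dataframe: return a JSON string instead of to_dict('records').
stored = df[df['a'] > 2].to_json(orient='split', date_format='iso')

# In resample_dataframe: read it back; convert_dates restores the dtype.
restored = pd.read_json(StringIO(stored), orient='split',
                        convert_dates=['timestamp'])
print(restored['timestamp'].dtype)  # datetime64[ns]
```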

Another solution would be to do the resampling operation (it seems this is the only reason you have to read the data back into pandas…right?) before saving to JSON, and then pass the result directly to DataTable, which I think should handle it without issues.
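A hedged sketch of that second suggestion: do the resample in the first callback, while 'timestamp' is still a real datetime column, and put only display-ready records into the Store (the filter threshold and timestamp format here are arbitrary illustration choices):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
rng = pd.date_range('2024-01-01', '2024-01-02', freq='15min')
df = pd.DataFrame({'timestamp': rng, 'a': np.random.randn(len(rng))})

def filter_and_resample(frame):
    """Filter, resample hourly, and return JSON-safe records."""
    filtered = frame[frame['a'] > -10]  # loose filter, keeps the example non-empty
    resampled = filtered.resample('h', on='timestamp').mean().reset_index()
    # Stringify timestamps once, for display; no dtype needs restoring later.
    resampled['timestamp'] = resampled['timestamp'].dt.strftime('%Y-%m-%d %H:%M')
    return resampled.to_dict('records')

records = filter_and_resample(df)  # ready for dcc.Store / DataTable
```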


Thanks for the guidance. Your proposed way is more elegant.

If I use `pd.read_json(locations, orient='split', dtype={"id": str})`, do I need to serialize the dataframe with `df.to_json()` as well, or can I leave it as `df.to_dict('records')` in the callback that outputs it?

Both ways are possible.
Whenever you interact with a Store component, Dash serializes the data as JSON, so it depends on what you want to do with it. If you have a simple structure (e.g. a list of dicts) you don't even need pandas to access it.
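For example, if the Store holds the usual list-of-dicts shape produced by `to_dict('records')`, plain Python is enough for simple lookups (the keys below are assumptions matching the example earlier in the thread):

```python
# Data as it might arrive from dcc.Store after to_dict('records').
data = [
    {'timestamp': '2024-01-01 00:00', 'a': 0.5},
    {'timestamp': '2024-01-01 01:00', 'a': 2.5},
]

# No pandas needed for simple filtering or extraction.
high = [row for row in data if row['a'] > 2]
timestamps = [row['timestamp'] for row in data]
```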