Sharing a dataframe between plots

I have a dashboard that works as follows:

  1. User selects an item from a dropdown menu
  2. A database is queried to pull the data (as a dataframe) based on the user’s choice
  3. Three different plots are created using the same dataframe
  4. When the user selects a different item, steps 2 and 3 are rerun

Now my problem is, I am querying the database with exactly the same query to pull the data for all three app.callback decorators corresponding to the three plots I have. This is very inefficient and I want to figure out a way to query the database once and then share the dataframe between the decorators.

I am thinking of defining the dataframe as global but I feel there may be a cleaner solution.
Any pointers would be appreciated.
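One alternative to a mutable global is sketched here. It is an assumption on my part, not the approach recommended later in this thread. The idea is to memoize the query function with Python's standard `functools.lru_cache`: each of the three callbacks calls the same function, but the database is hit only once per distinct dropdown value. `query_database` and the three plot helpers are hypothetical names. There are some caveats. The cache lives inside one Python process, so each server worker keeps its own copy. Cached data never refreshes unless you call `query_database.cache_clear()`. Finally, callers must not modify the returned DataFrame in place, because every caller receives the same object.

```python
import functools

import pandas as pd

CALLS = {'count': 0}  # counts real queries, for illustration only


@functools.lru_cache(maxsize=32)
def query_database(item):
    """Hypothetical stand-in for the real database query, cached per dropdown value."""
    CALLS['count'] += 1
    return pd.DataFrame({'item': [item] * 3, 'value': [1, 2, 3]})


# Each plot callback calls query_database(value) on its own;
# only the first call for a given value actually runs the query.
def line_plot_data(item):
    return query_database(item)['value'].tolist()


def bar_plot_data(item):
    return int(query_database(item)['value'].sum())


def table_data(item):
    return query_database(item).to_dict('records')


line_plot_data('A')
bar_plot_data('A')
table_data('A')
print(CALLS['count'])  # 1
```

In a Dash app, each of the three `@app.callback` functions would call `query_database(value)` with the dropdown value.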


See https://github.com/plotly/dash/issues/49#issuecomment-311511286 for now. In particular,

import dash
import dash_core_components as dcc
import dash_html_components as html
import pandas as pd
from dash.dependencies import Input, Output

app = dash.Dash(__name__)

global_df = pd.read_csv('...')
app.layout = html.Div([
    dcc.Graph(id='graph'),
    html.Table(id='table'),
    dcc.Dropdown(id='dropdown'),
    # Hidden div that holds the cleaned data as a JSON string
    html.Div(id='intermediate-value', style={'display': 'none'})
])

@app.callback(Output('intermediate-value', 'children'), [Input('dropdown', 'value')])
def clean_data(value):
    # some expensive clean data step (your_expensive_clean_or_compute_step is a placeholder)
    cleaned_df = your_expensive_clean_or_compute_step(value)
    return cleaned_df.to_json()  # or, for plain Python data, json.dumps(cleaned_data)

@app.callback(Output('graph', 'figure'), [Input('intermediate-value', 'children')])
def update_graph(jsonified_cleaned_data):
    dff = pd.read_json(jsonified_cleaned_data)  # or, for plain Python data, json.loads(jsonified_cleaned_data)
    figure = create_figure(dff)  # create_figure is a placeholder for your plotting code
    return figure

@app.callback(Output('table', 'children'), [Input('intermediate-value', 'children')])
def update_table(jsonified_cleaned_data):
    dff = pd.read_json(jsonified_cleaned_data)  # or, for plain Python data, json.loads(jsonified_cleaned_data)
    table = create_table(dff)  # create_table is a placeholder for your table-building code
    return table
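To see what actually travels through the hidden div, here is a pandas-only sketch (no Dash required) of the serialize-once, deserialize-per-callback round trip. `expensive_clean`, `read_intermediate` and the data are hypothetical. `orient='split'` is used because it keeps the index, the column order and the values together. `io.StringIO` wraps the string because recent pandas versions deprecate passing literal JSON text to `read_json`.

```python
import io

import pandas as pd

RAW = pd.DataFrame({'item': ['a', 'b', 'a'], 'amount': [1.5, 2.5, 4.25]})


def expensive_clean(value):
    """Hypothetical stand-in for the expensive clean/compute step."""
    return RAW[RAW['item'] == value]


def clean_data(value):
    # Runs once per dropdown change; the returned string is what the hidden div stores.
    return expensive_clean(value).to_json(orient='split')


def read_intermediate(jsonified_cleaned_data):
    # Each downstream callback rebuilds its own DataFrame from the shared string.
    return pd.read_json(io.StringIO(jsonified_cleaned_data), orient='split')


stored = clean_data('a')
graph_df = read_intermediate(stored)
table_df = read_intermediate(stored)
print(graph_df)
```

Each consumer gets an independent copy. A graph callback can therefore modify its DataFrame without affecting the table callback.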

Thank you @chriddyp.
It worked like a charm.

I’ve pulled some of this discussion into a new chapter of the Dash user guide: https://plot.ly/dash/sharing-data-between-callbacks (source here: https://github.com/plotly/dash-docs/blob/master/tutorial/sharing_state.py).


The part in example 2 where you return multiple dfs on their own keys in dictionaries doesn’t seem to work. When we are using a df as the key it complains about “DataFrame is mutable. Cannot be hashed”. If you set the key to be a string, it errors out with a “component.type is undefined” error.


Thanks for reporting, @cvax! Yes, this is a typo; it is supposed to be:

    return {
        'df_1': df_1.to_json(orient='split'),
        'df_2': df_2.to_json(orient='split'),
        'df_3': df_3.to_json(orient='split'),
    }

(The keys are supposed to be strings.) Fixed in plotly/dash-docs@b033b8d ("Fix typo (thanks @cvax!)") on GitHub.
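Here is a minimal sketch of how several DataFrames can share one hidden div. It assumes the dict of JSON strings is itself passed through `json.dumps` before being returned, so that the div's `children` is a plain string. A raw dict returned as `children` is likely interpreted by Dash as a component description, which would fit the "component.type is undefined" error. `pack_frames` and `unpack_frames` are hypothetical helper names, and the data is made up.

```python
import io
import json

import pandas as pd


def pack_frames(frames):
    """Serialize a dict of DataFrames (string keys only) into one JSON string."""
    return json.dumps({name: df.to_json(orient='split') for name, df in frames.items()})


def unpack_frames(payload):
    """Inverse of pack_frames: rebuild each DataFrame from its 'split' JSON."""
    raw = json.loads(payload)
    return {name: pd.read_json(io.StringIO(s), orient='split') for name, s in raw.items()}


df_1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df_2 = pd.DataFrame({'x': [10.5, 20.5]})

payload = pack_frames({'df_1': df_1, 'df_2': df_2})  # what the hidden div would store
restored = unpack_frames(payload)                     # what each consuming callback would do
print(restored['df_1'])
```

A consuming callback then picks the frame it needs, e.g. `unpack_frames(jsonified_data)['df_2']`.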

Hi Chris. Slightly confused about this part. It seems that you clean the data with a callback, then store it. Why not just clean the data when the page loads? Would there be any problems with that? Example:

global_df = pd.read_csv('...')
cleaned_df = some_expensive_step(global_df)

@app.callback(Output('graph', 'figure'), [Input('dropdown', 'value')])
def update_graph(val):
    dff = cleaned_df[cleaned_df['col'] == val]
    figure = create_figure(dff)
    return figure

No, there isn’t. Storing a callback's result is only needed when the data has to be filtered or computed in response to a callback. See "Part 4. Sharing Data Between Callbacks" in the Dash for Python documentation for full context.


How does this work when building a multi-page app? I must be missing something obvious, because every time I navigate to another page, all the data inside my hidden div (which is part of the layout in my “index” app) is gone. Any help would be much appreciated.


Also interested in how to share data across pages…

A simple use case would be pulling data from an API and using it across multiple pages, instead of hitting the API again on every page load.


I had trouble with this approach. I have a multi-page app that uses the shared data to build a dropdown's list of options dynamically.

  1. Define functions to convert a list to a JSON string and back. To follow the approach in the documentation, the conversion goes through a DataFrame.
def to_jsonstr(my_list):
    df = pd.DataFrame(my_list)
    print("convert list to string")
    print(df)
    jsonStr = df.to_json(orient='split')
    print('jsonStr is {}'.format(type(jsonStr)))
    return jsonStr


def from_jsonstr(jsonList):
    print("convert dataframe to list")
    print('jsonList is {}'.format(type(jsonList)))
    df = pd.read_json(jsonList, orient='split')
    print(df)
    my_list = df[0].values.tolist()
    print(my_list)
    return my_list
  2. Add two callbacks. add_watchlist is called when the “add to watch list” button is clicked, and it adds the new symbol to the existing list. update_dropdown is called when the intermediate “new-watchlist” string is updated.

app = dash.Dash()

app.config.suppress_callback_exceptions = True

app.layout = html.Div(children=[
    # global hidden value
    html.Div(id='new-watchlist'),

    html.H1(children="My Stock App"),
    dcc.Location(id='url', refresh=False),
    html.Div(id='page-content')

])


# Add Ticker to New watchlist when add button is clicked


@app.callback(Output('new-watchlist', 'children'),
              [Input('add-button', 'n_clicks')],
              [State('ticker-input', 'value'),
               State('new-watchlist', 'children')]
              )
def add_watchlist(n_clicks, ticker, jsonWatchList):
    print('json Watchlist is ')
    print(jsonWatchList)

    # watchList = ['AMZN', 'GOOG', 'TSLA']
    # print(watchList)
    if(jsonWatchList is None):
        watchList = ['AMZN', 'GOOG', 'TSLA']
    else:
        watchList = from_jsonstr(jsonWatchList)
        print('watchList after json loads')
        print(watchList)
        if ticker is not None and ticker not in watchList:
            watchList.append(ticker)
        print("after add")
        print(watchList)
    outJson = to_jsonstr(watchList)
    print("outJson is")
    print(outJson)
    return outJson

# Update Options in dropdown-watchlist


@app.callback(Output('dropdown-watchlist', 'options'),
              [Input('new-watchlist', 'children')]
              )
def update_dropdown(jsonWatchList):
    watchList = from_jsonstr(jsonWatchList)
    print("in update_dropdown")
    print(watchList)
    return [{'label': i, 'value': i} for i in watchList]
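A Python list is already JSON-serializable, so the round trip through a DataFrame in to_jsonstr/from_jsonstr isn't strictly necessary. Below is a minimal sketch that uses only the standard `json` module. `add_ticker` is a hypothetical, Dash-free version of the add_watchlist logic, written so it can be tested on its own. Unlike add_watchlist above, it also appends the ticker when nothing has been stored yet (`None`). This sketch only simplifies the serialization; it is not offered as a fix for the callback-triggering problem described below.

```python
import json

DEFAULT_WATCHLIST = ['AMZN', 'GOOG', 'TSLA']


def to_jsonstr(my_list):
    # A plain list serializes directly; no DataFrame needed.
    return json.dumps(my_list)


def from_jsonstr(json_list):
    return json.loads(json_list)


def add_ticker(json_watchlist, ticker):
    """Return the updated watchlist as a JSON string (hypothetical helper)."""
    # Copy the default so the module-level list is never mutated.
    watchlist = list(DEFAULT_WATCHLIST) if json_watchlist is None else from_jsonstr(json_watchlist)
    if ticker is not None and ticker not in watchlist:
        watchlist.append(ticker)
    return to_jsonstr(watchlist)


stored = add_ticker(None, 'C')
options = [{'label': i, 'value': i} for i in from_jsonstr(stored)]
print(options)
```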

I'm stuck: the update_dropdown callback was triggered only once, during page load, but not when the list changed. I can see in the view that new-watchlist is updated, but the callback is not triggered. Could someone please help?
new-watchlist is now updated to {"columns":[0],"index":[0,1,2,3],"data":[["AMZN"],["GOOG"],["TSLA"],["C"]]}
but the update_dropdown callback was not triggered by this change.

A more detailed log of things I’ve tried so far

Hi, I was able to implement this; however, I seem to be running into a limitation:

I am storing a DataFrame as JSON in a hidden div, as described. I use this stored DataFrame as an input to set the limits of a slider. I then use the slider's selection together with the stored DataFrame as inputs to display a value from the DataFrame. The problem is that the slider does not appear.

This may also be because I am using the IDs of components generated by other callbacks at the same time. However, I'm not sure why this would be an issue.

Please see this simplified example:

app.config.suppress_callback_exceptions = True  # to use IDs of components generated by other callbacks

app.layout = html.Div([
    html.Div(id='filtered-data', style={'display': 'none'}),  # hidden div holding the filtered DF

    html.Div(id='map-slider-keeper'),  # holder for the slider, returned by a callback based on the limits of the filtered DF

    html.H4(id='output-map-slider-value'),  # displays the value selected on the slider

])

# Callback for intermediate df

@app.callback(Output('filtered-data', 'children'),
              [Input('date-picker', 'start_date'),
               Input('date-picker', 'end_date')])
def filter_data(startdate, enddate):
    dfc = df[(df['date'] > startdate) & (df['date'] < enddate)]
    return dfc.to_json()

# Callback for slider:

@app.callback(Output('map-slider-keeper', 'children'),
              [Input('filtered-data', 'children')])
def update_map_slider(df):
    dfc = pd.read_json(df)
    return dcc.Slider(
        id='map-slider',
        min=0,
        max=dfc.shape[0] - 1,
        step=1,
        value=2,
        updatemode='drag'
    )

# Callback for value display: when I add this part in, things fail

@app.callback(Output('output-map-slider-value', 'children'),
              [Input('filtered-data', 'children'),
               Input('map-slider', 'value')])
def MapSliderText(df, idx):
    dfc = pd.read_json(df)
    date = (dfc['date'][idx]).strftime('%Y-%m-%d %H:%M')
    return '{}'.format(date)

Am I committing some sort of no-no by having two inputs? If so, what would be the logical fix?

Thanks in advance,
Troy
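Here is one thing worth checking, offered as a guess rather than a diagnosis. The slider runs from 0 to `dfc.shape[0] - 1`, which are row *positions*. However, `dfc['date'][idx]` looks rows up by index *label*. If `df` has a default integer index, the labels that survive the date filter are no longer 0..n-1, so the lookup can raise a KeyError. The display callback may also fire before the slider exists, while `idx` or the stored data is still `None`. Below is a pandas-only sketch of a defensive version. `slider_label` is a hypothetical name, and the sketch assumes the writer side uses `to_json(orient='split', date_format='iso')`.

```python
import io

import pandas as pd


def slider_label(jsonified_df, idx):
    """Format the date at row position `idx`, or return '' while inputs aren't ready."""
    if jsonified_df is None or idx is None:
        return ''
    dfc = pd.read_json(io.StringIO(jsonified_df), orient='split')
    if not 0 <= idx < len(dfc):
        return ''
    # iloc selects by position, matching the slider's 0..n-1 range.
    return pd.Timestamp(dfc['date'].iloc[idx]).strftime('%Y-%m-%d %H:%M')


# Made-up data whose index labels (10, 11) differ from the positions (0, 1).
frame = pd.DataFrame({'date': pd.to_datetime(['2020-01-01 10:30', '2020-01-02 11:45'])},
                     index=[10, 11])
stored = frame.to_json(orient='split', date_format='iso')
print(slider_label(stored, 1))
```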


I'm having difficulty with a categorical column of intervals.

When I convert it to JSON, an example row looks like this:

{"columns":["SEB5ASA_SEB_range","SEB5ASA_SEB","Donor_ID"],"index":[0,1,6,10,12,13,15,16,18,20,22,24],"data":[[{"closed":"left","closed_right":false,"length":25,"open_left":false,"right":25},9.611,"P15-002"]

When it is read back with pd.read_json, a value of the “SEB5ASA_SEB_range” column looks like this:

{'closed': 'left', 'closed_right': False, 'length': 25, 'open_left': False, 'right': 25}

The original value was the interval [0, 25).
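Here is one possible workaround, sketched under assumptions. JSON has no interval type, so you can split the interval column into plain numeric bound columns before `to_json` and rebuild `pd.Interval` values after `read_json`. The helper names are hypothetical. The sample rows are made up apart from the column names and the 9.611 / P15-002 values shown above. `closed='left'` assumes the bins were created with `right=False`, which matches the `[0, 25)` above.

```python
import io

import pandas as pd


def split_interval_column(df, col):
    """Replace an Interval column with plain numeric left/right bound columns."""
    out = df.drop(columns=col)
    out[col + '_left'] = [iv.left for iv in df[col]]
    out[col + '_right'] = [iv.right for iv in df[col]]
    return out


def rebuild_interval_column(df, col, closed='left'):
    """Inverse of split_interval_column; `closed` must match the original bins."""
    out = df.drop(columns=[col + '_left', col + '_right'])
    out[col] = pd.IntervalIndex.from_arrays(df[col + '_left'], df[col + '_right'], closed=closed)
    return out


df = pd.DataFrame({'SEB5ASA_SEB': [9.611, 30.2], 'Donor_ID': ['P15-002', 'P15-003']})
df['SEB5ASA_SEB_range'] = pd.cut(df['SEB5ASA_SEB'], bins=[0, 25, 50], right=False)

stored = split_interval_column(df, 'SEB5ASA_SEB_range').to_json(orient='split')
restored = rebuild_interval_column(pd.read_json(io.StringIO(stored), orient='split'),
                                   'SEB5ASA_SEB_range')
print(restored['SEB5ASA_SEB_range'].tolist())
```

Storing the bounds as numbers keeps them usable for sorting and filtering on the JSON side. An alternative is to store `str(interval)` and parse it back, at the cost of writing a parser.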

Hello,
I’m new to Dash. I’m trying to upload a file and analyze it in several callback functions. In one callback I parse the uploaded file and return it as JSON to a hidden div (id intermediate-value). But when I try to read the same JSON in another callback, it throws an error (TypeError: the JSON object must be str, bytes or bytearray, not NoneType). Any help would be appreciated. Here is my code.
The error occurs in my last callback function.

import dash
import dash_core_components as dcc
import dash_html_components as html
import base64
import json

import io

import plotly.graph_objs as go


import pandas as pd
import numpy as np

import time

from dash.dependencies import Input, Output
import dash_table as dt

external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

app = dash.Dash(__name__, external_stylesheets=external_stylesheets)

colors = {
    "graphBackground": "#F5F5F5",
    "background": "#ffffff",
    "text": "#000000"
}

app.layout = html.Div([
    dcc.Upload(
        id='upload-data',
        children=html.Div([
            'Drag and Drop or ',
            html.A('Select Files')
        ]),
        style={
            'width': '100%',
            'height': '60px',
            'lineHeight': '60px',
            'borderWidth': '1px',
            'borderStyle': 'dashed',
            'borderRadius': '5px',
            'textAlign': 'center',
            'margin': '10px'
        },
        # Allow multiple files to be uploaded
        multiple=True
    ),
    html.Div(id='intermediate-value',style={'display':'none'}),
    
    html.Div(id='output-data-upload'),
    html.Div(id='conf_matrix')
  

])


def parse_data(contents, filename):
    content_type, content_string = contents.split(',')

    decoded = base64.b64decode(content_string)
    try:
        if 'csv' in filename:
            # Assume that the user uploaded a CSV or TXT file
            df = pd.read_csv(
                io.StringIO(decoded.decode('utf-8')))
        elif 'xls' in filename:
            # Assume that the user uploaded an excel file
            df = pd.read_excel(io.BytesIO(decoded))
        elif 'txt' in filename or 'tsv' in filename:
            # Assume that the user uploaded a txt or tsv file
            df = pd.read_csv(
                io.StringIO(decoded.decode('utf-8')), delimiter = r'\s+')
    except Exception as e:
        print(e)
        return html.Div([
            'There was an error processing this file.'
        ])

    return df



@app.callback(Output('intermediate-value', 'children'),
              [Input('upload-data', 'contents'),
               Input('upload-data', 'filename')])
def store_data(contents, filename):
    # Until a file is uploaded, contents is None and this callback returns None,
    # so the hidden div's children stay None
    if contents:
        contents = contents[0]
        filename = filename[0]
        df = parse_data(contents, filename)
        df = pd.DataFrame(df)
        df = df.convert_dtypes()
        return df.to_json(orient='split')


@app.callback(Output('output-data-upload','children'),
             
            [
                Input('upload-data', 'contents'),
                Input('upload-data', 'filename')
            ])
def update_table(contents, filename):
    #table = html.Div()
    #print("flag 0")
    if contents:
        print("flag 0")
        contents = contents[0]
        filename = filename[0]
        df = parse_data(contents, filename)
        df=pd.DataFrame(df)
        
        df =df.convert_dtypes()
        

        def chkd(col):
            if col.dtype== 'float64' or col.dtype == 'Int64':
                return 'Continuous'
            else:
                return 'Categorical'
            
        df_tab=pd.DataFrame(df.describe(include='all')).stack().unstack(0)
        
        try:
            df_tab['cnt_null'] = df.apply(lambda col: col.isnull().sum())
            df_tab['null_%'] = df_tab.apply(lambda row: round((row['cnt_null']/(row['cnt_null']+row['count']))*100,2), axis=1)
            df_tab['cnt_zeros'] = df.isin([0]).sum(axis=0)
            df_tab['Data_Type'] = df.dtypes
            df_tab['variance']=df.var(axis=0)
            df_tab['mode'] = df.mode().iloc[0,:]
            df_tab['1%'] = df.quantile(0.01)
            df_tab['5%'] = df.quantile(0.05)
            df_tab['10%'] = df.quantile(0.1)
            df_tab['90%'] = df.quantile(0.9)
            df_tab['95%'] = df.quantile(0.95)
            df_tab['99%'] = df.quantile(0.99)
            df_tab['skew']=df.skew(axis=0)
            df_tab['kurtosis']=df.kurt(axis=0)
            df_tab['Variable_Type'] = df.apply(lambda col: 'Categorical' if col.dtype=='object' else chkd(col))
        
        
        except TypeError:
            pass
        
        df_tab=df_tab.reset_index(level=0)
        
        df_tab.rename(columns={'50%':'median'},inplace=True)
        df_tab.rename(columns={'count':'cnt_non_null'},inplace=True)
        df_tab.rename(columns={'index':'Variable'},inplace=True) 
        
        df_tab=df_tab[['Variable','Variable_Type','Data_Type','cnt_non_null','cnt_null','null_%','cnt_zeros','mean','std','mode','min','1%','5%','10%','25%','median','75%','90%','95%','99%','max','variance','skew','kurtosis']]

        
        #cols=[{'name': i, 'id': i} for i in df_tab.columns]
        df_tab = df_tab.sort_values(by ='Variable_Type',ascending=False)
        print("wait for 3 sec")
         
        time.sleep(3)
        
        df_tab.to_csv('univariate.csv')
        
        new_df = pd.read_csv('univariate.csv')
   
       
        table = html.Div([
            html.H5(filename),
            dt.DataTable(
               data=new_df.to_dict('records'),
               columns=[{'name': i, 'id': i} for i in df_tab.columns]
            ),
            ])        
                
        return table
    

@app.callback(Output('conf_matrix', 'children'), [Input('intermediate-value', 'children')])
def update_graph(jsonified_cleaned_data):
    # Before any file is uploaded, the hidden div's children are None, and
    # json.loads(None) raises "TypeError: the JSON object must be str, ..."
    if jsonified_cleaned_data is None:
        return ''
    dff = json.loads(jsonified_cleaned_data)
    return str(dff)
    




if __name__ == '__main__':
    app.run_server(debug=True)
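The TypeError comes from the last callback running on page load, before anything has been uploaded. At that point the hidden div's children are still `None`, and `json.loads(None)` raises exactly this error, so the consuming side needs a `None` check. The pandas-only sketch below covers both ends. It decodes a `dcc.Upload` `contents` string (a `data:<mime>;base64,<payload>` URL) and picks the reader with `str.endswith`. It also reads the stored JSON back only when it exists. `decode_upload` and `load_intermediate` are hypothetical names. Returning `None` for an unknown extension is a design choice that lets the callback decide what to display.

```python
import base64
import io

import pandas as pd


def decode_upload(contents, filename):
    """Turn one dcc.Upload contents string into a DataFrame, or None if unsupported."""
    _content_type, content_string = contents.split(',', 1)
    decoded = base64.b64decode(content_string)
    name = filename.lower()
    if name.endswith('.csv'):
        return pd.read_csv(io.StringIO(decoded.decode('utf-8')))
    if name.endswith(('.xls', '.xlsx')):
        return pd.read_excel(io.BytesIO(decoded))
    if name.endswith(('.txt', '.tsv')):
        return pd.read_csv(io.StringIO(decoded.decode('utf-8')), sep=r'\s+')
    return None


def load_intermediate(jsonified_data):
    # The hidden div is empty (None) until the first upload has been processed.
    if jsonified_data is None:
        return None
    return pd.read_json(io.StringIO(jsonified_data), orient='split')


csv_contents = 'data:text/csv;base64,' + base64.b64encode(b'a,b\n1,2\n').decode('ascii')
df = decode_upload(csv_contents, 'DATA.CSV')
print(load_intermediate(df.to_json(orient='split')))
```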

Thank you for the explanation. I ran into a problem when I tried to implement this method:
dash.exceptions.SameInputOutputException: Same output and input: intermediate_value.children
What’s wrong?

May I ask what the difference is between this approach and dcc.Store?


Does this work for multi-page/tab apps? I’d like to share data between callbacks that are in different files for separate tabs.

What is create_table?
I feel like this is what I need. I have a pd.DataFrame with all the right values for the callback, but it doesn’t show up in the dashboard because a DataFrame is not JSON serializable. When I return .to_json() instead, still nothing shows up.
Any ideas?
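In the example at the top of the thread, create_table (like create_figure) stands in for your own code; it isn't a Dash function. A callback has to return something JSON-serializable, such as a component or plain lists and dicts. A `df.to_json()` string is just text, not a table. One option, used in the upload example earlier in this thread, is `dash_table.DataTable(data=df.to_dict('records'), columns=[...])`. Below is a pandas-only sketch that builds those two arguments. `create_table_props` is a hypothetical name. Column labels are converted to strings because DataTable column ids are strings and must match the record keys.

```python
import pandas as pd


def create_table_props(df):
    """Hypothetical helper: (columns, data) arguments for dash_table.DataTable."""
    df = df.rename(columns=str)  # make ids and record keys match as strings
    columns = [{'name': c, 'id': c} for c in df.columns]
    data = df.to_dict('records')
    return columns, data


columns, data = create_table_props(pd.DataFrame({'ticker': ['AMZN', 'GOOG'], 'price': [1.5, 2.5]}))
print(columns)
print(data)
```

Inside the callback you would then return `dt.DataTable(columns=columns, data=data)`.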

Hi @chriddyp

When you create the variable global_df = pd.read_csv('…'), are you initializing the global variable by reading an empty string, or does pd.read_csv('…') actually load data and create the DataFrame? I’m not sure where, or at what point, the data gets into this global variable, or what data is used to create the DataFrame.