How to reuse a big dataframe across callbacks?

Hi

During a callback I need to read a big dataframe and visualize it as a heatmap in dcc.Graph(). Since the dataframe is the same across consecutive callbacks, I don’t want to re-read it on every callback. How can I store this big dataframe after the first callback?

The option I am using so far is a global variable that holds the dataframe so it can be reused across callbacks. However, using a global variable has issues and sometimes causes out-of-memory errors, so I am wondering whether there are more memory-efficient ways to store a repeatedly used dataframe after the first call.

Thanks.

Hi @roudan, you could use dash-extensions and its ServersideOutputTransform:

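Roughly, the pattern looks like this (a minimal sketch: ServersideOutput keeps the dataframe in a server-side cache and sends only a reference to the browser; exact import names vary between dash-extensions versions, and the file name is hypothetical):

import pandas as pd
import plotly.express as px
from dash import dcc, html
from dash_extensions.enrich import (DashProxy, Input, Output,
                                    ServersideOutput, ServersideOutputTransform)

app = DashProxy(transforms=[ServersideOutputTransform()])
app.layout = html.Div([
    dcc.Store(id='df-store'),  # holds only a server-side reference, not the data
    html.Button('Load', id='load-btn'),
    dcc.Graph(id='heatmap'),
])

# Runs once; the dataframe stays in a server-side cache.
@app.callback(ServersideOutput('df-store', 'data'), Input('load-btn', 'n_clicks'))
def load_df(n_clicks):
    return pd.read_csv('big_file.csv')  # hypothetical file

# Later callbacks receive the cached dataframe directly, without re-reading it.
@app.callback(Output('heatmap', 'figure'), Input('df-store', 'data'))
def draw_heatmap(df):
    return px.imshow(df.corr(numeric_only=True))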


Thank you so much, AIMPED. I will take a look at it to see how it works. I appreciate it.

Hi AIMPED, how do I do the following with JupyterDash? Thanks

app = DashProxy(transforms=[ServersideOutputTransform()])

The one I have is:

app = JupyterDash(__name__,
                  external_stylesheets=[
                      'https://stackpath.bootstrapcdn.com/bootswatch/4.5.0/flatly/bootstrap.min.css'
                  ],
                  suppress_callback_exceptions=True
                  )

I think dash-extensions is not available for Jupyter, is it, @Emil?


No, you are correct - I haven’t made an implementation for Jupyter :blush:


Hey everyone,
just a heads up that Dash 2.11 and later support running Dash apps in classic Jupyter Notebooks and in JupyterLab without the need to update the code or use the separate JupyterDash library.
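For example (a minimal sketch assuming Dash 2.11+; the jupyter_mode argument controls how the app renders in the notebook):

from dash import Dash, html

app = Dash(__name__)  # plain Dash; no JupyterDash import needed from 2.11 on
app.layout = html.Div('Hello from a notebook')

# jupyter_mode can be 'inline', 'external', 'tab', or 'jupyterlab'
app.run(jupyter_mode='inline')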


You can try caching your dataframe with flask_caching, for example:

import pandas as pd
from dash import Input, Output
from flask_caching import Cache

# set up a Cache instance backed by the app's Flask server
cache = Cache(app.server, config={'CACHE_TYPE': 'SimpleCache'})

# a cached loader: the CSV is read once, then served from cache for 24 hours
@cache.cached(timeout=3600 * 24)
def get_df():
    return pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/gapminder2007.csv')

# use it in callbacks
@app.callback(
    Output(...),
    Input(...)
)
def create_graph(value):
    df = get_df()
    ...

This way the dataframe is loaded only once per timeout period; subsequent calls within the timeout are served from the cache.
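If the loader takes arguments (say, a file name), cache.memoize caches one result per distinct argument combination instead; a small sketch (the filename parameter is hypothetical):

@cache.memoize(timeout=3600 * 24)
def get_df_for(filename):
    # one cache entry per distinct filename
    return pd.read_csv(filename)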


Also, I recommend applying optimizations at the pandas level, for example:

def get_df():
    # load only the columns that are actually used
    df = pd.read_csv('titanic.csv', usecols=['survived', 'age', 'class', 'who', 'alone'])

    # convert to the most compact suitable dtypes
    df.age = df.age.fillna(0)
    df = df.astype(
        {
            'survived': 'int8',
            'age': 'int8',
            'class': 'category',
            'who': 'category',
            'alone': 'bool'
        }
    )
    return df
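To see how much the conversion saves, you can compare memory usage before and after; a quick check (reusing the optimized loader above):

df_raw = pd.read_csv('titanic.csv', usecols=['survived', 'age', 'class', 'who', 'alone'])
df_opt = get_df()

# deep=True also counts the contents of object/string columns
print(df_raw.memory_usage(deep=True).sum())
print(df_opt.memory_usage(deep=True).sum())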