During the callback, I need to read a big dataframe and visualize it as a heatmap in dcc.Graph(). Since the dataframe is the same across consecutive callbacks, I don’t want to re-read it on every callback. How can I store this big dataframe after the first callback?
The option I am using so far is a global variable that holds the big dataframe so it can be reused across callbacks. However, the global-variable approach has an issue: it sometimes causes out-of-memory errors. So I am wondering whether there are more memory-efficient ways to store the repeatedly used dataframe after the first call.
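For reference, here is a minimal sketch of the global-variable pattern I mean (the names and file path are just illustrative):

import pandas as pd

DF_CACHE = None  # module-level global holding the big dataframe

def get_df():
    global DF_CACHE
    if DF_CACHE is None:  # read the file only on the first callback
        DF_CACHE = pd.read_csv('big_data.csv')  # illustrative path
    return DF_CACHE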
You can try caching your dataframe with flask_caching, for example:
from flask_caching import Cache
from dash import Dash, Input, Output
import pandas as pd

app = Dash(__name__)

# set up your Cache instance on the underlying Flask server
cache = Cache(app.server, config={'CACHE_TYPE': 'SimpleCache'})

# create a function that loads the DataFrame, cached for 24 hours
@cache.cached(timeout=3600 * 24)
def get_df():
    return pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/gapminder2007.csv')

# use it in callbacks
@app.callback(
    Output(...),
    Input(...)
)
def create_graph(value):
    df = get_df()
    ...
This way the dataframe is read only once per timeout period (here, 24 hours); within that window, every callback gets the cached copy instead of re-reading the file.
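Note that SimpleCache lives in the memory of a single process, so with multiple gunicorn workers each worker caches its own copy and the dataframe still occupies RAM. A minimal sketch of a disk-backed alternative using flask_caching's FileSystemCache backend ('cache-directory' is just an illustrative path); memoize is also a safer choice than cached here, since it keys on the function and its arguments rather than the request path:

cache = Cache(app.server, config={
    'CACHE_TYPE': 'FileSystemCache',  # store cached values on disk
    'CACHE_DIR': 'cache-directory',   # illustrative path, pick your own
})

# memoize keys the cache entry on the function and its arguments,
# so several cached functions won't collide on one key
@cache.memoize(timeout=3600 * 24)
def get_df():
    return pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/gapminder2007.csv')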
Also, I recommend applying optimizations at the pandas level, for example:
def get_df():
    # load only the columns you actually use
    df = pd.read_csv('titanic.csv', usecols=['survived', 'age', 'class', 'who', 'alone'])
    # convert each column to the smallest suitable dtype
    df['age'] = df['age'].fillna(0)
    df = df.astype(
        {
            'survived': 'int8',
            'age': 'int8',
            'class': 'category',
            'who': 'category',
            'alone': 'bool'
        }
    )
    return df
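To check that the dtype conversion actually pays off, you can compare pandas' own memory accounting before and after (a quick sketch, using the same 'titanic.csv' as above):

import pandas as pd

raw = pd.read_csv('titanic.csv')
slim = get_df()

# deep=True counts the real size of object (string) columns too
print(raw.memory_usage(deep=True).sum())   # bytes before optimization
print(slim.memory_usage(deep=True).sum())  # bytes after optimization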