I am working on an app which allows users to explore air emission data. So far I have the data prepared in pandas dataframes. I am worried that the memory usage of my dataframes will get too big. As an alternative, I figured I could store my dataframes as CSV or parquet files and open them in callbacks. Is this feasible in a Dash app used by multiple users who may fire lots of callbacks on the same file? I am grateful for other ideas.
I am currently developing a Dash app that requires files to be read inside a callback. I found performance to be much slower when I transferred the data between callbacks in a JSON-friendly format than when I read it from a CSV file. For a bump in speed I switched from Pandas to Polars dataframes because of its speed and larger-than-memory capabilities; it might be worth checking out. Polars has something called a LazyFrame, which may be useful for what you are trying to do.
I am currently reading from IPC files since I found that to be the fastest option inside the callback. They also preserve the datatypes in the file, which helps with timeseries headaches.
Even without the lazy API, I have found Polars to be incredibly fast compared to Pandas. I have a callback that uses the AG Grid rowData to filter down a somewhat large data set, plot that data in several interactive Plotly charts, and annotate them with almost no delay. Unfortunately, I cannot speak to multiple users reading and writing from the same file. If the file is read-only I do not really see a problem, but I am not experienced with scaling these applications. Just be aware that df.to_dict("records") in Pandas is df.to_dicts() in Polars; there are quite a few other small changes like this, so use the docs as your guide.
If this ends up working for you give me a shout and we’ll go get polars tattoos together.
Polars may be fast, but until they handle this, you cannot use it for any interaction for querying from databases where you take inputs from users.
With that said, it could very well work for this case.
Thank you for directing me towards Polars. It seems to be a great alternative to Pandas.
Having read a bit about lazy loading, it seems like it could be an option in a multi-user environment. I am, however, not sure how to verify that it really works with multiple users.
How are you storing your dataframes in memory in between callbacks? As long as you’re not using global variables (which you shouldn’t do, especially when you have multiple users), the memory should be lost when exiting the callback.
Or do you want to open and close the csv file several times in the same callback?
What I currently do is create some dataframes by reading CSVs in a separate .py file which runs once before the app is initiated. I then import those dataframes into the app in order to use them in my callbacks. In my callbacks I create a deep copy of the dfs and filter them in order to populate, e.g., a graph. This roughly looks like this:
# load_data.py
df = create_df_from_csv(….)

# app.py
import copy
from dash import Input, Output, callback
from load_data import df
….
@callback(
    Output("graph", "children"),
    Input("dropdown", "value"),
)
def populate_graph(value_selected):
    # create copy of df
    df_temp = copy.deepcopy(df)
    # Filter df_temp with selected value
    # create graph with filtered df
    return graph
I figured that's a way to put my data into memory and have it available for all users. Am I wrong? I never alter the original df in my callbacks.
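On the deep copy in the snippet above: as far as I understand, boolean indexing in pandas already returns a new DataFrame, so copying the whole frame on every callback may not be needed and only adds memory pressure. A minimal sketch with made-up data, where filter_for_graph is a hypothetical stand-in for the filtering step of the callback:

```python
import pandas as pd

# Stand-in for the module-level df imported from load_data.py.
df = pd.DataFrame(
    {
        "pollutant": ["NOx", "SO2", "NOx", "PM10"],
        "tonnes": [12.5, 3.1, 11.8, 0.9],
    }
)


def filter_for_graph(selected):
    # Boolean indexing builds a new DataFrame; the shared df is not modified.
    filtered = df[df["pollutant"] == selected]
    # Copy only the small filtered result if it will be changed afterwards.
    return filtered.copy()


subset = filter_for_graph("NOx")
subset["tonnes"] = 0.0  # changes the copy only

print(df["tonnes"].tolist())
print(subset["tonnes"].tolist())
```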
Now I am worried that my dataframes are using too much memory, and I wondered if it would be feasible to load the data from CSV, parquet or some other format whenever a callback needs it instead of putting everything in memory (with multiple users potentially triggering a callback that accesses the same file). E.g.:
# app.py
from dash import Input, Output, callback
….
@callback(
    Output("graph", "children"),
    Input("dropdown", "value"),
)
def populate_graph(value_selected):
    df = read_data_from_file(…)
    # Filter df with selected value
    # create graph with filtered df
    return graph
Is this feasible? Can Polars be used to lazily load data from a parquet file in a callback with multiple users?