Class object storing dataframe with custom methods

jbabek · September 19, 2019, 1:50am

Hi,

I have been using Plotly and Dash for about a year now and it is a great program. I am still a relatively new developer, although I have years of experience with analytics.

I do have a question concerning data. In the past, I have imported dataframes on startup, and used filters to display the data I want. I now have a process that is rather complex. It doesn’t require a lot of data to load, but has a lot of little program steps to mold the data for display. I have created an object that includes the main dataframe, and some pre-defined methods for filtering, projections, and other data manipulation.

Running methods on the object in the callbacks has worked locally. I have also tried initializing a new object in each callback and using a hidden div to store a filter element. Both seem to run fine on my local machine.

That being said, my question is which is the proper way to do this? The object is too complex to store in a hidden div. I have concerns that even if there is only one object, it will behave differently if a new thread is started during one of the callbacks. Would it be possible/advisable to store the object as a pickle object?

Thanks in advance,
Jon

alexcjohnson · September 19, 2019, 3:46am

The key is ensuring that your callbacks don’t mutate any observable properties of global variables. There are two main scenarios to consider when figuring out whether the object mechanism you’ve created is safe:

1. What happens if two people are using your app simultaneously, firing callbacks to the same server? Will each one see the app acting the same as if the other were not there? Imagine the first user gets halfway through their analysis when the second user opens the app and starts firing callbacks - will the second user have all the same options available as the first? Will the first be able to continue, unaware of the second user?
2. What happens if subsequent callbacks from a single user are handled by different server processes? This could happen either because you started multiple worker processes (that do not communicate with each other), or because the original server process was halted and restarted in the middle of the user’s analysis.
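
To make the first scenario concrete, here is a minimal sketch of how a mutated module-level object leaks one user's filter into another user's view. All of the names (`SharedFilter`, `shared`, `unsafe_set_minimum`, `safe_view`) are hypothetical and only stand in for the kind of object described in the question; it uses a plain list rather than a dataframe to stay self-contained.

```python
class SharedFilter:
    """Hypothetical module-level object that callbacks mutate (unsafe)."""

    def __init__(self, values):
        self.values = values
        self.minimum = 0

    def visible(self):
        return [v for v in self.values if v >= self.minimum]


# One instance per server process, shared by every user of that process.
shared = SharedFilter([1, 5, 10])


def unsafe_set_minimum(minimum):
    # Mutates state that every other callback in this process will see.
    shared.minimum = minimum


def unsafe_view():
    # The result depends on whoever changed the filter last.
    return shared.visible()


def safe_view(values, minimum):
    # No shared state: the result depends only on the arguments.
    return [v for v in values if v >= minimum]
```

If user A calls `unsafe_set_minimum(5)` and user B then calls `unsafe_set_minimum(10)`, user A's next `unsafe_view()` reflects B's filter. `safe_view` cannot be affected that way, because nothing it reads is shared between calls.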

What this means in practice is that each callback needs to be given (as arguments) all the information it needs in order to reconstruct the filtering/projection/manipulation state. Where there’s room for optimization is in caching the results of these operations, as described in https://dash.plot.ly/performance. The easy way to do this is within one server process, using something like @functools.lru_cache (or the functools32 backport on Python 2). But that cache isn’t shared across server processes. Better is to use a cross-process cache like Flask-Caching; this can, if done carefully, also allow you to just use a UUID or a hash as the cache key, which can be a big help if the info needed to recreate that state is very large.
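
As a rough illustration of the single-process option, the sketch below caches a function that rebuilds its result purely from its arguments, using the standard library's `functools.lru_cache`. The data, the `Summary` type, and the `summarize` function are made up for the example, and plain tuples of records stand in for a dataframe. The result is an immutable `NamedTuple` so that a cached value cannot be changed by whichever callback receives it.

```python
from functools import lru_cache
from typing import NamedTuple

# Hypothetical read-only data standing in for the main dataframe.
ROWS = (
    {"region": "east", "year": 2018, "sales": 10.0},
    {"region": "east", "year": 2019, "sales": 12.0},
    {"region": "west", "year": 2019, "sales": 7.0},
)


class Summary(NamedTuple):
    region: str
    total: float
    projected: float


@lru_cache(maxsize=32)
def summarize(region, growth):
    """Rebuild the filtered/projected state from the arguments alone."""
    selected = [row for row in ROWS if row["region"] == region]
    total = sum(row["sales"] for row in selected)
    return Summary(region=region, total=total, projected=total * (1 + growth))
```

A callback would pass its dropdown values straight through, for example `summarize("east", 0.5)`. Repeated calls with the same arguments are served from the cache, but only within the process that computed them, which is the limitation noted above.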

jbabek · September 19, 2019, 8:43am

Thanks for the response…it is very helpful.

With this app, there is very low risk of it being used by two people at once. It consists of two dashboards. The first dashboard takes an Excel file by drag-and-drop, formats the headers, and stores the data in a local sqlite3 database.

The second dashboard reads the data as an entire table from sqlite3. This is brought into a class that takes the table as a dataframe and has a series of methods that create a number of attributes, including other dataframes that are all stored in the object. The filters are changed via a dropdown control, and a few tables and one graph are updated. Right now, I create the object and apply the filter in each callback. I have one button that will export the data on the screen to an Excel workbook.

At the end of a session, all data is lost (other than if the user saves an Excel file). The information is not expected or desired to be persistent outside of the context of a single user running a single session.

In addition, it is all run inside a docker container, which would be single use.

The reason for using the sqlite database is that, in time, this will pull directly from another database, and I wanted to have the workflow in place. The first dashboard will become redundant and be discarded.

So, in a long-winded way, thank you for the advice. I will continue with the version that builds the object in each callback with its own set of filters.
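
For anyone reading later, the rebuild-in-each-callback approach might look roughly like the sketch below. It is only an outline under assumptions: the `sales` table, the `Report` class, and the function names are hypothetical, an in-memory sqlite3 database stands in for the real one, and plain tuples stand in for the dataframe.

```python
import sqlite3


class Report:
    """Hypothetical analysis object, built fresh from rows and a filter value."""

    def __init__(self, rows, region):
        self.region = region
        # rows are (region, sales) tuples; keep only the selected region.
        self.rows = [row for row in rows if row[0] == region]
        self.total = sum(row[1] for row in self.rows)


def load_rows(conn):
    # Read the entire table, as the second dashboard does.
    return conn.execute("SELECT region, sales FROM sales").fetchall()


def update_view(conn, region):
    """Shape of a callback body: everything is derived from the arguments."""
    report = Report(load_rows(conn), region)
    return {"region": report.region, "rows": len(report.rows), "total": report.total}


def make_demo_db():
    # In-memory database with made-up rows, only for trying the sketch out.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, sales REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("east", 12.0), ("west", 7.0)],
    )
    conn.commit()
    return conn
```

Because `update_view` keeps nothing between calls, calling it with `"east"`, then `"west"`, then `"east"` again gives the same answer for `"east"` both times, whichever process or thread handles the call.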
