Storing objects and intermediate results

Hi there!

I started playing around with Dash. I am trying to build a dashboard for an app that backtests technical trading strategies.

Basically, all the backend work is already done, and I can use it from a regular Python script. While building the GUI, I ran into some questions about storing intermediate results.

For instance, I am downloading price data. The result is a Python dict that contains pd.DataFrames. As far as I understand it, I could use a dcc.Store. However, this requires serializing the data to JSON when writing and deserializing it when reading. I am looking for a way to simply store the dict as is.
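For reference, the JSON round trip that dcc.Store implies for a dict of DataFrames might look roughly like this. It is a minimal sketch that assumes `orient="split"` and plain float columns; the function names are hypothetical, and a DatetimeIndex or mixed dtypes may need extra options to survive the trip unchanged.

```python
import io
import json

import pandas as pd


def prices_to_json(prices: dict[str, pd.DataFrame]) -> str:
    """Serialize a dict of DataFrames into one JSON string (e.g. for a dcc.Store)."""
    # Each DataFrame becomes its own JSON string; orient="split" keeps
    # index, columns and values as separate fields.
    return json.dumps({name: df.to_json(orient="split") for name, df in prices.items()})


def prices_from_json(payload: str) -> dict[str, pd.DataFrame]:
    """Rebuild the dict of DataFrames from the JSON string."""
    raw = json.loads(payload)
    # Recent pandas versions expect a file-like object rather than a literal
    # JSON string, hence the StringIO wrapper.
    return {
        name: pd.read_json(io.StringIO(text), orient="split")
        for name, text in raw.items()
    }


prices = {
    "AAPL": pd.DataFrame({"close": [101.5, 102.25, 100.75]}),
    "MSFT": pd.DataFrame({"close": [250.0, 251.5, 249.0]}),
}
payload = prices_to_json(prices)
restored = prices_from_json(payload)
print(restored["AAPL"]["close"].tolist())
```

Every read and write pays this conversion cost, which is what makes it unattractive for large objects.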

Also, I have an instance of an object with a bunch of different properties, such as strings, lists, pd.DataFrames, and dicts containing pd.DataFrames. In production, these DataFrames may be very big, so the object may occupy a lot of memory (>20 GB). What is the best way to store it?

I am wondering what a suitable strategy for these issues might be. Usually, I simply store the data/objects in Python variables within my script. How do I transfer this concept to Dash?

With data in the GB size range, you can’t store the data in the browser directly. Other options could be to use server-side storage, e.g. on disk or in memory. If you intend to dump/load the whole 20 GB object in one go, disk will probably be too slow (depending on your performance requirements). A faster option would be to use an in-memory cache (e.g. Redis), but depending on the number of users you intend to accommodate, you’ll need rather beefy server infrastructure. Say for example that you intend to support 100 users; then you would need at least 2000 GB of RAM (100 × 20 GB, in a best-case scenario).
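A minimal sketch of the server-side pattern, assuming a single server with a local disk: the large object stays on the server and only a short key would go into a dcc.Store. The helper names and the cache directory are hypothetical; pickle is used here for simplicity, and pickle files should only ever be loaded from a trusted location. An in-memory store such as Redis would take the place of the two file operations.

```python
import pickle
import tempfile
import uuid
from pathlib import Path

# Hypothetical cache location; on a real server this would be a fast local disk.
CACHE_DIR = Path(tempfile.gettempdir()) / "backtest_cache"
CACHE_DIR.mkdir(parents=True, exist_ok=True)


def save_result(obj) -> str:
    """Pickle `obj` to disk and return a short key that identifies it."""
    key = uuid.uuid4().hex
    with open(CACHE_DIR / f"{key}.pkl", "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return key


def load_result(key: str):
    """Load the object previously stored under `key`."""
    with open(CACHE_DIR / f"{key}.pkl", "rb") as fh:
        return pickle.load(fh)


key = save_result({"strategy": "sma_cross", "trades": [1, 2, 3]})
print(len(key))  # 32 hex characters
print(load_result(key)["strategy"])
```

In a Dash app, one callback would write the returned key to the dcc.Store, and later callbacks would read the key and call `load_result` with it, so only a 32-character string ever travels to the browser.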

Alternatively, you could split the 20 GB object into smaller pieces that can be dumped/loaded when needed, or even store the data in a database. What makes sense depends a lot on the application and data structure.
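Splitting could be as simple as writing each DataFrame of a dict to its own file, so that a table or chart only loads the piece it needs. This is a sketch assuming the dict keys are filesystem-safe names (e.g. ticker symbols); the helpers are hypothetical, and pickle could be swapped for Parquet or a database table.

```python
import tempfile
from pathlib import Path

import pandas as pd


def dump_pieces(frames: dict[str, pd.DataFrame], directory: Path) -> None:
    """Write each DataFrame of the dict to its own pickle file."""
    directory.mkdir(parents=True, exist_ok=True)
    for name, df in frames.items():
        df.to_pickle(directory / f"{name}.pkl")


def load_piece(name: str, directory: Path) -> pd.DataFrame:
    """Load only the DataFrame that a given table or chart needs."""
    return pd.read_pickle(directory / f"{name}.pkl")


store = Path(tempfile.mkdtemp())
dump_pieces(
    {
        "AAPL": pd.DataFrame({"close": [101.5, 102.25]}),
        "MSFT": pd.DataFrame({"close": [250.0, 251.5]}),
    },
    store,
)
aapl = load_piece("AAPL", store)  # MSFT stays on disk
```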

Currently, the software will be used by only one user. It may grow, but probably only to a couple of users, so the server infrastructure requirements are fairly modest.

Generating the data can take quite some time, possibly minutes, depending on the amount of data to be processed. That’s why I want to store/cache the results (i.e. my Python object with its various properties). These properties (pd.DataFrames and dicts of pd.DataFrames) will be used as inputs for various DataTables or Plotly charts.
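Since generation takes minutes and there is only one user, one option is to memoize the expensive step on its input parameters so it runs once per parameter set. This is a sketch assuming a single server process (with several worker processes, each would hold its own cache) and hashable arguments; `run_backtest` is a hypothetical stand-in for the real computation. `maxsize` bounds how many results are kept, which matters with objects of this size.

```python
import functools

CALLS = {"count": 0}  # only used to show how often the expensive step runs


# Keep at most two results in memory; each one may be many GB.
@functools.lru_cache(maxsize=2)
def run_backtest(tickers: tuple[str, ...], start: str, end: str) -> dict:
    """Hypothetical stand-in for the expensive, minutes-long computation."""
    CALLS["count"] += 1
    # ... the real work (downloading prices, running the strategy) goes here ...
    return {"tickers": tickers, "period": (start, end)}


first = run_backtest(("AAPL", "MSFT"), "2020-01-01", "2023-12-31")
second = run_backtest(("AAPL", "MSFT"), "2020-01-01", "2023-12-31")
print(CALLS["count"])  # 1: the second call is served from the cache
print(first is second)  # True: the very same object is returned
```

Because the cache hands back the same object every time, callbacks should treat it as read-only; tickers are passed as a tuple because lists are not hashable.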