Use Case:
I have a big machine learning model (1-2 GB) that I want to interact with using Dash. I use Dash dropdowns and sliders to create novel data points that are fed to the ML model to make predictions. This lets me interactively poke my ML model and really get a feel for how different changes in features yield different predictions.
Moreover, the model should be re-trained regularly when new real-time data comes in (~every 10 minutes).
The Problem:
What is the right way to keep my big model alive in a Dash framework?
Currently, I load the model as a global variable on Dash app startup. This is fine for a first prototype, but probably not a good idea in the long run. In particular, when I re-train the model, the global state of the entire Dash app changes, which might have undesired side effects.
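For concreteness, here is a minimal sketch of the pattern I mean (the model path, `joblib` loading, and the single slider are placeholders for illustration, not my real pipeline):

```python
import joblib
from dash import Dash, dcc, html, Input, Output

app = Dash(__name__)

# Loaded once at startup; every callback reads this global.
MODEL = joblib.load("model.joblib")  # the 1-2 GB object

app.layout = html.Div([
    dcc.Slider(id="feature-x", min=0, max=10, value=5),
    html.Div(id="prediction"),
])

@app.callback(Output("prediction", "children"), Input("feature-x", "value"))
def predict(x):
    # Fast, because MODEL is already in memory.
    return f"Prediction: {MODEL.predict([[x]])[0]}"

if __name__ == "__main__":
    app.run_server(debug=True)
```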
To allow for shared state between callbacks, Dash suggests several approaches:
- Use a hidden div or dcc.Store → This is really not useful here, as passing 1-2 GB back and forth between front end and back end doesn't work.
- Use a file or Redis cache → Almost as impractical as the previous suggestion: I cannot de-serialize my ML model every time a Dash component is triggered. Constructing a 1-2 GB Python object (a complex mixture of sklearn and PyTorch pipelines) from serialized JSON/Pickle takes 30 seconds or more, and the nice interactivity is gone (see the sketch after this list).
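To show what I mean, here is roughly what the documented `flask_caching` pattern would look like if I swapped it in for the global (continuing the app sketch above; the Redis URL, timeout, and `get_model` helper are assumptions I made for illustration):

```python
import joblib
from dash import Input, Output
from flask_caching import Cache

cache = Cache(app.server, config={
    "CACHE_TYPE": "RedisCache",
    "CACHE_REDIS_URL": "redis://localhost:6379",
})

@cache.memoize(timeout=600)  # roughly matches the ~10-minute re-train cycle
def get_model():
    return joblib.load("model.joblib")

@app.callback(Output("prediction", "children"), Input("feature-x", "value"))
def predict(x):
    # Every trigger pulls the pickled model out of the cache and rebuilds
    # the 1-2 GB Python object -- 30+ seconds per interaction.
    model = get_model()
    return f"Prediction: {model.predict([[x]])[0]}"
```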
What is the best practice for handling a big Python object, such as an ML model, that needs to be shared between Dash callbacks? And how should I handle occasional state changes, such as re-training, of this big shared object?
Thanks!