I have a multi-page Dash app, but my focus here is on just one of the child apps. I’m looking for the best way to fetch data from an API once every hour and make it available to all users on the server who visit that specific child app. Ideally, each user would not have to execute the data fetch routine the first time they hit the app in a session.
If I use a dcc.Interval along with dcc.Store within the app, will the scope of that stored data be unique to each user (browser), or will it be accessible to all users on the server?
If you store the data in a dcc.Store, it will only be available to the user the store belongs to, because the store data is saved in that user’s browser.
When a callback activates, the browser will collect all the required info, as you defined it in the callback inputs and states, and send that info to the server. The server will then execute the function associated with the callback, given the input data that the browser provided.
Whatever the callback function returns is sent back to the browser, and the browser updates the components defined in the callback outputs with the new data.
This is the stateless design of Dash: no data is stored on the server.
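For illustration, here is a minimal sketch of the dcc.Interval + dcc.Store setup you describe (`fetch_from_api` is a placeholder for your actual API call). Note that both the store and the hourly refresh live in each visitor’s browser:

```python
import plotly.graph_objects as go
from dash import Dash, dcc, html, Input, Output

app = Dash(__name__)

def fetch_from_api():
    # Stand-in for the real API call; returns something JSON-serialisable.
    return {"values": [1, 2, 3]}

app.layout = html.Div([
    dcc.Interval(id="hourly", interval=60 * 60 * 1000),  # fires every hour
    dcc.Store(id="api-data"),                            # stored in the user's browser
    dcc.Graph(id="chart"),
])

@app.callback(Output("api-data", "data"), Input("hourly", "n_intervals"))
def refresh_data(_):
    # Also fires on the initial page load, so every new visitor triggers a fetch,
    # and afterwards once per hour *per open browser*, not once per server.
    return fetch_from_api()

@app.callback(Output("chart", "figure"), Input("api-data", "data"))
def update_chart(data):
    # Each user builds the chart from their own copy of the data.
    return go.Figure(go.Scatter(y=(data or {}).get("values", [])))

if __name__ == "__main__":
    app.run(debug=True)
```

So with this setup every visitor runs their own fetch on first load, which is exactly what you wanted to avoid.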
If you want to store data so that it is available to all users, you need a storage location that your app can access, such as a database or an S3 bucket (or similar). You can then create a function that retrieves the data from that location and, when the timestamp on the data is older than 1 hour, makes an API call to refresh the data and writes it back to the central location.
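As a rough sketch of that pattern, assuming Redis as the shared storage (any database, S3 bucket, or similar would work the same way; `fetch_from_api`, the key name, and the connection details are placeholders):

```python
import json
import time
import redis

# Shared storage every worker and user can reach; connection details are assumptions.
store = redis.Redis(host="localhost", port=6379, db=0)

MAX_AGE = 60 * 60  # one hour, in seconds

def fetch_from_api():
    # Stand-in for the real API call.
    return {"values": [1, 2, 3]}

def get_shared_data():
    """Return the cached data, refreshing from the API when it is older than one hour."""
    raw = store.get("api-data")
    if raw is not None:
        cached = json.loads(raw)
        if time.time() - cached["timestamp"] < MAX_AGE:
            return cached["data"]
    # Cache is missing or stale: call the API and write the result back for everyone.
    data = fetch_from_api()
    store.set("api-data", json.dumps({"timestamp": time.time(), "data": data}))
    return data
```

Any callback can then call `get_shared_data()`, and at most one API call per hour is made for the whole server.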
Depending on how many workers your server runs, you could also try a global variable in your Python code. As long as the server process stays alive, this can work. Note, however, that with multiple workers each worker gets its own Python instance, so this approach will not work there: setting the variable in one instance will not be reflected in the other instances.
My workflow involves making an API call and getting back a small amount of data. I’d like this data to be stored in cache for all users for 2 hours. The app will read the data from cache and make a couple of charts.
It’s not immediately obvious to me how to achieve that with Background Callback Caching since I want to store the data itself, not an element in the layout.
I don’t think you will be able to avoid external infrastructure, because you need to store that information somewhere. Of course, this depends on your server setup. If your server runs multiple workers, each worker has its own associated disk space.
Workers usually cannot access each other’s disk space, so there is no natural way of sharing information between them. The only option here is an external storage location. This is also noted on the background callback caching help page:
In all container-based deployment environments (including Dash Enterprise and Heroku), the filesystem is ephemeral, meaning it is only associated with the container. The cache in an ephemeral filesystem is not persisted across deploys or restarts and isn’t shared between multiple replicas of the container. These are a few of the reasons why DiskCache is not suitable for production.
I have no experience with either of the options you presented, so I cannot comment much on them. But in the end it all comes down to the same thing: find some external place to store the info, have the callback fetch from there first, and only when the info is old enough, fetch from the API again.
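One way to wire that pattern into a Dash app without writing the timestamp check yourself is Flask-Caching with a shared backend such as Redis. A rough sketch, where the backend choice, the Redis URL, and `fetch_from_api` are all assumptions on my part:

```python
from dash import Dash
from flask_caching import Cache

app = Dash(__name__)

# Redis as the shared backend, so all workers see the same cache entry.
cache = Cache(app.server, config={
    "CACHE_TYPE": "RedisCache",                     # assumed backend
    "CACHE_REDIS_URL": "redis://localhost:6379/0",  # assumed connection URL
})

def fetch_from_api():
    # Stand-in for the real API call.
    return {"values": [1, 2, 3]}

@cache.memoize(timeout=2 * 60 * 60)  # keep the cached result for two hours
def get_shared_data():
    # Only runs when the cached copy has expired; otherwise the cache answers.
    return fetch_from_api()
```

Every callback that needs the data just calls `get_shared_data()`; the first call after the timeout hits the API, and all other calls read from the cache.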
Perhaps as an in-between alternative (depending on how many workers your server has): you could try the global variable approach, i.e. store the data in the memory of each worker instance. When a callback needs the cached info, it checks the state of the global variable in the instance it is running in. You will get one API call per worker per 2 hours, so if the number of workers is limited, the number of API calls per 2 hours is limited as well (though most likely more than the single call that would be optimal).
This approach doesn’t require any additional infrastructure, at the cost of a few extra API calls, though still far fewer than if the API had to be called on every request. Plus, it has the benefit that you read the information from memory, so you avoid the delay of accessing an external data store.
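A minimal sketch of that global variable approach, with a placeholder `fetch_from_api` and a 2-hour expiry:

```python
import time

# Module-level cache: one copy per worker process, refreshed at most every two hours.
_cache = {"timestamp": 0.0, "data": None}
MAX_AGE = 2 * 60 * 60  # two hours, in seconds

def fetch_from_api():
    # Stand-in for the real API call.
    return {"values": [1, 2, 3]}

def get_data():
    """Return this worker's cached data, hitting the API only when it is stale."""
    if time.time() - _cache["timestamp"] > MAX_AGE:
        _cache["data"] = fetch_from_api()
        _cache["timestamp"] = time.time()
    return _cache["data"]
```

Every callback that needs the data calls `get_data()`, so with N workers you get at most N API calls per 2 hours.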