How to design an application that gets data from API in a way that the fetched data is shared among all users of the app

Brakjen · July 18, 2023, 11:17am

Hi all!

I am a relatively new dash user, coming here from streamlit with which I found it very difficult to design complex layouts. I am really liking dash so far, and I am trying to wrap my head around all of the possibilities and best practices out there. I am looking for a bit of inspiration and tips on how to realize a certain project I am working on.

Background

I am designing a dashboard application that fetches live data from a web API and visualizes it in various plotly figures arranged in a grid using dash-bootstrap-components. The application will be deployed to a kubernetes cluster within our organization, and will be used by 5 to 20 people. The data is “live”, i.e. I get the last 5 minutes whenever the request is made. The users want the dashboard to update every 10-20 seconds (it will be used in an operational context by a support team).

Requirements

1. Can get data from API regularly (every 10-20 seconds)
2. Has decent layout organization of the visual elements
3. Can handle multiple users without affecting performance too much
4. Can compute the difference between currently fetched data and data fetched from the previous callback
5. No user interaction needed. While the plotly figures are interactive, we need no fancy components to filter data or perform tasks on the data. This is a purely visual dashboard.

The backend is ready (1), and I have made the layout and all the visual elements (2). What I need help with are 3 and 4. I have a separate thread on 4.

How to handle multiple users without the app freezing?

Getting the data takes about 10 seconds. When multiple users visit the app, things start to slow down. The kubernetes deployment will use horizontal scaling of pods to allow for more users, but I would like to optimize the code a bit to make it faster. Here are some thoughts:

Partial property update

Do not redraw all the figures when getting new data, but just update with the new x and y values using partial property updates.

Memoization

I have been thinking about whether I can use memoization to improve the performance, but I am not so sure. The input parameters to the get_data function are start and end. These arguments identify the time period for which to get data, and will of course be different in each call. So imagine 2 different users of the application. User 1 has just fetched the data with start = 10:43:00 and end = 10:48:00. User 2’s instance arrives at the callback with start = 10:43:20 and end = 10:48:20. The arguments are different, so the function is called again instead of being served from the cache. This will always be the case, so the memoization is pointless.

But, what if I define the get_data callback to only get data at every whole 15 second interval? That is, whenever dt.second % 15 == 0. This would standardize the time at which to get the data for all users, assuming that their clocks show the same time (I assume modern computers in the same time zone show the same time?). In this case the input arguments to get_data will not be start and end timestamps, since these are predefined. A consequence is that a user who visits the page may need to wait 15 seconds before any data shows up, unless the callback is forced the very first time.

I could perhaps achieve this by setting up a dcc.Interval(id="interval-timer", interval=1000, n_intervals=0), and add some logic to the get_data function. Here without memoization:

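import pandas as pd
from dash import Input, Output

# `app` (the Dash instance) and `backend` are defined elsewhere in the project.
@app.callback(
    Output("dcc-data-store", "data"),
    Input("interval-timer", "n_intervals")
)
def get_data(n: int) -> str:
    # Only hit the backend on whole 15-second marks (:00, :15, :30, :45).
    now = pd.Timestamp.now().floor("1s")
    if now.second % 15 == 0:
        start = now - pd.Timedelta(minutes=5)
        df = backend.get_data(start=start, end=now)
        return df.to_json()
    else:
        # Empty string: nothing new for the downstream callbacks to draw.
        return ''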

I have other callbacks that take the dcc-data-store as input, and I can add simple logic there to either return the fig or raise PreventUpdate. I tried out this solution, but ran into the issue that since getting the data takes so much time (around 10±2 seconds), the data would never be passed to the callbacks that render the figure. I think of it as if the “next” callback triggered by the Interval “overwrites” the ongoing callback. The end result was that no data was shown in the dashboard. If, however, I increased the interval property of the Interval object to something higher than the time it takes to get the data (say, 15 seconds), then the data is visualized as expected. (Note that I also need to adjust the floor when determining now above.) Also, the data fetch is triggered at every 00, 15, 30, 45 seconds regardless of the user, so the input to the get_data function will be the same across users. Perhaps this is now set up for memoization to work?

The scenario I want is this: User 1 visits the app and the get_data callback eventually gets triggered. This user enjoys my dashboard for quite some time alone. Then a second user visits my app. The time is 10:00:05. The get_data callback is reached, and the code determines that the end argument should be 10:00:00 (flooring to the nearest 15 seconds). User 1 has already fetched data with this input, and so user 2 is served the data instantly from the memoization, which spares user 2 from waiting another 10 seconds until the callback is triggered again.
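
To make the scenario concrete, here is a minimal sketch of the flooring-plus-memoization idea using only the standard library. It is an illustration under assumptions, not a tested Dash setup: slow_backend_fetch is a hypothetical stand-in for backend.get_data, and the names floor_to_step, fetch_window and get_shared_data are made up for this example.

```python
from datetime import datetime, timedelta
from functools import lru_cache
from typing import List, Optional

WINDOW = timedelta(minutes=5)  # the "last 5 minutes" of data
STEP_SECONDS = 15              # shared refresh grid: :00, :15, :30, :45

BACKEND_CALLS: List[datetime] = []  # records each real fetch, for demonstration only


def floor_to_step(ts: datetime, step: int = STEP_SECONDS) -> datetime:
    """Round ts down to the previous multiple of `step` seconds (step must divide 60)."""
    return ts.replace(second=ts.second - ts.second % step, microsecond=0)


def slow_backend_fetch(start: datetime, end: datetime) -> str:
    """Hypothetical stand-in for backend.get_data(start=..., end=...).to_json()."""
    BACKEND_CALLS.append(end)
    return f'{{"start": "{start.isoformat()}", "end": "{end.isoformat()}"}}'


@lru_cache(maxsize=4)
def fetch_window(end: datetime) -> str:
    """Memoized on the floored end time, so all callers in one window share a result."""
    return slow_backend_fetch(end - WINDOW, end)


def get_shared_data(now: Optional[datetime] = None) -> str:
    """What a callback could call: floor the clock first, then go through the cache."""
    if now is None:
        now = datetime.now()
    return fetch_window(floor_to_step(now))
```

Calling get_shared_data at 10:00:05 and again at 10:00:09 triggers a single backend call, because both floor to 10:00:00. Note that this cache lives inside one Python process, so with several worker processes or pods each one would still do its own fetch.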

If I were to add memoization to the above example, would that allow me to enter the callbacks much more rapidly, e.g. every second, without “overwriting” the get_data callback as described above?
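
One thing I would have to watch out for here: with plain memoization, a call that arrives while the first fetch for a window is still running finds the cache empty and starts its own fetch. A lock around the cache check avoids that. The sketch below is only an illustration with hypothetical names (SingleFlightCache, slow_fetch, demo), it works within a single process only, and it holds the lock during the fetch, so callers simply wait for the first result.

```python
import threading
import time
from typing import Callable, Dict, List, Tuple


class SingleFlightCache:
    """Cache keyed by window end; the lock lets only one caller compute a missing key."""

    def __init__(self, fetch: Callable[[str], str], max_entries: int = 4) -> None:
        self._fetch = fetch
        self._lock = threading.Lock()
        self._values: Dict[str, str] = {}
        self._max_entries = max_entries

    def get(self, key: str) -> str:
        # The lock is held while fetching, so concurrent callers block here
        # and then read the cached value instead of fetching again.
        with self._lock:
            if key not in self._values:
                if len(self._values) >= self._max_entries:
                    # dicts keep insertion order, so this drops the oldest window
                    self._values.pop(next(iter(self._values)))
                self._values[key] = self._fetch(key)
            return self._values[key]


def demo(n_users: int = 5) -> Tuple[List[str], int]:
    """Simulate n_users asking for the same window at the same moment."""
    calls: List[str] = []

    def slow_fetch(key: str) -> str:  # hypothetical slow backend call
        calls.append(key)
        time.sleep(0.05)
        return f"data for window ending {key}"

    cache = SingleFlightCache(slow_fetch)
    results: List[str] = []
    threads = [
        threading.Thread(target=lambda: results.append(cache.get("10:00:00")))
        for _ in range(n_users)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, len(calls)
```

In demo, five simulated users request the same window at once and the backend stand-in runs only once; the other four wait and receive the cached value.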

Separate backend data getting from frontend

Another question is this: What happens when multiple users are on the app simultaneously when the time for updating the data arrives? If 10 persons are on the app, will the 10 persons fetch the data at the same time? That seems a bit wasteful, since they will get exactly the same data (assuming their computer clocks show the same time). Is there a way to separate getting of the data from each user session, and instead fetch this data in the background by some sort of daemon that e.g. writes the data to a file which all frontend users can access at the relevant times?

Is this something I can set up in Python/dash, or would it be more of a redesign of the whole deployment? For example, deploying a separate container that always runs and gets data that is written to a file. Then another container that runs the dash web app can read these files whenever they change and update the visuals. My intuition tells me that such a solution would scale much better with the number of users, since each user will not perform costly API calls.
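
A rough sketch of what I have in mind for the separate fetcher, assuming both containers can see the same path (for example through a shared volume). All names here (write_snapshot, read_snapshot, run_fetcher) are hypothetical, and the fetch argument stands in for the real backend call.

```python
import json
import os
import tempfile
import threading
from pathlib import Path
from typing import Callable, Optional


def write_snapshot(path: Path, payload: dict) -> None:
    """Write payload as JSON to a temp file, then swap it in with os.replace.

    Readers therefore see either the old file or the new one, never a
    half-written file.
    """
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(payload, tmp)
    os.replace(tmp_name, str(path))


def read_snapshot(path: Path) -> Optional[dict]:
    """What the dashboard side would call; returns None until the first write."""
    try:
        return json.loads(path.read_text())
    except FileNotFoundError:
        return None


def run_fetcher(
    path: Path,
    fetch: Callable[[], dict],
    interval_s: float,
    stop: threading.Event,
) -> None:
    """Fetcher loop: one backend call per interval, regardless of user count."""
    while not stop.is_set():
        write_snapshot(path, fetch())
        stop.wait(interval_s)  # sleeps, but wakes up early if stop is set
```

With this split, the Interval callback in the web app would only call read_snapshot, which is cheap, while run_fetcher runs in its own container or process.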

Conclusion

I realize this post became a bit large and rambling. I truly appreciate any pointers on best practice and ideas for how to best become proficient in dash. I see many use cases in our organization, and I am looking forward to the next one already!

Here’s a teaser image of the application

jinnyzor · July 18, 2023, 11:45am

Hello @Brakjen,

If you want to share data across users, I use a stored procedure on my SQL server that performs the query and saves it to a table. Then each server is set up to query the tables into their local SQLite server, 30 seconds after the stored procedure is triggered. This is all done outside of the app.

Then upon the reloading of the data on the client side, the local server returns their tables from the SQLite server. Now, if you want all the users to be in sync, you should use something like the local time for the client.
