I am a relatively new Dash user, coming here from Streamlit, with which I found it very difficult to design complex layouts. I am really liking Dash so far, and I am trying to wrap my head around all of the possibilities and best practices out there. I am looking for a bit of inspiration and tips on how to realize a certain project I am working on.
I am designing a dashboard application that fetches live data from a web API and visualizes it in various Plotly figures arranged in a grid using dash-bootstrap-components. The application will be deployed to a Kubernetes cluster within our organization and will be used by 5 to 20 people. The data is “live”, i.e. I get the last 5 minutes whenever the request is made. The users want the dashboard to update every 10-20 seconds (it will be used in an operational context by a support team).
1. Can get data from the API regularly (every 10-20 seconds)
2. Has decent layout organization of the visual elements
3. Can handle multiple users without affecting performance too much
4. Can compute the difference between currently fetched data and data fetched from the previous callback
5. No user interaction needed. While the Plotly figures are interactive, we need no fancy components to filter data or perform tasks on the data. This is a purely visual dashboard.
The backend is ready (1), and I have made the layout and all the visual elements (2). What I need help with is 3 and 4. I have a separate thread on 4.
Getting the data takes about 10 seconds. When multiple users visit the app, things start to slow down. The Kubernetes deployment will use horizontal scaling of pods to allow for more users, but I would like to optimize the code a bit to make it faster. Here are some thoughts:
Do not redraw all the figures when getting new data, but just update with the new x and y values using partial property updates.
I have been thinking about whether I can use memoization to improve the performance, but I am not so sure. The input parameters to the `get_data` function are `start` and `end`. These arguments identify the time period for which to get data, and will of course be different in each call. So imagine two different users of the application. User 1 has just fetched the data with `start = 10:43:00` and `end = 10:48:00`. User 2’s instance arrives at the callback with `start = 10:43:20` and `end = 10:48:20`. The arguments are different, so the function is called again. This will always be the case, so the memoization is pointless.
But what if I define the `get_data` callback to only get data at every whole 15-second interval? That is, whenever `dt.second % 15 == 0`. This would standardize the time at which to get the data for all users, assuming that their clocks show the same time (I assume modern computers in the same time zone show the same time?). In this case the input arguments to `get_data` will not be `start` and `end` timestamps, since these are predefined. A consequence is that a user who visits the page may need to wait 15 seconds before any data shows up, unless the callback is forced the very first time.
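The idea of snapping every request onto a shared 15-second grid can be sketched with the standard library alone. The helper names `floor_to_interval` and `data_window` are made up for illustration; the 15-second step and the 5-minute window come from the description above. Note that Dash callbacks run on the server, so it is the server clock, not each user's own clock, that determines `now` in a callback.

```python
from datetime import datetime, timedelta


def floor_to_interval(ts: datetime, seconds: int = 15) -> datetime:
    """Round ts down to the nearest multiple of `seconds` within its minute."""
    # Only meaningful when `seconds` divides 60 evenly (15, 20, 30, ...).
    return ts.replace(second=ts.second - ts.second % seconds, microsecond=0)


def data_window(now: datetime, minutes: int = 5, seconds: int = 15):
    """Return the (start, end) pair shared by every caller in the same slot."""
    end = floor_to_interval(now, seconds)
    return end - timedelta(minutes=minutes), end


# Two users arriving a few seconds apart land on the same window.
user_1 = data_window(datetime(2024, 1, 1, 10, 48, 1))
user_2 = data_window(datetime(2024, 1, 1, 10, 48, 9))
print(user_1 == user_2)
```

Because both calls produce identical `(start, end)` values, those values could serve as a cache key, which is what makes memoization possible at all.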
I could perhaps achieve this by setting up a `dcc.Interval(id="interval-timer", interval=1000, n_intervals=0)` and adding some logic to the `get_data` function. Here without memoization:
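
    import pandas as pd
    from dash import Input, Output

    # `app` (the Dash app) and `backend` (the API client) are defined elsewhere.
    @app.callback(
        Output("dcc-data-store", "data"),
        Input("interval-timer", "n_intervals"),
    )
    def get_data(n: int) -> str:
        now = pd.Timestamp.now().floor("1s")
        # Only hit the backend on whole 15-second marks (00, 15, 30, 45).
        if now.second % 15 == 0:
            start = now - pd.Timedelta(minutes=5)
            df = backend.get_data(start=start, end=now)
            return df.to_json()
        # An empty string signals downstream callbacks that nothing is new.
        return ""
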
I have other callbacks that take the `dcc-data-store` as input, and I can add simple logic there to either return the fig or raise `PreventUpdate`. I tried out this solution, but ran into the issue that since getting the data takes so much time (around 10±2 seconds), the data would never be passed to the callbacks that render the figure. I think of it as if the “next” callback, triggered by the next `Interval` tick, “overwrites” the ongoing callback. The end result was that no data was shown in the dashboard. If, however, I increased the `interval` property of the `Interval` object to something higher than the time it takes to get the data (say, 15 seconds), then the data is visualized as expected. (Note that I also need to adjust the `floor` when determining `now` above.) Also, the data fetch is triggered at every 00, 15, 30, 45 seconds regardless of the user, so the input to the `get_data` function will be the same across users. Perhaps this is now set up for memoization to work?
The scenario I want is this: User 1 visits the app and the `get_data` callback eventually gets triggered. This user enjoys my dashboard alone for quite some time. Then a second user visits my app. The time is 10:00:05. The `get_data` callback is reached, and the code determines that the `end` argument should be 10:00:00 (flooring to the nearest 15 seconds). User 1 has already fetched data with this input, so user 2 is served the data instantly from the memoization, sparing them from waiting another 10 seconds before the callback is triggered again.
If I were to add memoization to the above example, would that allow me to enter the callbacks much more rapidly, e.g. every second, without “overwriting” the `get_data` callback as described above?
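As a rough illustration of the two-user scenario, the sketch below memoizes a slow fetch on the floored `end` timestamp using `functools.lru_cache` from the standard library. `slow_fetch` and `get_window` are hypothetical names, with `slow_fetch` standing in for `backend.get_data`. This is a sketch under assumptions, not the setup from the post: `lru_cache` lives inside a single Python process, so with several workers or pods each one would hold its own copy. The Dash performance documentation uses Flask-Caching for a cache that can be shared.

```python
from datetime import datetime, timedelta
from functools import lru_cache


def slow_fetch(start: datetime, end: datetime) -> list:
    """Hypothetical stand-in for backend.get_data (about 10 s in the post)."""
    return [start.isoformat(), end.isoformat()]


@lru_cache(maxsize=4)  # keep only the most recent windows in memory
def get_window(end: datetime) -> tuple:
    """Fetch the 5-minute window ending at `end`, cached per distinct `end`."""
    start = end - timedelta(minutes=5)
    return tuple(slow_fetch(start, end))


def get_data(now: datetime) -> tuple:
    """Floor `now` to a 15-second boundary so all users share one cache key."""
    end = now.replace(second=now.second - now.second % 15, microsecond=0)
    return get_window(end)


first = get_data(datetime(2024, 1, 1, 10, 0, 1))   # user 1: cache miss, fetches
second = get_data(datetime(2024, 1, 1, 10, 0, 5))  # user 2: served from cache
print(get_window.cache_info())
```

One caveat worth keeping in mind: memoization only helps once the first fetch has finished. Callers that arrive while that first fetch is still running may each start their own fetch, so a lock or a background refresh may still be needed.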
# Separate backend data fetching from frontend
Another question is this: What happens when multiple users are on the app simultaneously when the time for updating the data arrives? If 10 people are on the app, will all 10 fetch the data at the same time? That seems a bit wasteful, since they will get exactly the same data (assuming their computer clocks show the same time). Is there a way to separate the data fetching from each user session, and instead fetch the data in the background with some sort of daemon that, e.g., writes the data to a file which all frontend users can access at the relevant times?
Is this something I can set up in Python/Dash, or would it be more of a redesign of the whole deployment? For example, deploying a separate container that always runs and gets data that is written to a file. Then another container that runs the Dash web app can read these files whenever they change and update the visuals. My intuition tells me that such a solution would scale much better with the number of users, since each user will not perform costly API calls.
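One way to picture the daemon idea in plain Python is a single background thread that refreshes a shared snapshot, while request handlers only read the latest copy. Everything named below (`SnapshotStore`, `fetch`, `refresh`, `run_forever`) is hypothetical, and the fake fetch stands in for the real backend call. In a multi-pod deployment the in-memory snapshot would have to be replaced by something shared, such as the file or the separate container described above.

```python
import threading
import time
from typing import Callable, Optional


class SnapshotStore:
    """Holds the latest and previous fetch results behind a lock."""

    def __init__(self, fetch: Callable[[], dict]):
        self._fetch = fetch  # hypothetical stand-in for the API call
        self._lock = threading.Lock()
        self._latest: Optional[dict] = None
        self._previous: Optional[dict] = None

    def refresh(self) -> None:
        """Run one fetch outside the lock, then swap the snapshots."""
        data = self._fetch()
        with self._lock:
            self._previous, self._latest = self._latest, data

    def latest(self) -> Optional[dict]:
        """Cheap read for every user session; never calls the API."""
        with self._lock:
            return self._latest

    def previous(self) -> Optional[dict]:
        """The snapshot before the latest one, e.g. for computing a difference."""
        with self._lock:
            return self._previous

    def run_forever(self, interval: float, stop: threading.Event) -> None:
        """Refresh, then wait `interval` seconds, until `stop` is set."""
        while not stop.is_set():
            self.refresh()
            stop.wait(interval)


# Demo with a fake fetch that returns an increasing counter.
counter = iter(range(1_000_000))
store = SnapshotStore(fetch=lambda: {"value": next(counter)})
stop = threading.Event()
worker = threading.Thread(target=store.run_forever, args=(0.05, stop), daemon=True)
worker.start()
time.sleep(0.2)  # let a few refreshes happen
stop.set()
worker.join()
print(store.latest(), store.previous())
```

With this shape, the number of API calls depends only on the refresh interval, not on how many users are connected. Keeping the previous snapshot around is one possible starting point for comparing consecutive fetches.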
I realize this post became a bit large and rambling. I truly appreciate any pointers on best practice and ideas for how to best become proficient in Dash. I see many use cases in our organization, and I am looking forward to the next one already!
Here’s a teaser image of the application