Efficient data flow between Dash callbacks with Plotly plot updates from video/player time

Hi Community,

I am experimenting with a Dash app where I
(1) import a rather large pandas data frame (let’s call it df_large) upon starting the app,
(2) use a dropdown menu to select some data filters, compute additional values (which depend on chronologically earlier data), store the filtered df (let’s call it df_filtered) with dcc.Store (pd.DataFrame → JSON), and, on a button click, start the video at the timestamp of the first data row,
(3) use the stored data (converted from JSON back to a pd.DataFrame) together with a video player, where the player’s currentTime is used to extract the matching (or close-enough) data row via its timestamp, and
(4) display the data values at a given time point, in sync with the video.
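For context, the store-and-restore part of step (2) can be sketched roughly like this; the column names and sample values are my own placeholders, not from the actual app:

```python
import io
import pandas as pd

# Hypothetical filtered frame; "ts" is an assumed timestamp column (seconds).
df_filtered = pd.DataFrame({
    "ts": [0.0, 0.5, 1.2, 2.0],
    "value": [10, 12, 9, 14],
})

# Serialize for dcc.Store (orient="split" round-trips cleanly)...
stored = df_filtered.to_json(date_format="iso", orient="split")

# ...and restore it in the consuming callback.
restored = pd.read_json(io.StringIO(stored), orient="split")
```

In the real app, `stored` would be the return value of the button callback (written into the dcc.Store component) and `restored` would be rebuilt inside the callback that updates the graph.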

In essence, I am trying to display the video and the plotted values in (rough) synchronization. I initially tried the dcc.Interval component, i.e., updating the graph and video every n milliseconds, but this did not work well in my case because the data intervals were not consistent: the time between data rows varied a bit. Hence, I created a workaround: I use the currentTime from the video to filter the df from step (2) (df_filtered[ts] <= currentTime) and plot the latest row (df_filtered.tail(1)) of that result on the graph.
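The workaround boils down to something like the following (again with a made-up "ts" column holding seconds, to match currentTime):

```python
import pandas as pd

# Assumed shape of the stored data; "ts" holds seconds, like currentTime.
df_filtered = pd.DataFrame({
    "ts": [0.0, 0.5, 1.2, 2.0, 2.7],
    "value": [10, 12, 9, 14, 11],
})

current_time = 1.5  # e.g. the video player's currentTime, in seconds

# Keep rows at or before the playhead, then take the most recent one.
row = df_filtered[df_filtered["ts"] <= current_time].tail(1)
# row holds the latest sample not past the playhead (ts == 1.2 here).
```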

This solution works very well with smaller data sets, but I would sometimes also like to display the values of the entire df_large (no additional computations made, just using the available data!). The idea is that I can click anywhere on the video player's timeline and it shows me the data values at that time (updating along with the video while it is playing).

I figured one way would be to filter on each frame update: filter the entire data set on the timestamp coming from the video, i.e., currentTime (e.g., df_large[ts] >= currentTime - 1000ms & df_large[ts] <= currentTime + 1000ms), then take .tail(1) again. However, this does not work well, as the filtering takes too long (so the plot updates often freeze).

In principle, does anyone have suggestions for what else could be tried here?

If the issue is just the time of retrieval from the big pandas DataFrame, then yes: boolean-mask operations like the ones you've tried can be slow on a large dataframe, since they scan every row on every update.

I tentatively suggest (I've not tried it!) that a good approach might be to extract the timestamps from the dataframe into a sorted list and use a binary search to find the position of the greatest timestamp no greater than x. That position could then be used as a positional index to retrieve the row from the dataframe via, e.g., .iloc[…]
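As a rough sketch of that idea (untested against your app; "ts" and the sample values are placeholders, and the timestamps must already be sorted):

```python
from bisect import bisect_right

import pandas as pd

df_large = pd.DataFrame({
    "ts": [0.0, 0.5, 1.2, 2.0, 2.7],   # must be sorted for binary search
    "value": [10, 12, 9, 14, 11],
})

# One-off step at app start: pull the sorted timestamps into a plain list.
timestamps = df_large["ts"].tolist()

def row_at(current_time):
    """Return the row with the greatest ts <= current_time, in O(log n)."""
    i = bisect_right(timestamps, current_time) - 1
    if i < 0:
        return None  # playhead is before the first sample
    return df_large.iloc[i]

row = row_at(1.5)  # latest sample not past the playhead (ts == 1.2 here)
```

Each graph update then costs a logarithmic-time search plus a single positional lookup, instead of a full scan of df_large.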

The bisect module in Python's standard library supports exactly this kind of thing, and there is example code using it here:

Thanks! I will provide an update once I try this solution.