Retriggering query/page-build with cached results

Please forgive the new thread. I’m sure this is not a new subject. Please feel free to point me to any relevant content.

I am constructing a Dash application that displays live data arriving at a one-second interval into a Postgres database. The query and Dash page setup can be quite expensive and unpredictable in duration. Using Redis/Flask caching and Store signaling (I think), I want to:

  • Initiate the query and page-build upon app startup
  • Cache the resultant page components for access by however many client sessions may request them
  • Keep serving the cached contents while automatically restarting the query and page-build process as soon as the previous one completes
  • Refresh the cache with the new page components and signal all instances to refresh their pages from the cache

Another note: every attempt I have made to have a Store update re-trigger a callback, by assigning Store.data as both the output and an input of the same callback, has failed to re-trigger no matter how I try it.

Any pointers and code examples would be most appreciated.

Hello!

Here is how I would do this, given your explanation.

  1. Externalize the refreshing of data and computing of the layout into its own process. E.g. a Python script running an infinite loop that recomputes the data and (if really necessary) the Dash layout, saves the result either in a Redis backend or in a file, and restarts the loop.

  2. Use Dash callbacks to update the layout every second with the latest data. You can use dcc.Interval to trigger a callback once per second, e.g. a callback that retrieves the (possibly unchanged) content from the Redis backend or file and generates the layout from it; see the sketch after this list.
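
For example, a minimal sketch of the two pieces (untested; it assumes a Redis server on localhost, and refresh_data/build_page are placeholder names for your own query and layout code):

import pickle
import time
from threading import Thread

import redis
from dash import Dash, dcc, html, Input, Output

r = redis.Redis()  # assumes Redis on localhost:6379
app = Dash(__name__)

def refresh_data():
    # Placeholder for the expensive query/computation
    return f'recomputed at {time.strftime("%H:%M:%S")}'

def build_page(data):
    # Placeholder for generating the layout from the data
    return html.Pre(data)

def producer():
    # Step 1: loop forever, recomputing and storing the content
    while True:
        r.set('page', pickle.dumps(refresh_data()))
        time.sleep(1)  # placeholder pacing; a real query provides its own delay

Thread(target=producer, daemon=True).start()

app.layout = html.Div([
    dcc.Interval(id='tick', interval=1000),  # fires once per second
    html.Div(id='content'),
])

@app.callback(Output('content', 'children'), Input('tick', 'n_intervals'))
def refresh(_):
    # Step 2: render whatever the producer last stored
    raw = r.get('page')
    return build_page(pickle.loads(raw)) if raw else 'Awaiting data...'

if __name__ == '__main__':
    app.run(debug=True)

Note that only the data is pickled; the layout itself is rebuilt in the callback, which keeps the Redis payload simple.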

The sketch above is only a starting point; ask if anything is unclear.

I understand that processing data / doing database queries can take some time. But the layout or graphs should generate quickly; can you explain precisely what takes the time?

I imagine that for step 1 you would prefer to have this integrated in the Dash app. There are solutions for that too, but a separate script is, I think, the easiest.

Hope it helps!


Thank you; what you describe is what I ended up doing. The interval was easier to implement than any kind of messaging to signal a ready state, and given that the query/page build takes about 4 seconds (~3s for the query, 1s for the build), checking once a second is fine.

The page, an instrument dashboard for a data acquisition system, is quite complex: a grid of Divs, one per measurement instrument, each with a background plot of the last five minutes of measurements (annotated with alarms) and a text foreground showing the latest readings.
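
For illustration, each tile is conceptually something like the following (a simplified stand-in, not my actual build code; the sizing and styling are arbitrary):

import plotly.graph_objects as go
from dash import dcc, html

def instrument_tile(name, times, values, latest):
    # One instrument Div: the recent trace as a background plot,
    # with the latest reading overlaid as foreground text
    fig = go.Figure(go.Scatter(x=times, y=values, mode='lines'))
    fig.update_layout(margin=dict(l=0, r=0, t=0, b=0),
                      xaxis_visible=False, yaxis_visible=False)
    return html.Div(style={'position': 'relative', 'height': '120px'}, children=[
        dcc.Graph(figure=fig, config={'staticPlot': True},
                  style={'position': 'absolute', 'top': 0, 'left': 0,
                         'width': '100%', 'height': '100%'}),
        html.Div(f'{name}: {latest}',
                 style={'position': 'absolute', 'top': '4px', 'left': '8px',
                        'fontWeight': 'bold'}),
    ])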

The rabbit hole I went down was the transport of the page data between the looping query/build task and the update callback. I tried Redis with a serialized page, but the serialization itself was the problem: JSON choked on the page content, and with pickle the deserialization alone took 2-3 seconds, which was unacceptable. I ended up punting on serialization and Redis altogether and just pass the built page in a global protected by a mutex, flagged with a build time. It works, and it ensures that only one query runs at a time regardless of how many clients are rendering pages.

import copy
import datetime
import time
from threading import Lock, Thread

from dash import dcc, html, Input, Output, State
from dash.exceptions import PreventUpdate

refresh_secs = 1

# Layout for the dashboard page
def layout_dashboard(config):
    return html.Div([
        dcc.Interval(id='interval', interval=refresh_secs * 1000, n_intervals=0),
        dcc.Store(id="cache-timestamp", data=None),  # To store the last seen timestamp
        html.H1('VanDAQ Operator Dashboard', style={'text-align': 'left'}),
        html.Div(id='sample_timestamp'),
        html.Div(id='grid-container', children=['Awaiting data...'])
    ])

# Latest built page and its build time, shared between the background
# build thread and the update callback; guarded by the mutex below
latest_page = None
latest_page_time = None

def update_dashboard(app, engine, config):
    # Register the refresh callback and start the background build loop
    lock = Lock()
    Thread(target=regenerate_page, args=(engine, config, lock), daemon=True).start()

    @app.callback(
        [
            Output('grid-container', 'children'),
            Output('sample_timestamp', 'children'),
            Output("cache-timestamp", "data"),
        ],
        [
            Input("interval", "n_intervals"),
            State("cache-timestamp", "data")
        ]
    )
    def update_page(n_intervals, last_seen_timestamp):
        # The Store round-trips the timestamp as an ISO-format string,
        # so parse it back into a datetime before comparing
        if isinstance(last_seen_timestamp, str):
            last_seen_timestamp = datetime.datetime.fromisoformat(last_seen_timestamp)
        with lock:
            cached_timestamp = latest_page_time
            cached_page = copy.copy(latest_page)
        # Only push an update when a newer page has been built since this
        # client last refreshed; otherwise leave the page untouched
        if cached_timestamp and ((last_seen_timestamp is None) or (cached_timestamp > last_seen_timestamp)):
            sample_timestamp = f'Last sample time (UTC): {cached_timestamp.strftime("%m/%d/%Y, %H:%M:%S")}'
            return cached_page, sample_timestamp, cached_timestamp
        raise PreventUpdate

# Periodically regenerate the page content in the background
def regenerate_page(engine, config, lock):
    global latest_page
    global latest_page_time
    while True:
        # Here is the expensive query and page-build
        items, sample_time = get_list_of_items(engine, config)
        with lock:
            latest_page = items
            # Build-completion time doubles as the freshness flag
            latest_page_time = datetime.datetime.now()
        time.sleep(0.1)  # yield briefly before starting the next build

So, I think I’m good until the use of globals bites my backside; one thing to watch is that this pattern only holds while the app runs in a single process, since multiple server workers would each keep their own copy of the page and run their own query loop. Thanks again!
