How to best include asynchronous backend data processing

In the last few months I’ve built yet another web app to track some parameters related to COVID-19.

I’m still learning how to use Dash, so the architecture I’m employing may not be the best; that’s why I’d like some opinions on how one should handle data processing outside of the main Dash app.

Let me explain.

When I first started the app I was reading the dataset to be plotted directly in the main script, defining the plotting functions there as well, and serving a dynamic layout by including


def serve_layout():
    return layout

app.layout = serve_layout

in the main body of the application.

This, of course, has a huge drawback: the data needs to be loaded every time a user refreshes the page. So I started applying the memoize decorator from flask_caching to the function that reads the data (still defined in the main app body). This worked well: only the first user to visit the page after the cache expired had to wait a bit longer for the data-processing function to complete. Every call after that used the cached result, which sped up subsequent page loads.
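With flask_caching this is essentially `@cache.memoize(timeout=...)` on the reader function. Here is a framework-free sketch of the same idea, so the behaviour is easy to see in isolation (the decorator name, the timeout value, and the returned data are all illustrative, not taken from the real app):

```python
import time

def memoize_with_timeout(timeout):
    """Rough stand-in for flask_caching's @cache.memoize(timeout=...):
    the wrapped function runs once, and later calls reuse the stored
    result until `timeout` seconds have passed."""
    def decorator(func):
        store = {}  # args -> (timestamp, result)
        def wrapper(*args):
            now = time.monotonic()
            if args in store and now - store[args][0] < timeout:
                return store[args][1]
            result = func(*args)
            store[args] = (now, result)
            return result
        return wrapper
    return decorator

calls = 0

@memoize_with_timeout(timeout=3600)
def read_data():
    # In the real app this is the expensive pandas preprocessing.
    global calls
    calls += 1
    return {"rows": 1000}

read_data()
read_data()  # second call is served from the cache; calls stays at 1
```

The point is that only one user per cache window pays the preprocessing cost; everyone else gets the stored result.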

Today I wanted to go one step further by moving the data processing outside of the main app, so I transferred all the computation-heavy preprocessing functions into a script that I run with crontab every 2 hours. This script saves its results into pickle files, which are then read by the main app. I don’t use caching anymore, as reading the pickle takes less than 40 ms, although my understanding is that this read will happen every time the application is opened, i.e. every time the layout is served.
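The split looks roughly like this: one script (run by cron) does the heavy work and dumps a pickle, and the app only does a cheap read at layout-serve time. A minimal sketch with the standard library, where the folder, filename, and payload are placeholders (the real app writes a pandas DataFrame with `to_pickle`):

```python
import os
import pickle
import tempfile

TMP_FOLDER = tempfile.mkdtemp()  # stand-in for the app's real temp folder

# --- preprocessing script, run by crontab every 2 hours ---
def preprocess():
    # Placeholder for the computation-heavy steps; the real app
    # builds a pandas DataFrame here.
    df_data = {"cases": [1, 2, 3]}
    with open(os.path.join(TMP_FOLDER, "df_data.pickle"), "wb") as f:
        pickle.dump(df_data, f)

# --- inside the Dash app: cheap read on every layout serve ---
def read_data():
    with open(os.path.join(TMP_FOLDER, "df_data.pickle"), "rb") as f:
        return pickle.load(f)

preprocess()
data = read_data()
```

The design choice here is that freshness is bounded by the cron interval (2 hours) rather than a cache timeout, and the app process itself never runs the expensive code.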

The simplified tree of dependencies now looks like this:

import pandas as pd

def read_data():
    return pd.read_pickle(TMP_FOLDER + 'df_data.pickle')

def filter_data():
    df = read_data()
    # filtering of the data
    return df

# the function used to prepare data for a table in the app layout
def make_table_data():
    df = filter_data()
    # make the table

Is this a good architecture? Or can it be improved?

I’m quite happy with the loading time of the app, although I think there is still margin for improvement by lazy-loading the tabs.
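By lazy loading I mean building a tab’s content only when it is first selected, which in Dash would be a callback on the `value` of a `dcc.Tabs` component. A framework-free sketch of that pattern (all names here are illustrative, not from the real app):

```python
# Track which builders have actually run, to show the laziness.
built = []

def build_map_tab():
    built.append("map")
    return "<map figure>"       # in Dash: a layout fragment with graphs

def build_table_tab():
    built.append("table")
    return "<table>"            # in Dash: a DataTable

TAB_BUILDERS = {"map": build_map_tab, "table": build_table_tab}
_tab_cache = {}

def render_tab(tab_id):
    # In Dash this function body would live inside a callback wired as
    # @app.callback(Output("tab-content", "children"),
    #               Input("tabs", "value")), so only the selected tab
    # is built; caching the result avoids rebuilding on re-selection.
    if tab_id not in _tab_cache:
        _tab_cache[tab_id] = TAB_BUILDERS[tab_id]()
    return _tab_cache[tab_id]

render_tab("map")  # only the map builder has run so far
```

This keeps the initial page load down to the one visible tab instead of paying for every tab up front.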

Just for reference, here is the app and here is the source code: GitHub - guidocioni/covid_webapp