In the last few months I’ve built yet another web app to track some parameters related to COVID-19.
I’m still learning how to use Dash, so the architecture I’m employing may not be the best. That’s why I wanted to get some opinions on how one should handle data processing outside of the main Dash app.
Let me explain.
When I first started the app I was just reading the dataset to be plotted directly inside the main app.py, defining the plotting functions in the same place where the dataset was imported, and defining a dynamic layout by including

    read_data()

    def serve_layout():
        filter_data()
        return layout

    app.layout = serve_layout

in the main body of the application.
This, of course, has a huge drawback, since the data needs to be loaded every time a user refreshes the page. So I started using the memoize decorator of flask_caching on the definition of the function that reads the data (still defined in the main body of the app). This actually worked well: only the first user to visit the page after the cache expired had to wait a little longer for the data-processing function to complete. Every call after that used the cached results, which sped up subsequent page refreshes.
Today I wanted to go one step further by moving the data processing out of the main app, so I transferred all the computation-heavy preprocessing functions into a script that I run with crontab every 2 hours. This script saves its results to pickle files, which are then read in the main app.py. I don’t use caching anymore, as reading the pickle takes less than 40 ms, although my understanding is that this read will happen every time the application is opened, i.e. every time the layout is served.
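A minimal sketch of what such a preprocessing script could look like is below. The names heavy_preprocessing() and save_atomically(), the sample columns and the TMP_FOLDER value are hypothetical placeholders. Writing to a temporary file and renaming it is an extra precaution I am assuming here, so that the app never reads a half-written pickle while the cron job is running.

```python
import os
import tempfile

import pandas as pd

# Placeholder location; the real folder is not part of this sketch.
TMP_FOLDER = tempfile.gettempdir()


def heavy_preprocessing():
    """Hypothetical stand-in for the computation-heavy functions."""
    return pd.DataFrame({"date": ["2020-03-01", "2020-03-02"], "cases": [10, 25]})


def save_atomically(df, path):
    """Write to a temporary file first, then rename it over the target."""
    tmp_path = path + ".tmp"
    df.to_pickle(tmp_path)
    # os.replace is atomic when both paths live on the same filesystem,
    # so a reader sees either the old file or the new one, never a partial one.
    os.replace(tmp_path, path)


if __name__ == "__main__":
    save_atomically(heavy_preprocessing(), os.path.join(TMP_FOLDER, "df_data.pickle"))
```

A crontab entry with the schedule 0 */2 * * * would run a script like this every 2 hours.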
The simplified tree of dependencies now looks like this:

    def read_data():
        return pd.read_pickle(TMP_FOLDER + 'df_data.pickle')

    def filter_data():
        df = read_data()
        # filtering of the data
        return df

    # the function used to prepare data for a table in the app layout
    def make_table_data():
        df = filter_data()
        # make the table

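One possible variation, sketched here as an idea rather than something I have in place: keep the DataFrame in memory and re-read the pickle only when the file's modification time changes. The path argument, the cases column and the helper _read_pickle_cached() are hypothetical.

```python
import os
from functools import lru_cache

import pandas as pd


@lru_cache(maxsize=1)
def _read_pickle_cached(path, mtime):
    """mtime is unused in the body; it only acts as part of the cache key."""
    return pd.read_pickle(path)


def read_data(path):
    """Re-read the pickle only after its modification time has changed."""
    return _read_pickle_cached(path, os.path.getmtime(path))


def filter_data(path):
    df = read_data(path)
    # Hypothetical filter; the explicit copy keeps later in-place edits
    # from leaking into the shared cached DataFrame.
    return df[df["cases"] > 0].copy()
```

With several worker processes each one would hold its own copy, and since the cached object is shared between requests it should be treated as read-only.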
Is this a good architecture? Or can it be improved?
I’m quite happy with the loading time of the app, although I think there is still room for improvement by lazy-loading the tabs.