Hello, I am working on a project which consists of creating a dashboard for managing purchasing activities, this dashboard must be automated so I have already extracted the data and transformed the data into using python and I created the dash application for data visualization, and now I just have to automate it so that each time the excels files are modified the dashboard must be updated. We do the extraction manually and we save the data in the form of excel files, we do this work because we cannot link our code directly with the data source for this we aim at least to link the code with a folder that contains the excel files and each time we have a change the code is triggered, I don’t know if this is possible with python and dash?
Hi @jowairya !
Welcome on the Forum!
The easiest way I think, is to use the Interval component, see:
https://dash.plotly.com/dash-core-components/interval
You can use it to trigger a callback at constant interval (set in milliseconds) you set and use this callback to fetch your new excel file.
If you update your excel file once a day for instance, you can set interval = 1000*60*60*24
thank you very much @Skiks for your response!
It sounds like your intent is to do an update of the data for all users at particular times (i.e. when new excel file(s) are available)? If this is the case, I would suggest two possible approaches,
-
You could make the layout a function and load the data from within this function (or load the data in a callback). This approach will ensure that your data are always up to date when the user loads the app (or refreshes the page). The downside of this approach is that data are loaded every time, which may impact performance. Depending on the size of the data, this could be a huge problem, or not noticeable at all
-
You could load the data on app init and bind it to a global variable thus “burning” the data into the app. This approach will yield a better performance for larger datasets, as the data is only read once. However, as the data are “part of the app”, you’ll need to deploy a new version of the app, whenever the data changes. Depending on your CI/CD setup, this might be a big issue or no issue at all
The Interval
component is intended for use cases where you need to request an update from a particular client. I don’t believe that fits your needs; unless you want to ensure that data is updated for any connected client when the data changes without them refreshing the page. If this is a requirement, you could use option (1) with an Interval
component that invokes a callback that checks for new data and loads them if present.
I will follow up on Emil’s reply. I solved very similiar problem where my data was too big (it loaded for half an hour) to use first option. So I wrapped my whole data loading and preprocessing pipeline into a function that creates dataframe which is saved into global variable. Then I created BackgroundScheduler from apscheduler.schedulers.background which fires once a day after the data is ready and rewrites the global variables with data.
Of course one have to be really cautious with this approach. Changing global variables will be always kinda risky approach. You have to be careful to use references on your global variable and not copies. It will also depend how you use the data in layout creation - it can happen that the data changes when the user is browsing the dashboard.
Yes I want that the data is updated for any connected client and also as I said my data is stored in excel files and these files are in a folder in my laptop so I want that every time these excels are changed, the data is updated in the app, this is the first time I work on a real project I don’t know how to deal with this issue. Thank you very much for your response!
Hello @martin2097, thank you for your response, I will try what you have suggested, I hope I can understand it quickly even though I still a beginner !!
In that case, I would recommend that you start with option 1, possible with caching to improve performance for subsequent requests. Here is a small example,
import os
from functools import lru_cache
from dash import Dash, html
file_path = "example.txt" # name of file that contains the data
@lru_cache(maxsize=1)
def get_latest_data(last_modified):
"""
Place your data parsing code in this function. The purpose of including the "last_modified" argument is to only
trigger a re-load of the data, when the data has actually been modified, i.e. when "last_modified" changes.
"""
with open(file_path, 'r') as f:
return f.read()
def layout():
last_modified = os.path.getmtime(file_path) # use modified timestamp to check if new data is available
data = get_latest_data(last_modified)
return html.Div(f"Latest data is [{data}]")
app = Dash()
app.layout = layout
if __name__ == '__main__':
app.run_server()
I would generally not recommend the approach of updating global variables, especially not for production workloads. It’s officially discouraged and may introduce hard-to-debug issues down the line.
Thanks a Lot @Emil !
I’m working on a project that involves creating a dynamic dashboard for managing purchasing activities. The dashboard relies on data extracted and transformed using Python, and I’ve already built a Dash application for visualizing the data. Now, my goal is to automate the process so that the dashboard updates automatically whenever there are modifications to the Excel files.