Hi, I’m working on my first ‘production’ dashboard app and wanted to get opinions on how to structure the data pipeline. I have experience in data science, but I’ve almost always worked from cleaned-up CSV files.
I’m planning to analyze and display metrics based on live sensor data paired with relatively static user/project data. The dashboard will probably serve up to 50 users and cover up to 10,000 sensor–user pairs summarized into basic statistics.
At the moment I’m planning to use MySQL on RDS to store the raw data, and to write a Python script that queries the database every 10 minutes, computes the transforms/metrics I’d like to display, and stores them as CSV files in a project directory. I’m currently serving the app with AWS, Gunicorn, and Nginx.
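For what it’s worth, the transform job I have in mind looks roughly like this (just a sketch; the table and column names are placeholders, and the real input would come from MySQL via something like SQLAlchemy rather than a hard-coded DataFrame):

```python
import pandas as pd

# Rough sketch of the 10-minute transform job. Column/table names are
# made up. In the real script the raw data would come from RDS, e.g.:
#   engine = sqlalchemy.create_engine("mysql+pymysql://user:pw@host/db")
#   raw = pd.read_sql("SELECT user_id, sensor_id, value FROM readings", engine)

def summarize(raw: pd.DataFrame) -> pd.DataFrame:
    # Collapse raw readings into per-(user, sensor) summary statistics.
    return (
        raw.groupby(["user_id", "sensor_id"])["value"]
           .agg(["mean", "min", "max", "count"])
           .reset_index()
    )

# The scheduled (cron) entry point would then just be something like:
#   summarize(raw).to_csv("metrics/summary.csv", index=False)
```

The idea is that this script runs completely outside the web process, so the dashboard only ever touches the pre-computed CSV.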
Then I’d reload the CSV files every 10 minutes in pandas, filter the data based on reactive inputs, and render them with Dash.
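On the Dash side, I’m imagining something like this feeding the callbacks (again a sketch, pandas-only; the `user_id` filter stands in for whatever reactive dropdown input I end up with):

```python
import os
import pandas as pd

# Cache the parsed CSV keyed on the file's mtime, so each Dash callback
# doesn't re-read the file; it only reloads when the cron job rewrites it.
_cache = {"mtime": None, "df": None}

def load_metrics(path: str) -> pd.DataFrame:
    mtime = os.path.getmtime(path)
    if _cache["mtime"] != mtime:
        _cache["df"] = pd.read_csv(path)
        _cache["mtime"] = mtime
    return _cache["df"]

def filter_metrics(df: pd.DataFrame, user_id) -> pd.DataFrame:
    # Placeholder filter driven by a reactive input (e.g. a user dropdown).
    return df[df["user_id"] == user_id]
```

A Dash callback would then just call `load_metrics(...)` and `filter_metrics(...)` and hand the result to a figure/table component.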
My reasoning is that performance should be much better if the data transforms are handled by a separate process, rather than being generated on the fly with a SQL query inside the Dash app.
Do you think this is a good way to handle data for a ‘live’-updating Dash app? Am I missing something? Is there a better way to handle it?
Thanks. I appreciate your feedback.