My app uses data that I’m manually pushing to the repo monthly (that’s how often new reports are added to the data). I have to pull the data from a database (takes about 10 minutes) and then process it (takes another 10 minutes). Then I save this processed data as several CSVs and a couple of other file types.
How can I automate this process?
I’ve considered using Celery to run a background process: one script pulls data from the db, then another processes it and outputs the CSVs and other files my app needs. But how can I then commit and push these files to the repo? The Celery and database-connection examples the Dash pages talk about all seem to store the files locally rather than adding them to the repo.
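For reference, the pull → process → save pipeline described above might be sketched like this. All names here (`pull`, `process`, `run_pipeline`, the `data/` directory) are illustrative, not from the thread, and the Celery wiring is shown only in comments so the sketch stays self-contained:

```python
# Sketch of the monthly pipeline as plain functions. With Celery installed,
# each step could be registered as a task and chained, e.g.:
#   from celery import Celery
#   app = Celery("pipeline", broker="redis://localhost:6379/0")
# and then decorated with @app.task. The functions below are stand-ins.

import csv
from pathlib import Path

OUTPUT_DIR = Path("data")  # hypothetical directory the app reads from

def pull():
    """Stand-in for the ~10 minute database pull; returns raw rows."""
    return [{"report": "jan", "value": 1}, {"report": "feb", "value": 2}]

def process(rows):
    """Stand-in for the ~10 minute processing step."""
    return [{**r, "value": r["value"] * 2} for r in rows]

def save(rows):
    """Write the processed rows as one of the CSVs the app needs."""
    OUTPUT_DIR.mkdir(exist_ok=True)
    out = OUTPUT_DIR / "reports.csv"
    with out.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["report", "value"])
        writer.writeheader()
        writer.writerows(rows)
    return out

def run_pipeline():
    """Run the whole chain; as Celery tasks this could be pull.s() | process.s() | save.s()."""
    return save(process(pull()))
```

The point of splitting the steps is that each one can become its own task, so a scheduler can kick off the chain once a month without the web process blocking.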
I’ve also considered using a GitLab CI/CD pipeline to write a job that pulls data from the db and processes it, but then I run into the same problem: how do I commit the outputted data to the repo?
Welcome to the community!
Why are you wanting to push it to a repo? That seems a little like going through a middleman, instead of using something like Redis to cache the view from the database for the app’s server to reference.
Perhaps I don’t need to; I’m new to some of the topics I mentioned. When you refer to using Redis to cache the view from the database, what do you mean? If I had a Celery worker scheduled to run periodically, pull from the database, and process the data into my required files, could I save those files in the Redis database and point my app to that data?
My lack of understanding of Redis/Celery capabilities may be leading me to the false conclusion that I need to save the output files to my repo, since that’s where my app expects to find them.
Yes, you could query the information directly from Redis, or, if you prefer, save the output as CSV files on the web app’s server, since you are doing that already.
Your app can also query the database directly, although, since the query takes 10 minutes, you’d need background callbacks.
To schedule tasks on Linux, you can use crontab.
Great. Thanks for the insight. I’ve been looking at the background callbacks functionality as well as the “connecting to a database” page.
So if I did use a Celery worker to pull and process the data and output the files I need, can I save them to the server directory, replacing the previously used files? I didn’t know whether the directory structure is the same on the server, or whether I’d have access to do such a thing.
That depends entirely upon where you decide to host, but in most cases, you can save to the filesystem.
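If you do write to the server’s filesystem, one practical wrinkle when replacing files the app is actively reading is doing the swap atomically, so a request never sees a half-written CSV. A stdlib-only sketch (the path and contents are illustrative):

```python
import os
import tempfile
from pathlib import Path

def write_atomically(path, text):
    """Write to a temp file in the same directory, then swap it into place.
    os.replace is atomic on POSIX (and for same-volume paths on Windows),
    so readers see either the old file or the complete new one, never a
    partial write."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        os.replace(tmp, path)  # atomically replace the previous file
    except BaseException:
        os.unlink(tmp)  # clean up the temp file if anything failed
        raise

write_atomically("data/reports.csv", "report,value\njan,1\n")
```

The temp file lives in the same directory as the target so the final rename stays on one filesystem, which is what makes the replace atomic.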
You should be able to trigger a background callback and have it pull the data and save it to the filesystem. You can even give updates to the client via progress. (This would have to happen between steps in the backend process.)
Great. I appreciate the quick response and insight!