Scheduled update for data

My app uses data that I manually update in the repo monthly (that's how often new reports are added to the data). I have to pull the data from a database (which takes about 10 minutes) and then process it (another 10 minutes). I then save the processed data as several CSVs and a couple of other file types.

How can I automate this process?

I’ve considered using Celery to run a background process: one script pulls data from the database, then another processes it and outputs the CSVs and other files my app needs (a rough sketch of what I mean is below). But how can I then commit and push these files to the repo? The Celery and database-connection approaches that the Dash docs describe all seem to store the files locally rather than adding them to the repo.
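Here’s roughly the pipeline I was picturing; the broker URL, query, connection string, and output path are placeholders, not my real code:

```python
# Rough sketch of the two-step Celery pipeline described above.
# The broker URL, SQL query, connection string, and output path are
# placeholders standing in for my real scripts.
import pandas as pd
from celery import Celery, chain

celery_app = Celery("data_refresh", broker="redis://localhost:6379/0")

@celery_app.task
def pull_raw_data():
    # ~10 minute step: pull the new reports from the database
    raw = pd.read_sql("SELECT * FROM reports", "postgresql://user:pass@host/db")
    return raw.to_json(orient="records")

@celery_app.task
def process_raw_data(raw_json):
    # ~10 minute step: process the raw data and write the CSVs the app needs
    df = pd.read_json(raw_json, orient="records")
    processed = df.groupby("report_id").sum(numeric_only=True)
    processed.to_csv("data/processed_reports.csv")

# Chain the two steps so processing starts as soon as the pull finishes;
# refresh.delay() would kick the whole thing off.
refresh = chain(pull_raw_data.s(), process_raw_data.s())
```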

I’ve also considered writing a GitLab CI/CD pipeline job that pulls data from the database and processes it, but then I run into the same problem: how do I commit the output data to the repo?

Hello @russhopper,

Welcome to the community!

Why do you want to push it to a repo? That seems a little like going through a middleman, instead of using something like Redis to cache the view from the database so the app’s server can reference it.
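Roughly what I mean, as a minimal sketch; the key name, connection details, and DataFrame are just examples:

```python
# Minimal sketch of caching the processed view in Redis instead of the repo.
# The key name, connection details, and DataFrame contents are examples.
import pandas as pd
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# After the monthly pull/processing step, stash the result under a known key
processed = pd.DataFrame({"report_id": [1, 2], "total": [10.5, 7.2]})
r.set("processed_reports", processed.to_json(orient="records"))
```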


Perhaps I don’t need to; I’m new to some of the topics I mentioned. When you refer to using Redis to cache the view from the database, what do you mean? If I had a Celery worker scheduled to run periodically that pulled from the database and processed the data, outputting my required files, could I save those files in the Redis database and point my app to that data?

My lack of understanding of Redis and Celery’s capabilities may be leading me to falsely conclude that I need to save the output files to my repo, because that’s where my app expects those files to be.

Yes, you could query the information directly from Redis, or, if you prefer, save it as CSV files on the web app server, since you are doing that already.
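Either way, the app-side read is straightforward. A sketch, assuming a placeholder Redis key, file path, and component ids:

```python
# Sketch of the app side: a callback reads either the cached JSON from Redis
# or the CSV the worker left on the server. Key, path, and ids are placeholders.
import pandas as pd
import redis
from dash import Dash, Input, Output, dash_table, dcc, html

app = Dash(__name__)
r = redis.Redis(host="localhost", port=6379, db=0, decode_responses=True)

app.layout = html.Div([
    dcc.Dropdown(["redis", "csv"], "redis", id="source"),
    dash_table.DataTable(id="reports-table"),
])

@app.callback(Output("reports-table", "data"), Input("source", "value"))
def load_reports(source):
    if source == "redis":
        df = pd.read_json(r.get("processed_reports"), orient="records")
    else:
        df = pd.read_csv("data/processed_reports.csv")
    return df.to_dict("records")

if __name__ == "__main__":
    app.run(debug=True)
```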

Your app can also query the database directly, although, since that takes 10 minutes, you’d need background callbacks.

To schedule tasks on Linux, you can use crontab.
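A cron entry would simply run your refresh script on the first of each month. If you’d rather keep the scheduling in Python alongside the Celery worker you mentioned, Celery beat can do the same job; here’s a sketch, with the broker URL, task path, and timing as examples:

```python
# Alternative to cron when a Celery worker is already running:
# Celery beat triggers the refresh task on a schedule.
# Broker URL, task path, and timing are examples.
from celery import Celery
from celery.schedules import crontab

celery_app = Celery("data_refresh", broker="redis://localhost:6379/0")

celery_app.conf.beat_schedule = {
    "monthly-data-refresh": {
        "task": "tasks.pull_raw_data",  # first step of the refresh pipeline
        # 02:00 on the 1st of every month
        "schedule": crontab(minute=0, hour=2, day_of_month="1"),
    },
}
```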

Great, thanks for the insight. I’ve been looking at the background callbacks functionality as well as the connecting-to-a-database page.
So if I did use a Celery worker to pull and process the data and output the files I need, could I save them to the server directory, replacing the previously used files? I wasn’t sure whether the directory structure on the server is the same, or whether I even have access to do that.

That depends entirely upon where you decide to host, but in most cases, you can save to the filesystem. 🙂

You should be able to trigger a background callback that pulls the data and saves it to the filesystem. You can even send updates to the client via progress (these would have to come between steps in the backend process).
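Something along these lines: a sketch, where pull_from_db / process_and_save stand in for your real pull and processing steps, and the ids, broker URL, and file path are made up:

```python
# Sketch of a background callback that pulls the data, writes it to the
# filesystem, and reports progress between the two long steps.
# pull_from_db/process_and_save and all ids are placeholders.
import dash
import pandas as pd
from celery import Celery
from dash import CeleryManager, Dash, Input, Output, html

celery_app = Celery(__name__, broker="redis://localhost:6379/0",
                    backend="redis://localhost:6379/1")
manager = CeleryManager(celery_app)

app = Dash(__name__, background_callback_manager=manager)

app.layout = html.Div([
    html.Button("Refresh data", id="refresh-btn"),
    html.Div(id="progress-msg"),
    html.Div(id="result-msg"),
])

def pull_from_db():
    # placeholder for the ~10 minute database pull
    return [{"report_id": 1, "total": 10.5}]

def process_and_save(raw, out_path):
    # placeholder for the ~10 minute processing + CSV writing
    pd.DataFrame(raw).to_csv(out_path, index=False)

@dash.callback(
    Output("result-msg", "children"),
    Input("refresh-btn", "n_clicks"),
    background=True,
    progress=[Output("progress-msg", "children")],
    prevent_initial_call=True,
)
def refresh_data(set_progress, n_clicks):
    set_progress(("Pulling from the database...",))
    raw = pull_from_db()
    set_progress(("Processing and writing files...",))
    process_and_save(raw, "data/processed_reports.csv")
    return "Data refreshed."

if __name__ == "__main__":
    app.run(debug=True)
```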

Great. I appreciate the quick response and insight!
