Scheduled data import

Hello,

I have a Dash application which is running 24x7 in a Docker container.

However, new data arrives every night and needs to be imported into the application.
So I need to schedule the pd.read_csv(…) every night.

Is there any way that I can schedule the pd.read_csv(…) and keep the application itself running the whole time?


@Robert2,

Sure. Instead of having your information and DataFrame be static at app startup, you can make your layout a function that calls pd.read_csv. You can then add criteria around whether or not it pulls fresh information.

If this doesn’t work for you, and you want to use a schedule, then you can spin up a SQLite db and update that in a separate cron job. Then query the db as your dataframe.
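The cron-side import script could look roughly like this sketch (the names `data.csv`, `nightly.db`, and `measurements` are all placeholders):

```python
# Nightly import script, intended to be run from cron: load the CSV once
# and replace the table in a local SQLite file. data.csv, nightly.db and
# the table name "measurements" are placeholder names.
import sqlite3
import pandas as pd

def import_csv(csv_path="data.csv", db_path="nightly.db"):
    df = pd.read_csv(csv_path)
    with sqlite3.connect(db_path) as conn:
        # if_exists="replace" drops and recreates the table each night
        df.to_sql("measurements", conn, if_exists="replace", index=False)
    return len(df)

if __name__ == "__main__":
    print(f"imported {import_csv()} rows")
```

A crontab entry along the lines of `0 2 * * * python /path/to/import_csv.py` would then run it nightly, while the Dash container keeps serving requests untouched.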

Hi @jinnyzor,

Yeah, but our dataset is so big that read_csv takes almost 5 minutes to load all the data into RAM.
So if we call pd.read_csv on every request (even when there are no new updates), the application's load times increase far too much.

I would favor a “watchdog”-like implementation which keeps track of file changes and reloads only when required. During the reload, the application should show a maintenance screen instead of the Dash app.

Is there a reason you are using a csv file instead of a database?

There are several checks you can use to see whether the file changed before parsing it.

For example, compare the file contents, or use the modification date.

You could also cache the DataFrame for faster initialization.
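The modification-date check combined with caching might be sketched like this (function and file names are illustrative):

```python
# Sketch of an mtime-gated cache: re-parse the CSV only when the file's
# modification time changes, otherwise serve the cached DataFrame.
# data.csv and the function name are illustrative.
import os
import pandas as pd

_cache = {"mtime": None, "df": None}

def load_csv(path="data.csv"):
    mtime = os.path.getmtime(path)
    if _cache["mtime"] != mtime:
        _cache["df"] = pd.read_csv(path)  # slow path, only after a change
        _cache["mtime"] = mtime
    return _cache["df"]
```

With this, the 5-minute read only happens on the first request after the nightly file lands; every other request returns the in-memory copy.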

Yes, because there’s no database available on this server.

Automatic reload can be triggered by dcc.Interval

dcc.Interval requires a user to have the site open and active. Plus, if you have multiple users, their dcc.Interval timers will not be in sync.

SQLite is a simple db that Python can spin up without much effort at all.

I would update this db nightly and then query a fresh dataframe each day on the user side.
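The app-side reader could then cache the query result per day, so the first request after the nightly import picks up the new data (db and table names match the placeholder import script above and are not prescribed):

```python
# Sketch of the app-side reader: query SQLite into a DataFrame and cache
# it for the current day. nightly.db and "measurements" are placeholder
# names; the daily key means the cache rolls over after the nightly cron.
import datetime
import sqlite3
import pandas as pd

_cache = {"day": None, "df": None}

def get_dataframe(db_path="nightly.db"):
    today = datetime.date.today()
    if _cache["day"] != today:
        with sqlite3.connect(db_path) as conn:
            _cache["df"] = pd.read_sql("SELECT * FROM measurements", conn)
        _cache["day"] = today
    return _cache["df"]
```

A layout function (as suggested earlier in the thread) could call `get_dataframe()` on each page load and stay fast, since the SQLite query replaces the 5-minute CSV parse.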