How to automate data loading for flat files dropped daily for a Dash app

Hi Dash Community,

I am trying to design a process where a user drops a file at the same location every day and the Dash app picks it up automatically when the user refreshes the browser.

So far I have tried the dcc.Store component, but my file is so large that it exceeds my browser's storage quota.

Now I am wondering whether there is any way to make my app restart every day at a certain time so it can pick up the latest file. I am not sure how to do that either.

If there is any other way to accomplish the same thing, I am open to hearing it as well.

Thanks in Advance!!

Peter

You could use a dcc.Interval - firing a good deal faster than once a day - whose callback provides most of the app layout, plus a dcc.Store that saves some unique id of the active file (its mod time, or hash, or filename, whatever is unique for your use case). Something like:

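from dash.dependencies import Input, Output, State
from dash.exceptions import PreventUpdate

# loaded once per process at startup, then refreshed by update_data
app_data = read_current_file()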

def update_data():
    global app_data
    # get_current_file_id should be quick like looking at name or mod time
    latest_file_id = get_current_file_id()

    if latest_file_id != app_data.file_id:
        # app_data is a global so it can be used by other callbacks
        # but for cross-process consistency, any callback that uses
        # app_data should also call update_data
        app_data = read_current_file()
    return latest_file_id

# dcc.Store exposes its contents through the 'data' property
@app.callback(
    [Output('main', 'children'), Output('file-id', 'data')],
    [Input('refresh', 'n_intervals')],
    [State('file-id', 'data')]
)
def update_page(n, file_id):
    latest_file_id = update_data()

    if file_id == latest_file_id:
        raise PreventUpdate

    return create_main_layout(), latest_file_id

If reading the data is quick enough that you can leave it in the file and read just what you need in each callback that needs it - or if only create_main_layout needs the data - that simplifies things: you wouldn’t need update_data or the global var, just get_current_file_id.
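
The snippet above leaves get_current_file_id and read_current_file undefined. Here is a minimal sketch of what they might look like, assuming the daily file is a CSV at a fixed path and that its modification time plus size is unique enough to identify a version. DATA_PATH, the DASH_DATA_PATH environment variable and the AppData class are hypothetical names chosen for illustration, and a real app would probably parse the file with something other than the csv module.

```python
import csv
import os
from dataclasses import dataclass

# Hypothetical location; point this at wherever the daily file is dropped.
DATA_PATH = os.environ.get("DASH_DATA_PATH", "daily_data.csv")


@dataclass
class AppData:
    file_id: str  # identifies the version of the file that was loaded
    rows: list    # parsed contents, one dict per CSV row


def get_current_file_id(path=DATA_PATH):
    """Fast check: only stats the file, never reads its contents."""
    stat = os.stat(path)
    return f"{stat.st_mtime_ns}-{stat.st_size}"


def read_current_file(path=DATA_PATH):
    """Slow path: parse the whole file and tag the result with its id."""
    # Take the id before reading. If the file is replaced mid-read, the
    # stored id is the older one, so the next check triggers a reload.
    file_id = get_current_file_id(path)
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    return AppData(file_id=file_id, rows=rows)
```

The id is a plain string, so it can be stored in the dcc.Store as-is. If files could be rewritten with the same size within the timestamp resolution of your filesystem, a content hash would be safer, at the cost of reading the file on every check.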

Hey Alex,

Thanks for the reply.
I had my own update function set up like yours to check whether new files have been dropped. But when I refresh my data source, how do I make sure all my callbacks also see the refreshed data, since passing in stored data is not an option for me?

Second, can you please elaborate on what the function create_main_layout() does?

I was imagining you’d have an app layout that looks something like:

app.layout = html.Div([
    dcc.Interval(id='refresh', ...),
    dcc.Store(id='file-id', ...),
    html.Div(id='main')
])

create_main_layout would then provide the children of main - basically your entire page aside from the interval and store - using app_data.

Using this pattern, any callback that needs app_data should also call update_data() at the beginning to ensure the global var is up to date. Even though the interval callback will have updated the data in the process it was called in, other requests may go to other processes so could still have the old data. This is why it’s important to make get_current_file_id fast: almost all update_data calls will only need to call that, not the likely slow read_current_file.
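
One way to package that rule so callbacks can't forget it, as a rough sketch only: wrap the global in a small per-process cache object and have every callback fetch the data through it. FreshData, get_id and load are hypothetical names, not part of Dash; get_id stands in for get_current_file_id and load for read_current_file. The lock is there on the assumption that the server may handle requests on several threads within one process.

```python
import threading


class FreshData:
    """Per-process cache that reloads whenever the file id changes."""

    def __init__(self, get_id, load):
        self._get_id = get_id  # fast check, e.g. get_current_file_id
        self._load = load      # slow read, e.g. read_current_file
        self._lock = threading.Lock()
        self._file_id = None
        self._data = None

    def get(self):
        # The cheap id check runs on every call; the slow load runs
        # only when the id differs from the one this process cached.
        latest = self._get_id()
        with self._lock:
            if latest != self._file_id:
                self._data = self._load()
                self._file_id = latest
            return self._data
```

Each callback would then start with something like data = fresh.get() instead of touching a global directly, where fresh is a module-level FreshData(get_current_file_id, read_current_file). Every worker process builds its own instance and catches up the first time one of its callbacks runs after a new file arrives.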