I understand that a global df is not supposed to be changed. However, I want the option to manually refresh this dataset as new data comes into my DB. I will not be manipulating the global df anywhere else in my callbacks.
You won’t be able to use the return value of the callback to modify your global dataframe, but you can assign to a global variable inside the callback. The callback can then return something sensible, like the timestamp of when the dataframe was last updated. Something like this:
import datetime

@app.callback(
    # I'm imagining the output as a hidden div; it could be a dcc.Store or even a user-visible element
    Output("last-update", "children"),
    [Input("refresh-data", "value")],  # if refresh-data is a button, you probably want n_clicks rather than value
)
def refresh_data(value):
    global globaldf
    # update value of globaldf
    return datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
An alternative is to keep the dataframe inside a dcc.Store. Whether that’s practical depends on how big the dataframe is, because you have to serialise it each time you update it and de-serialise it each time you access it. Something like this:
# update data
@app.callback(
    Output("store", "data"),
    [Input("refresh-data", "value")],
)
def refresh_data(value):
    df = ...
    # "records" gives a list of row dicts; this might be slow if the dataframe is really big
    return df.to_dict(orient="records")

# callback that uses dataframe
@app.callback(
    Output(...),
    [Input(...), ...],
    [State("store", "data"), ...],
)
def use_df(..., store_data, ...):
    # de-serialise the dataframe
    df = pd.DataFrame(store_data)
    # rest of callback can now use df
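To see what the dcc.Store round trip does to the data, here is a minimal sketch outside of Dash. It assumes the store holds the data as JSON, so it uses json.dumps / json.loads as a stand-in for that step; the column names and values are made up for illustration.

```python
import json

import pandas as pd

# Small made-up dataframe standing in for the data pulled from the DB
df = pd.DataFrame({"id": [1, 2, 3], "value": [0.5, 1.5, 2.5]})

# Serialise: a list with one dict per row
records = df.to_dict(orient="records")

# Stand-in for the trip through the store (assumed to be JSON)
payload = json.dumps(records)

# De-serialise: rebuild the dataframe from the list of row dicts
restored = pd.DataFrame(json.loads(payload))
```

Simple numeric and string columns come back as expected. Columns holding datetimes or other non-JSON types would need converting before being returned from the callback, so it's worth checking the dtypes of the rebuilt dataframe.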
This is super helpful. I feel like this is 99% of the way there, but it's missing one small component. I declare df (note that I renamed this from globaldf in the code above to avoid confusion) as a global variable within the callback. Do I need to declare it as a global in the initialization of the page as well? It doesn't seem to be quite working yet, but it's definitely close.
You only need to use the global keyword inside the callback function. This is just used to tell Python that you want to assign to the variable with that name in the global scope, rather than create a variable with that name that is local to the function.
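Here's a minimal sketch of the difference in plain Python, no Dash involved. The string values and function names are placeholders I made up to stand in for the dataframe and the callback.

```python
df = "old data"  # module-level variable, stands in for the dataframe


def refresh_without_global():
    # No `global` statement: this assignment creates a new local variable
    # named df and leaves the module-level df untouched.
    df = "new data"
    return df


def refresh_with_global():
    # `global` makes the assignment rebind the module-level name.
    global df
    df = "new data"
    return df


refresh_without_global()
after_local = df  # still the original value

refresh_with_global()
after_global = df  # now the refreshed value
```

Note that there is nothing to declare at module level beyond the ordinary assignment df = ... that creates the variable in the first place.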
Also, it looks like you don’t have a component with id="last-update" in your layout. Just add an html.Div(id="last-update", style={"display": "none"}) to the layout; it will accept and hide the output of your callback.
Okay awesome - I think this is working. Note that I added in an input to the update_graph function so that the graph refreshes based on the output of the refresh_data function. Did I do this correctly?
Looks ok to me. If it’s running, and you can update the database and then pull the updates into the app using the update button, I think you can be pretty confident it’s all correct.
It’s worth noting the limitation of this approach: it only updates the dataframe in the memory of the worker process that handles the refresh request. If you have multiple workers, they will fall out of sync. If you’re ok with being limited to a single worker process, then this approach will work fine, but it’s important to be aware of this restriction.
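A rough way to see the multi-worker problem without a web server: the sketch below forks a second process, runs a "refresh" in it, and shows that the original process never sees the change. The integer counter is a made-up stand-in for the dataframe, and the "fork" start method is an assumption that only holds on POSIX systems.

```python
import multiprocessing as mp

data_version = 0  # stands in for the global dataframe


def refresh():
    """Mimic the refresh callback: rebind the module-level global."""
    global data_version
    data_version += 1
    return data_version


def worker_refresh(conn):
    """Run the refresh inside a separate process and report what it sees."""
    conn.send(refresh())
    conn.close()


def run_demo():
    ctx = mp.get_context("fork")  # assumption: POSIX only
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=worker_refresh, args=(child_conn,))
    proc.start()
    seen_by_worker = parent_conn.recv()
    proc.join()
    # The child updated its own copy; this process still has the old value.
    return seen_by_worker, data_version
```

Calling run_demo() returns the value the other process saw after refreshing alongside the value in this process, and the two differ. The same thing happens between server workers: only the one that handled the refresh has the new data.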