Hi all,
In my app the user loads big dataframes dynamically, and callbacks share them for different manipulations.
I use hidden html.Divs to store the dataframes, which means I to_json and from_json them each time.
These conversions are costly, so I tried using memoize so that they only happen once.
example:
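(a minimal sketch of the pattern; the names and cache config are illustrative, my real code is longer)

import dash
import pandas as pd
from flask_caching import Cache

app = dash.Dash(__name__)
cache = Cache(app.server, config={"CACHE_TYPE": "simple"})

@cache.memoize()
def deserialize_df(source_json):
    # the expensive step I expected to run only once per unique payload
    print("deserializing...")
    return pd.read_json(source_json, orient="split")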
The thing is, I can see the computation being performed more than once: in the debugger it is the full, slow conversion each time, instead of a quick hash lookup of a known result.
What can cause memoize not to work as expected?
global_source_json is a serialized dataframe the user has uploaded; I save it in a shared hidden div.
After the user has uploaded it, I read it multiple times for different things.
The overhead of the json-to-dataframe conversion each time is currently too heavy.
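To make that concrete, the consuming side looks roughly like this (a sketch building on the snippet above; the component ids are illustrative):

from dash.dependencies import Input, Output

@app.callback(Output("summary", "children"),
              [Input("hidden-df-div", "children")])
def show_summary(global_source_json):
    # every callback that needs the df pays the json -> dataframe cost again
    df = pd.read_json(global_source_json, orient="split")
    return str(df.shape)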
Try to avoid hidden divs for large data. I was using the following cache scenario, maybe it can be useful:
import pandas as pd

@cache.memoize()  # cache: a flask_caching.Cache instance
def get_data(some_key_parameter):
    # the parameter exists only as the cache key; the data itself
    # is loaded server-side, not passed in from the front end
    print("**df_slice_reader_store**")
    # get data from the source database or a file, whatever
    df = pd.read_sql(query, connection)  # placeholders for your own source
    return df
some_key_parameter was a timestamp string cut from its 19 characters down to 15, so it only changes every 10 minutes. The time was computed on page refresh in the browser and stored in a hidden div. That way all users actually send a request to the database only once every 10 minutes (I don't need real time) and get the df immediately from the server, instead of shipping it to the user's front end and back and so on.
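In other words, roughly (my reading of the truncation; the exact slice is an assumption):

from datetime import datetime

# "2019-06-01 12:34:56.789" -> "2019-06-01 12:3"
# the key only changes every 10 minutes, so memoize serves the
# cached df to everyone inside that window
some_key_parameter = str(datetime.now())[:15]
df = get_data(some_key_parameter)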
Thanks!
Can you suggest a way to avoid using the hidden div?
Currently the flow is:
user uploads a file -> the file is serialized to json and saved in a hidden div -> every time a callback needs the df, I send the serialized file as an input to the memoized function.
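Condensed, the storage side of that flow is (a sketch; the ids and parse_upload are placeholders for my own decode/read logic):

@app.callback(Output("hidden-df-div", "children"),
              [Input("upload", "contents")])
def store_upload(contents):
    # parse_upload is a placeholder for my decode/read logic
    df = parse_upload(contents)
    return df.to_json(orient="split")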
Currently I don't see how to implement the line where you call read_sql(): if I don't pass the file as an argument, how can the function access it?
Thanks again!