Hi all,
In my app the user loads big dataframes dynamically, and callbacks share them for different manipulations.
I use hidden html.Divs to store the dataframes, which means I to_json and from_json them each time.
These conversions are costly, so I tried using memoize so that they only happen once.
example:
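(a minimal sketch of the pattern; the names and cache config are illustrative, my real code is longer)

import dash
import pandas as pd
from flask_caching import Cache

app = dash.Dash(__name__)
cache = Cache(app.server, config={"CACHE_TYPE": "simple"})

@cache.memoize()
def deserialize_df(source_json):
    # the expensive step I expected to run only once per unique payload
    print("deserializing...")
    return pd.read_json(source_json, orient="split")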
The thing is, I can see the computation being performed more than once: in the debugger it is the full, slow conversion each time, instead of a quick hash lookup of a known result.
What can cause memoize not to work as expected?
global_source_json is a serialized dataframe the user has uploaded; I save it in a shared hidden div.
After the user has uploaded it, I read it multiple times for different things.
The overhead of the json-to-dataframe conversion each time is currently too heavy.
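To make that concrete, the consuming side looks roughly like this (a sketch building on the snippet above; the component ids are illustrative):

from dash.dependencies import Input, Output

@app.callback(Output("summary", "children"),
              [Input("hidden-df-div", "children")])
def show_summary(global_source_json):
    # every callback that needs the df pays the json -> dataframe cost again
    df = pd.read_json(global_source_json, orient="split")
    return str(df.shape)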
Try to avoid hidden divs for large data. I was using the following cache scenario, maybe it can be useful:
import pandas as pd

@cache.memoize()  # cache: a flask_caching.Cache instance
def get_data(some_key_parameter):
    # the parameter exists only as the cache key; the data itself
    # is loaded server-side, not passed in from the front end
    print("**df_slice_reader_store**")
    # get data from the source database or a file, whatever
    df = pd.read_sql(query, connection)  # placeholders for your own source
    return df
some_key_parameter was a timestamp string cut from its 19 characters down to 15, so it only changes every 10 minutes. The time was computed on page refresh in the browser and stored in a hidden div. That way all users actually send a request to the database only once every 10 minutes (I don't need real time) and get the df immediately from the server, instead of shipping it to the user's front end and back and so on.
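In other words, roughly (my reading of the truncation; the exact slice is an assumption):

from datetime import datetime

# "2019-06-01 12:34:56.789" -> "2019-06-01 12:3"
# the key only changes every 10 minutes, so memoize serves the
# cached df to everyone inside that window
some_key_parameter = str(datetime.now())[:15]
df = get_data(some_key_parameter)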
Thanks!
Can you suggest a way to avoid using the hidden div?
Currently the flow is:
user uploads a file -> the file is serialized to json and saved in a hidden div -> every time a callback needs the df, I send the serialized file as an input to the memoized function.
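Condensed, the storage side of that flow is (a sketch; the ids and parse_upload are placeholders for my own decode/read logic):

@app.callback(Output("hidden-df-div", "children"),
              [Input("upload", "contents")])
def store_upload(contents):
    # parse_upload is a placeholder for my decode/read logic
    df = parse_upload(contents)
    return df.to_json(orient="split")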
Currently I don't see how to implement the line where you call read_sql(): if I don't pass the file as an argument, how can the function access it?
Thanks again!