Performance enhancement using memoize

Hi all,
In my app the user loads big dataframes (dynamically), and callbacks share them for different manipulations.
I use hidden html.Div elements to store the dataframes, which means I call to_json and read_json on them each time.
These actions are costly, so I tried using memoize so that they only happen once.

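from flask_caching import Cache
import pandas as pd

CACHE_CONFIG = {
    'CACHE_TYPE': 'filesystem',
    'CACHE_DIR': 'cache-directory',
}
cache = Cache()
cache.init_app(app.server, config=CACHE_CONFIG)

# Cache the parsed dataframe, keyed on the JSON string argument
@cache.memoize()
def df_slice_reader_store(global_source_json):
    return pd.read_json(global_source_json, orient='split')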

The thing is, the computation is still being performed more than once: in the debugger I can see the long parsing step running each time, instead of a known result simply being looked up.

What could cause memoize not to work as expected?


I’m still stuck and would appreciate help.

What is global_source_json? Are you sure that you call the function with the same argument each time, i.e. df_slice_reader_store(global_source_json)?
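One way to check this, as a rough sketch only: log a short fingerprint of the argument every time the function body really runs. Here functools.lru_cache stands in for the memoize decorator (it is not flask-caching), and parse_payload, digest and call_log are made-up names for illustration.

```python
import hashlib
from functools import lru_cache

call_log = []  # fingerprints of the arguments that caused a real computation


def digest(text: str) -> str:
    """Short fingerprint of a (possibly huge) string argument."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


@lru_cache(maxsize=8)  # stand-in for a memoize decorator, not flask-caching itself
def parse_payload(global_source_json: str) -> int:
    # This body runs only on a cache miss, so each log entry is a recomputation.
    call_log.append(digest(global_source_json))
    return len(global_source_json)  # placeholder for the expensive parsing step


payload = '{"columns": ["a"], "index": [0], "data": [[1]]}'
parse_payload(payload)
parse_payload(payload)        # identical argument: served from the cache
parse_payload(payload + " ")  # one extra space: a different key, so a miss
print(call_log)
```

If the log shows different fingerprints for what should be the same dataframe, the argument is changing between calls (for example, the JSON is re-serialized slightly differently), which would explain the repeated computation.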

global_source_json is a serialized dataframe that the user has uploaded; I save it in a shared hidden div.
After the user has uploaded it, I read it multiple times for different things.
The overhead of converting JSON to a dataframe each time is currently too heavy.

Try to avoid a hidden div for large data. I was using the following cache scenario; maybe it can be useful:

@cache.memoize()
def get_data(some_key_parameter):
    # get the data from its source: database, file, whatever
    return df

some_key_parameter was a time string truncated to something like 19:4, so it only changes every 10 minutes. The time was computed on page refresh in the browser and stored in a hidden div. So, across all users, a request actually goes to the database only once every 10 minutes (I don't need real-time data), and everyone gets the df immediately from the server, without sending it to the user's front end and back again.
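A minimal sketch of that time-bucket idea, assuming the key is the HH:MM time with its last digit dropped. functools.lru_cache stands in for the memoize decorator, and ten_minute_key and load_count are hypothetical names used only here.

```python
from datetime import datetime
from functools import lru_cache


def ten_minute_key(now: datetime) -> str:
    """Format the time as HH:MM and drop the last digit, e.g. 19:47 -> '19:4'."""
    return now.strftime("%H:%M")[:-1]


load_count = 0  # how many times the "database" was really queried


@lru_cache(maxsize=1)  # stand-in for a memoize decorator
def get_data(some_key_parameter: str) -> list:
    global load_count
    load_count += 1
    return [1, 2, 3]  # placeholder for the real query result


get_data(ten_minute_key(datetime(2020, 1, 1, 19, 41)))
get_data(ten_minute_key(datetime(2020, 1, 1, 19, 49)))  # same bucket: cache hit
get_data(ten_minute_key(datetime(2020, 1, 1, 19, 50)))  # new bucket: reload
print(load_count)
```

Only the small key travels through the browser; the data itself stays on the server. A real key would probably also need the date, or a cache timeout, so that yesterday's 19:4 bucket is not reused.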

Can you suggest a way to avoid using the hidden div?
Currently the flow is:
user uploads file -> file is serialized to JSON and saved in a hidden div -> every time a callback needs the df, I send the serialized file as the input to the memoized function.

Currently I don’t see a way to implement the line where you call read_sql(), because if I don’t pass the file as the argument, how can I access it?
Thanks again.
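One possible direction, sketched under assumptions rather than as a tested Dash solution: keep the uploaded file on the server under a generated id, put only that id in the hidden div, and memoize on the id. UPLOADS, save_upload and load_table are hypothetical names, the csv module stands in for the pandas parsing, and functools.lru_cache stands in for the memoize decorator.

```python
import csv
import io
import uuid
from functools import lru_cache

# Hypothetical server-side store: upload id -> raw file text.
UPLOADS = {}
parse_count = 0  # how many times a file was really parsed


def save_upload(raw_text: str) -> str:
    """Keep the file on the server and return a small id for the hidden div."""
    upload_id = uuid.uuid4().hex
    UPLOADS[upload_id] = raw_text
    return upload_id


@lru_cache(maxsize=16)  # stand-in for a memoize decorator
def load_table(upload_id: str) -> tuple:
    """Parse the stored file once per id; callbacks pass only the id."""
    global parse_count
    parse_count += 1
    rows = csv.DictReader(io.StringIO(UPLOADS[upload_id]))
    return tuple(dict(row) for row in rows)


upload_id = save_upload("a,b\n1,2\n3,4\n")
first = load_table(upload_id)   # parsed here
second = load_table(upload_id)  # served from the cache
print(len(first), parse_count)
```

The memoized function then receives a 32-character id instead of the whole serialized file, so the cache key is cheap to compute and nothing large goes through the browser. An in-memory dict like this only works within a single process; with several workers the store would have to be shared, for example on the filesystem.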