Preventing Wasteful Parallel Callbacks When Running Multiple Processes and Caching

sjtrny · February 3, 2019, 9:51am

In my app multiple callbacks depend on a shared result. Calculating this shared result once is slow enough, but having multiple processes (or threads) repeatedly doing the calculation is even slower.

I think I am looking for a cache-like solution that when the cache returns a miss, the requesting callbacks will simply wait until the first callback is finished and then grab the same result. I have tried using Flask-Caching but in my tests it seems like if there is a cache miss then all workers/threads will still execute the memoized function.

At the end of this post. I have some code to demonstrate the problem. Notice that “Calculating” gets printed at least twice to the console (and perhaps more times if you are using more than one worker).

I have a solution in place using Python’s thread synchronisation tools (locks etc) however this only works for a single worker/process. What can I do to fix this when using multiple workers/processes?

import dash
import dash_html_components as html
from flask_caching import Cache
import time

app = dash.Dash(__name__)

server = app.server

cache = Cache(app.server, config={
    'CACHE_TYPE': 'filesystem',
    'CACHE_DIR': 'cache-directory'
})

app.layout = html.Div(children=[
    html.Button('Submit', id='button'),
    html.Div(id='output-container-button1', children=[]),
    html.Div(id='output-container-button2', children=[]),
])

# This should only ever be called once!
@cache.memoize()
def slow_function(argument_1):
    print("Calculating")
    time.sleep(3)
    return 1

@app.callback(
    dash.dependencies.Output('output-container-button1', 'children'),
    [dash.dependencies.Input('button', 'n_clicks')])
def update_output1(n_clicks):
    value = slow_function(3)
    return 'The button has been clicked {} times'.format(value)

@app.callback(
    dash.dependencies.Output('output-container-button2', 'children'),
    [dash.dependencies.Input('button', 'n_clicks')])
def update_output2(n_clicks):
    value = slow_function(3)
    return 'The button has been clicked {} times'.format(value)

if __name__ == '__main__':
    app.run_server(debug=True)

sjtrny · February 4, 2019, 3:20am

I was able to solve this using a combination of:

multiprocessing.lock() (instead of threading.lock())
gunicorn preloading

If you use the preload option with gunicorn then you can share objects between processes using the multiprocessing family of proxy objects.

sjtrny · February 8, 2019, 4:18am

I’ve spun my code off into a module, which is available via pip https://github.com/sjtrny/jitcache

I hope that others get some use out of this. I have an example using Dash here https://jitcache.readthedocs.io/en/latest/dash.html, which I have copied below:

REDACTED: CODE OUTDATED. Refer to the following post for updated code https://community.plotly.com/t/preventing-wasteful-parallel-callbacks/18956/5?u=sjtrny

chriddyp · February 8, 2019, 2:06pm

Nice! Thanks for sharing!

sjtrny · February 10, 2019, 12:47am

Since the other day I have changed the design of jitcache to be more in line with LRU Cache and Flask-Caching by using a decorator instead.

from jitcache import Cache

cache = Cache()

@cache.memoize
def slow_fn(input_1, input_2, input_3=10):
    return input_1 * input_2 * input_3

print(slow_fn(10, 2))

For plot.ly you can either decorate entire callbacks (just like in Dash’s Performance Docs) or you can decorate a subroutine. Below I demonstrate how to decorate callbacks (you can find more documentation here):

import dash
import dash_html_components as html
from jitcache import Cache
import dash_core_components as dcc

cache = Cache()

app = dash.Dash(__name__)

server = app.server
app.layout = html.Div(
    children=[
        html.Div(id="output-container-dropdown1", children=[]),
        html.Div(id="output-container-dropdown2", children=[]),
        dcc.Dropdown(
            options=[
                {"label": "New York City", "value": "NYC"},
                {"label": "Montréal", "value": "MTL"},
                {"label": "San Francisco", "value": "SF"},
            ],
            value="MTL",
            id="dropdown",
        ),
    ]
)

@app.callback(
    dash.dependencies.Output("output-container-dropdown1", "children"),
    [dash.dependencies.Input("dropdown", "value")],
)
@cache.memoize
def update_output1(input_dropdown):
    print("run1")

    return input_dropdown

@app.callback(
    dash.dependencies.Output("output-container-dropdown2", "children"),
    [dash.dependencies.Input("dropdown", "value")],
)
@cache.memoize
def update_output2(input_dropdown):
    print("run2")

    return input_dropdown

if __name__ == "__main__":
    app.run_server(debug=True)

chriddyp · February 12, 2019, 7:02pm

This looks really great. Once we investigate this a little bit more, we’ll consider adding it to the docs. Thanks for sharing this and keep the thread updated with your progress!

russellthehippo · September 26, 2019, 8:58pm

I like jitcache, I’ve tried it and it works well. Cool build.

If you want an option that does this explicitly and feels like using a dictionary, I made https://github.com/russellromney/brain-plasma for this very purpose. Also available through pip.

from brain_plasma import Brain

brain = Brain()

@app.callback(...)
def create_slow_df_only_once(...):
    df = # large, slow, data
    brain['slow_df'] = df # saves to a Plasma in-memory object store
    ...

@app.callback(...)
def access_large_df(...):
    df = brain['slow_df']
    ...

chriddyp · September 28, 2020, 9:16pm

@sjtrny - I’m curious on how your @cache.memoize function works with .lock and concurrent requests.
Let’s imagine that a callback def update takes 5 seconds to run and two requests are made, the second request 2 seconds after the first one.

What happens? Does the memoize decorator put some kind of lock on the update function and does the second request “wait” for the first request to finish and then access the value from memory? Therefore the second request gets the value after 3 seconds instead of recomputing the same result in 5 seconds?

sjtrny · September 28, 2020, 11:44pm

Yes that’s correct - the second request will only take 3 seconds.

chriddyp · September 29, 2020, 6:23pm

Very nice. @russellthehippo - Does your plasma integration work the same way or does the end user need to program in the “waits”?

russellthehippo · September 29, 2020, 9:16pm

@chriddyp sadly brain-plasma has no locking control for the brain for the same object…which I suppose means brain-plasma doesn’t solve the key problem of doing the same thing multiple times in separate callbacks.

Some behavior notes for brain-plasma:

If the same client is trying to access an object that is being created, the request will fail if the object create transaction is not done (because the object doesn’t officially exist yet).
A request to any other object works fine, either with the same client or a different client
A request by a separate client will just return an object not found error until the object is created
If the object already exists, it is immutable. This means that, while an object is being changed, the previous value will be returned until the old value is deleted (ending the copy-replace-delete update transaction).

I like this implementation https://github.com/sjtrny/jitcache/blob/master/jitcache.py#L32
I may implement something similar for same-client get/put calls in brain-plasma. There’s no way to protect against it between clients though.

Bw984 · March 15, 2021, 7:57pm

Hi Stephen, jitcache seems to solve a pretty big Dash issue that many of us deal with. My question is how does jitcache handle timeouts or does it reach a maximum quantity of cache keys? I don’t see a default timeout or max key configuration and I’m worried I’d run up an infinite cache size in production.

sjtrny · March 15, 2021, 11:58pm

At the moment it doesn’t. Happy to take contributions/pull requests to add that feature though.

Potentially the code could be changed to just become a thin wrapper of an existing cache that support such features.

Bw984 · March 16, 2021, 3:56pm

I may try to take a look at it in the coming days/weeks. If I can make something work I’ll submit a pull. I’m running multiple dash apps from a larger flask application and everything was harmonious until increasing the gunicorn worker count. I had to move user sessions to a redis backend to preserve state/auth status and I’m now seeing problems with flask-caching filesystem cache as well. I believe your locking pattern would allow the filesystem cache to function properly with multiple gunicorn workers but it’s a hunch at this time. I’ll experiment with jtcache on the dev version of my site to see if it solves the multiple workers issue and report back.

Bw984 · March 16, 2021, 10:24pm

Is there a way to prevent the following error when running the examples from the jitcache docs? I’m getting the same error on Windows and MacOS and Google seems to be lite on this particular issue. I’m taking the dash example code directly from the docs with a copy/paste to get this error.

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

sjtrny · March 17, 2021, 3:06am

Looks like the issue is the default start method on MacOS changed from fork to spawn in Python 3.8

https://docs.python.org/3/whatsnew/3.8.html#multiprocessing

Setting the start method back to fork is a quick work around e.g.

from jitcache import Cache
import time
import multiprocessing as mp

mp.set_start_method('fork')

cache = Cache()

@cache.memoize
def slow_fn(input_1, input_2, input_3=10):
    print("Slow Function Called")
    time.sleep(1)
    return input_1 * input_2 * input_3

print(slow_fn(10, 2))

Stival · April 13, 2021, 1:58am

Hi All,

I was struggling with ‘multiple entrants’ problem and found that redlock-py may help in some scenarios.

In my case, using dash Cache and Signal pattern, when running multiple workers (gunicorn), mutiple threads were entering the “expensive function that loads the cache” …

I am using Redlock to enforce that only the first entrant carry on with the expensive computation, and that the others wait for the result df to be placed in cache. Obs: using flask_caching and redlock with Redis backend.

Just a sketch:


def slow_function_only_once():
   
   df = cache.get('data_key') 
   if (df is not None):
           return df # lucky
  
   my_lock = cache.set('lock_key', **expiration_millisecs**)
   #if my_lock is False, someone else already has the lock is is doing the hard work (*almost sure,* see below)
   
    while(my_lock == False):
          df = cache.get('data_key')
          if (df is not None):
               return df
          else:
               if (max_retries_exceded): 
                     my_lock = dlm.lock('lock_key', experiation_millisecs)

     #ok, if we got here... we have the lock, lets do the work
     df = expenseive computation

     cache.set('data_key', df)
     dlm.unlock('lock_key') #release the lock now that df is available for other threads
     return df

Important. It is not bullet proof. Please check out this post for details on when redlock might not be fail proof…

Key points:

in my scenario, the goal of the function is to compute and store a large df into cache; so if by reasons below some thread manage to snick in, it would not break the application
expiration_millisecs is important to make sure the lock does not get locked forever in case a worker fails after acquiring the lock (just curious, does jitcache ensure that a failed worker wont left the lock locked?)
it is key to chose expiration_millisecs wisely. It should give enough time for the thread that hold the lock to complete, place data into cache, and then release.
3.1 if the lock expires beforehand, some other thread may start computing instead of just wait for the value (again, would not break the application necessarily… just waste resources)

Well, it is not perfect. But good enough for some scenario, right? What do you guys think about this solution? Is there something I am missing?

Final note: Redlock uses random keys and check-and-set operations which improve robustness… but one could even mimic this pattern using redis set-get to control the lock, etc… (worth it reading the article

Masya · March 28, 2022, 7:10pm

@sjtrny Hi. I have the same error on windows 10/64

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.
        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:
            if __name__ == '__main__':
                freeze_support()
                ...
        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

The suggestions above didn’t work for me.
Сould you tell me how to run your code? or perhaps there are other ways to solve the problem (if the callback is executed, do not run the second one and do not create a queue)

Topic		Replies	Views
Correct way to share multiple variables across callbacks Dash Python	0	1778	December 5, 2018
Background/Memoized callback questions Dash Python question	0	386	September 25, 2022
Example 3 - Caching and Signalling from Sharing Data Between Callbacks page does not work as written Dash Python	1	437	September 21, 2021
@app.callback improvements? (Trigger component, same Output multiple times, callback without Output, serverside store / callback cache) Dash Python	24	4524	November 9, 2020
Efficiency: Many small callbacks/few inputs and outputs, few callbacks/many inputs and outputs Dash Python	4	618	March 8, 2019

Preventing Wasteful Parallel Callbacks When Running Multiple Processes and Caching

Related topics