Celery integration?

Great question @lucapinello, I was actually just thinking about this a little bit yesterday.

There are a few things to consider here:

  • When running Dash apps locally with python app.py, Dash runs on a single Python process. It can only handle one request (callback) at a time.
  • This can be modified by passing processes=4 to app.run_server, e.g. app.run_server(debug=True, processes=4). This runs the app with 4 processes, which allows 4 callbacks to be executed at the same time.
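    For reference, here's a minimal sketch of that setup (the layout is just a placeholder, and depending on your Flask version you may also need threaded=False when processes > 1):

    import dash
    import dash_html_components as html

    app = dash.Dash()
    server = app.server  # the underlying Flask instance, referenced below as app:server
    app.layout = html.Div('Hello Dash')

    if __name__ == '__main__':
        # run the local dev server with 4 processes so that up to
        # 4 callbacks can be handled at the same time
        app.run_server(debug=True, processes=4)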
  • In production, we recommend using gunicorn to handle more concurrent requests. The -w flag sets the number of workers:
    gunicorn -w 4 app:server
    
    If it’s CPU bound, then you can set the number of workers to be the number of CPU cores:
    import psutil
    psutil.cpu_count()
    
    or, dynamically
    gunicorn app:server -w $(python -c 'import psutil; print(psutil.cpu_count())')
    
    If your computations are thread-safe, then you can also add a number of threads for each worker with the --threads argument. I don’t have much production experience with this, but playing around with a test server, I ran
    gunicorn --threads $(python -c "import psutil; print(4*psutil.cpu_count())") -w $(python -c "import psutil; print(psutil.cpu_count())") server:app
    
    on my 4-core MacBook Air and was able to process 64 CPU-blocking requests in parallel (16 threads per worker × 4 workers, i.e. 4 threads × 4 cores × 4 processes).
  • So, if your long-running task is used to update the UI of your app, then I think that you are better off just running the task as part of the Dash + gunicorn processes. I don’t think that Celery will help here because it would just move the computation from a Dash worker Python process to a Celery Python process on the same machine. I may be mistaken, but I believe that you would be better off allocating all of your available CPU resources as gunicorn workers directly on your Dash app.
  • If the long-running task can be performed asynchronously, then you could queue up the tasks with something like Celery. Note that if Celery only has a single worker process, it will execute the tasks serially instead of in parallel (whereas running them as part of the Dash processes would execute them in parallel). We use Celery on our own systems at plot.ly, e.g. for sending email asynchronously. The nice thing about Celery is that it can queue up the tasks: if all of the Dash workers are occupied running long tasks, then no new tasks can start until a worker frees up, whereas with Celery you can just add them to the queue.
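    For example, a rough sketch of defining and queueing a task (assuming a local Redis broker/backend; the task name and the do_expensive_work helper are made up):

    # tasks.py
    from celery import Celery

    celery_app = Celery(
        'tasks',
        broker='redis://localhost:6379/0',
        backend='redis://localhost:6379/0',
    )

    @celery_app.task
    def long_running_task(param):
        # stand-in for the expensive computation
        return do_expensive_work(param)

    A Dash callback can then queue the work with long_running_task.delay(value), which returns immediately with an AsyncResult whose id you can hold on to, and the worker processes are started separately with something like celery -A tasks worker --concurrency=4.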
  • If you move tasks to the celery queue, then your app can check the status of the tasks every few seconds with the dcc.Interval component. Here’s a sort of related example: How to integrate dash with another process that might be blocking.
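    A rough sketch of that polling pattern (component ids are made up, celery_app is the Celery instance from the sketch above, the task id is assumed to be stored in a hidden div by the callback that queued it, and recent Dash versions expose n_intervals on dcc.Interval; older versions used the Event syntax instead):

    from celery.result import AsyncResult
    from dash.dependencies import Input, Output, State

    @app.callback(
        Output('task-result', 'children'),
        [Input('poll-interval', 'n_intervals')],
        [State('task-id', 'children')])
    def check_task(n, task_id):
        if not task_id:
            return 'No task submitted yet'
        result = AsyncResult(task_id, app=celery_app)
        if result.ready():
            # the task finished, display its return value
            return str(result.get())
        return 'Still running...'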
  • If this long-running task is triggered from a single input component and is used to update several output components, then changing that input component will fire off several long-running tasks, blocking several Python processes. In this case, you can use one of the methods outlined in plot.ly/dash/sharing-data-between-callbacks for running the task in a single callback and then sharing the resulting data with the rest of the callbacks.
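    For example, a sketch of the hidden-div approach from that chapter (component ids and the expensive_computation / build_figure helpers are made up):

    import json
    from dash.dependencies import Input, Output

    # one callback does the expensive work and stores the result as JSON
    @app.callback(Output('intermediate-value', 'children'),
                  [Input('my-dropdown', 'value')])
    def run_long_task(value):
        data = expensive_computation(value)
        return json.dumps(data)

    # the other callbacks just read the stored result, which is fast
    @app.callback(Output('my-graph', 'figure'),
                  [Input('intermediate-value', 'children')])
    def update_graph(json_data):
        data = json.loads(json_data)
        return build_figure(data)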
  • If the long-running task has a small set of parameters, then you can use the last method outlined in plot.ly/dash/sharing-data-between-callbacks for caching the result so that all future viewers don’t have to wait.
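    Roughly, that caching pattern looks like this (a sketch with Flask-Caching and a filesystem cache; the cache directory, timeout, and function body are placeholders):

    from flask_caching import Cache

    cache = Cache(app.server, config={
        'CACHE_TYPE': 'filesystem',
        'CACHE_DIR': 'cache-directory',
    })

    @cache.memoize(timeout=60 * 60)  # keep results for an hour
    def expensive_computation(param):
        # only recomputed for parameter values that haven't been seen before;
        # every other viewer gets the cached result back immediately
        ...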
  • Finally, if your long-running tasks are at all IO bound (e.g. file reading, SQL queries, network requests), then gunicorn’s gevent “worker class” will automagically “monkey-patch” a ton of the Python standard library with asynchronous versions: urllib, time.sleep, sockets, and more. (It took me a while to wrap my head around how this monkey patching works. Here’s an example from their source code that cleared things up for me: https://github.com/gevent/gevent/blob/13d860ae84a5aa5dec384d9fd1d67c2a642c9686/src/gevent/monkey.py#L251-L255). You can run apps with gevent with:
    gunicorn app:server -w 4 -k gevent
    
    By “asynchronous” I mean that when the IO bound task is running, instead of the python process just waiting around, gevent will free it up to process other tasks and then return to it when the task is done. Pretty amazing.
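    To see the monkey patching in action outside of gunicorn, here's a tiny standalone sketch (patch_all() is roughly what the gevent worker class does for you on startup):

    from gevent import monkey
    monkey.patch_all()  # swap blocking stdlib calls for cooperative versions

    import time
    import gevent

    def io_bound_task(i):
        time.sleep(1)  # patched: yields to other greenlets instead of blocking
        return i

    # all ten "sleeps" run concurrently, so this takes about 1 second, not 10
    jobs = [gevent.spawn(io_bound_task, i) for i in range(10)]
    gevent.joinall(jobs)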
  • I’ve had luck using gevent for making network requests asynchronous but I haven’t tested it thoroughly with things like database queries.

Let me know if this makes sense. I’m curious to learn more about these types of bioinformatics tasks and excited to see how we can get Dash to work really well in these cases :beers:
