Great question @lucapinello, I was actually just thinking about this a little bit yesterday.
There are a few things to consider here:
When running Dash apps locally using python app.py, Dash runs on a single Python process. It can only handle 1 request (callback) at a time.
This can be modified by adding processes=4 to the app.run_server call, e.g. app.run_server(debug=True, processes=4). This will run the app with 4 processes, which will allow 4 callbacks to be executed at the same time.
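For reference, a minimal sketch of such an app.py (the layout is just a placeholder; the server = app.server line is what gunicorn points at below):

import dash
import dash_html_components as html

app = dash.Dash()
server = app.server  # expose the underlying Flask instance for gunicorn

app.layout = html.Div('Hello Dash')

if __name__ == '__main__':
    # 4 processes -> up to 4 callbacks handled at the same time
    app.run_server(debug=True, processes=4)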
In production, we recommend using gunicorn to handle more concurrent requests. The -w flag sets the number of workers:
gunicorn -w 4 app:server
If it's CPU-bound, then you can set the number of workers to the number of CPU cores.
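For example, since gunicorn config files are plain Python, you can compute the worker count from the machine's cores (the filename here is just a convention):

# gunicorn.conf.py
import multiprocessing
workers = multiprocessing.cpu_count()  # one worker per CPU core

and then run it with:

gunicorn -c gunicorn.conf.py app:server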
If your computations are thread-safe, then you can also add a number of threads for each worker with the --threads argument. I don't have much production experience with this, but playing around with a test server, I ran something along the lines of

gunicorn app:server -w 16 --threads 4

on my 4-core MacBook Air and was able to process 64 CPU-blocking requests in parallel (4 threads * 4 workers per core * 4 cores).
So, if your long-running task is used to update the UI of your app, then I think you are better off just running the task as part of the Dash + gunicorn processes. I don't think that Celery will help here, because it would just move the computation from a Dash worker Python process to a Celery Python process on the same machine. I may be mistaken, but I believe you would be better off allocating all of your available CPU resources as gunicorn workers directly on your Dash app.
If the long-running task can be performed asynchronously, then you could queue up the tasks with something like Celery. Note that if Celery only has a single worker process, it will execute the tasks serially instead of in parallel (whereas running them as part of the Dash processes would execute them in parallel). We use Celery on our own systems at plot.ly for e.g. sending email asynchronously. The nice thing about Celery is that it can queue up the tasks: if all of the Dash workers are occupied running long tasks, it is impossible to take on more work, but with Celery you can just add new tasks to the queue.
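A minimal sketch of that queueing pattern (the broker URL and task body are assumptions, not something from our setup):

from celery import Celery

celery_app = Celery('tasks', broker='redis://localhost:6379/0')

@celery_app.task
def send_email(address):
    pass  # the slow work goes here

# .delay() returns immediately; the task waits in the queue until a worker is free
send_email.delay('user@example.com')

The worker runs in a separate process, started with celery -A tasks worker.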
If this long-running task is triggered by a single input component and is used to update several output components, then changing that input component will fire off several long-running tasks, blocking several Python processes. In this case, you can use one of the methods outlined in plot.ly/dash/sharing-data-between-callbacks to run the task in a single callback and then share the resulting data with the rest of the callbacks.
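Here is a minimal sketch of that pattern, with the result stored as JSON in a hidden div (the component ids and the expensive_computation function are placeholders):

import json
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output

app = dash.Dash()
server = app.server

app.layout = html.Div([
    dcc.Dropdown(id='dataset',
                 options=[{'label': s, 'value': s} for s in ['a', 'b']],
                 value='a'),
    html.Div(id='shared-data', style={'display': 'none'}),  # hidden storage
    html.Div(id='output-1'),
    html.Div(id='output-2'),
])

def expensive_computation(dataset):
    return {'dataset': dataset, 'result': 42}  # stand-in for the slow task

@app.callback(Output('shared-data', 'children'), [Input('dataset', 'value')])
def run_task_once(dataset):
    # the long task runs exactly once per input change...
    return json.dumps(expensive_computation(dataset))

@app.callback(Output('output-1', 'children'), [Input('shared-data', 'children')])
def update_output_1(shared):
    # ...and the cheap callbacks just deserialize the shared result
    return 'Result: {}'.format(json.loads(shared)['result'])

@app.callback(Output('output-2', 'children'), [Input('shared-data', 'children')])
def update_output_2(shared):
    return 'Dataset: {}'.format(json.loads(shared)['dataset'])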
If the long-running task has a small set of parameters, then you can use the last method outlined in plot.ly/dash/sharing-data-between-callbacks to cache the result so that all future viewers don't have to wait.
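That approach looks roughly like this with Flask-Caching, reusing the app object from the sketch above (the cache directory and timeout values are illustrative):

from flask_caching import Cache

cache = Cache(app.server, config={
    'CACHE_TYPE': 'filesystem',
    'CACHE_DIR': 'cache-directory',
})

@cache.memoize(timeout=3600)  # cached results expire after an hour
def expensive_computation(params):
    pass  # computed once per unique `params`, then served from the cache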
Finally, if your long-running tasks are at all IO-bound (e.g. file reading, SQL queries, network requests), then gunicorn's gevent "worker class" will automagically "monkey-patch" a ton of the Python standard library with asynchronous versions: urllib, time.sleep, sockets, and more. (It took me a while to wrap my head around how this monkey-patching works. Here's an example from their source code that cleared things up for me: https://github.com/gevent/gevent/blob/13d860ae84a5aa5dec384d9fd1d67c2a642c9686/src/gevent/monkey.py#L251-L255). You can run apps with gevent with:
gunicorn app:server -w 4 -k gevent
By "asynchronous" I mean that while the IO-bound task is running, instead of the Python process just waiting around, gevent will free it up to process other tasks and then return to it when the task is done. Pretty amazing.
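To see this outside of gunicorn, here is a minimal sketch of what the gevent worker does under the hood (the fake_io function is just an illustration):

import gevent
from gevent import monkey
monkey.patch_all()  # swaps time.sleep, sockets, etc. for cooperative versions

import time

def fake_io(i):
    time.sleep(1)  # patched: yields to other greenlets instead of blocking
    return i

# ten 1-second "IO" calls finish in about 1 second total
jobs = [gevent.spawn(fake_io, i) for i in range(10)]
gevent.joinall(jobs)
print([job.value for job in jobs])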
I've had luck using gevent for making network requests asynchronous, but I haven't tested it thoroughly with things like database queries.
Let me know if this makes sense. I'm curious to learn more about these types of bioinformatics tasks and excited to see how we can get Dash to work really well in these cases.
Thanks for the very detailed response! Yes, this totally makes sense.
I am already using gunicorn with multiple workers as you proposed, and I am very happy with it.
I think using Celery has two advantages over the current solution:
I noticed that with the current logic, when updating the component(s), the client waits for a request to complete (the POST to _dash-update-component). This request may be killed if your webserver (for example nginx) doesn't have its timeout (proxy_read_timeout) set to a large value. Sometimes you cannot change that, for example if your webserver sits behind another proxy that your institution manages (a common case in academia). My understanding is that during this time that worker cannot serve another request.
I didn't know about the async workers, very cool concept! In this case I guess I may still have the timeout problem with the hanging request to _dash-update-component, although the server is free to use that worker to do something else. With Celery, instead, you can schedule the long blocking process and return immediately. We can then use your new loading CSS and logic to disable the interface, plus a dcc.Interval in a callback that queries the Celery queue for task completion; once the task is done, you can update the component that shows the results and re-enable the interface (see the sketch below). In this case there is no hanging request on the client side (since the client sends a periodic request thanks to the Interval), so we can use the default timeouts of nginx and gunicorn.
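A minimal sketch of this pattern, assuming a Redis broker, placeholder component ids, and a dash-core-components version where dcc.Interval exposes n_intervals (none of these details are from the thread above):

import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output, State
from celery import Celery

celery_app = Celery('tasks',
                    broker='redis://localhost:6379/0',
                    backend='redis://localhost:6379/0')

@celery_app.task
def long_task(x):
    import time
    time.sleep(60)  # stand-in for the long blocking computation
    return x * 2

app = dash.Dash()
server = app.server

app.layout = html.Div([
    html.Button('Run', id='run'),
    html.Div(id='task-id', style={'display': 'none'}),  # hidden task handle
    dcc.Interval(id='poll', interval=2000),  # poll every 2 seconds
    html.Div(id='result'),
])

@app.callback(Output('task-id', 'children'), [Input('run', 'n_clicks')])
def launch(n_clicks):
    if not n_clicks:
        return ''
    return long_task.delay(21).id  # returns immediately with a task id

@app.callback(Output('result', 'children'),
              [Input('poll', 'n_intervals')],
              [State('task-id', 'children')])
def check(_, task_id):
    if not task_id:
        return 'No task started yet.'
    result = celery_app.AsyncResult(task_id)
    if result.ready():
        return 'Done: {}'.format(result.result)
    return 'Still running...'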
If you have many users, the advantage of the Celery queue is that you can serve more users than the number of gunicorn workers you have, since the workers are not constantly busy with long blocking processes. I like Celery because you can specify the number of tasks that can be executed simultaneously, but still take in more tasks and append them to the queue. So the server will not crash if your long blocking tasks take too much memory/CPU, since you know for sure how many Celery tasks will run at a given time. The idea is to have multiple gunicorn workers and only 1 or 2 Celery workers. With Celery you can also send the tasks to another machine, so you can easily scale the slow part of the app if necessary. I used this solution in a previous webapp I developed a while ago with Celery + Flask + Bootstrap (http://crispresso.rocks/).
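For what it's worth, that cap on simultaneous tasks is set with Celery's --concurrency flag when starting the worker, e.g. (the module name tasks is a placeholder):

celery -A tasks worker --concurrency=2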
For bioinformatics tasks there are already a lot of apps built with Shiny for gene expression analysis, single-cell data, genomics, etc.; some examples here:
Shiny is currently the de facto standard for this kind of app, but I personally like your project more, so I really appreciate you taking the time to reply to all the users and keep the community so engaged. I am based in Boston (pinellolab.org); if you are visiting at some point we should grab a coffee. Happy to tell you more about how we are using Dash for our research projects.
@lucapinello Very nice application, by the way! Did you develop the 2D subway map yourself? Was it pure Python + Plotly?
I had to build something similar, but with the addition of JavaScript!
Is there any way to integrate this function directly with the Dash callbacks? Ideally, I'd like to have the Celery callback function update the targeted output Div (perhaps by triggering a Dash callback in turn)?
i.e.

@dashapp.callback(Output('ProgressBarDiv'), ... etc.)
def update_progressbar():
    ...

def my_celery_func():
    # ... code to read task status and extract completion status ...
    call_update_progressbar()
I did see some examples using an Interval component, but polling seems like an inelegant solution compared with receiving callbacks directly from Celery on status change (where a change could take several minutes or hours, say for training a CNN).
Since you want the update to be driven by an event on the server side, perhaps you could use this socketio solution as the trigger and tie the two together?
[Triggering callback from within Python - Dash - Plotly Community Forum]