I just finished going through the docs on using Gunicorn, Redis, Celery and Flower, and I've managed to set everything up to the point where I can see tasks being queued in Flower. Cool. My dev environment is an M2 with 8 cores, with uvloop and orjson installed. I tested with Python 3.10.0 and 3.11.2, and I'm on the latest Dash version.
I have an app that runs locally: a portfolio optimizer that runs some heavy computations and returns JSON to a dcc.Store. Some plotting callbacks then pull the data from there and graph it. With tqdm on the loops inside the app, I see about 50 iterations/second for scenario X. I'm using Polars dataframes, so all cores are at 100%.
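To compare the servers with numbers rather than by watching the tqdm bar and a CPU monitor, the same loop can report both its rate and roughly how many cores it kept busy. This is a minimal sketch under stated assumptions: dummy_step is a hypothetical stand-in for one optimizer iteration, and time.process_time() measures CPU time for all threads of the current process (not child processes), so CPU time divided by wall time approximates the number of busy cores.

```python
import time
from typing import Callable, Tuple


def measure(step: Callable[[int], object], n_iter: int = 200) -> Tuple[float, float]:
    """Run `step` n_iter times; return (iterations/second, average busy cores).

    Iterations/second is what tqdm reports. The busy-cores figure is process
    CPU time divided by wall time: about 1.0 means one core was busy, about
    8.0 means all 8 cores were busy, and well below 1.0 means mostly idle.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of this process, all threads
    for i in range(n_iter):
        step(i)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    if wall <= 0:
        return float("inf"), 0.0
    return n_iter / wall, cpu / wall


def dummy_step(i: int) -> int:
    # Hypothetical stand-in for one optimizer iteration; replace with real work.
    return sum(x * x for x in range(10_000 + i))


if __name__ == "__main__":
    rate, busy = measure(dummy_step, n_iter=50)
    print(f"{rate:.1f} it/s, ~{busy:.1f} cores busy")
```

Logging both figures from inside the computation under the dev server, plain uvicorn and Gunicorn gives directly comparable numbers: a busy-cores value near 8 matches "all cores at 100%", and one well below 1 matches "CPUs are mostly idle".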
After adding server = app.server to app.py and running it with gunicorn -w 2 app:server, performance drops to at most 2 iterations/second and the CPUs are mostly idle. Changing the number of workers to 2*cores + 1 (17) gives exactly the same result. It's probably important to mention that I don't return anything until the computation has finished.
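The worker settings above can also live in a Gunicorn config file, which makes it easier to vary one setting at a time. A sketch, assuming the file sits next to app.py (Gunicorn reads ./gunicorn.conf.py by default). The bind address and the 300 s timeout are assumed values, not taken from the post. Gunicorn's default worker timeout is 30 s, and a request that computes longer than that can get its worker killed and restarted.

```python
# gunicorn.conf.py: a sketch of the settings from the post in one file.
# Run `gunicorn app:server` from the same directory to pick these up.
import multiprocessing

# 2 * cores + 1, i.e. 17 on an 8-core M2.
workers = 2 * multiprocessing.cpu_count() + 1

# Assumed address: Dash's own dev server defaults to port 8050.
bind = "127.0.0.1:8050"

# Assumed, generous value. Gunicorn's default is 30 s, and a worker that stays
# silent longer than this is killed and restarted.
timeout = 300
```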
To test this another way, I created a main.py that creates a FastAPI() app with a GET endpoint that replicates some of the compute logic from the Dash app. I call that endpoint from my Dash app in a background callback, since it takes longer than 30 s.
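A minimal sketch of the HTTP call the background callback could make, using only the standard library. The /optimize route and the 600 s timeout are hypothetical (the post doesn't name the endpoint); the callback body would call fetch_optimization() and return the resulting dict to the dcc.Store.

```python
import json
import urllib.request

# Hypothetical route: the post does not name the GET endpoint in main.py.
OPTIMIZE_URL = "http://127.0.0.1:8001/optimize"


def fetch_optimization(url: str = OPTIMIZE_URL, timeout: float = 600.0) -> dict:
    """GET the compute endpoint and decode its JSON response.

    The timeout is deliberately long because the computation takes more than
    30 s; urllib raises an error if connecting or a read blocks for longer
    than `timeout` seconds.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```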
Serving the FastAPI() app with uvicorn main:app --log-level info --workers 17 --port 8001, I get the same rate as before (~50 iterations/second), everything runs error-free, and the CPUs are at full blast.
Running it behind Gunicorn instead, with gunicorn -w 17 -k uvicorn.workers.UvicornWorker -b '127.0.0.1:8001' main:app as suggested in the production deployment documentation (Gunicorn with Uvicorn workers), it again slows down to at most 2 iterations/second, and on top of that I get timeouts and errors like resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown.
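The same Gunicorn command can be expressed as a config file with a post_fork server hook that logs what each worker sees, which may help compare the Gunicorn workers against the plain uvicorn --workers run. A sketch: the 300 s timeout is an assumed value (Gunicorn's default is 30 s), and logging POLARS_MAX_THREADS, the environment variable Polars reads to size its thread pool, is an assumption about what is worth checking.

```python
# gunicorn.conf.py for the FastAPI service: a sketch mirroring the command
# line from the post; run `gunicorn main:app` from the same directory.
import os

bind = "127.0.0.1:8001"
workers = 17
worker_class = "uvicorn.workers.UvicornWorker"

# Assumed value. Gunicorn kills and restarts a worker that stays silent for
# longer than `timeout` seconds (30 by default).
timeout = 300


def post_fork(server, worker):
    # Gunicorn server hook, called in each worker just after it is forked.
    # Logs the worker pid, the CPU count it sees, and any Polars thread cap.
    server.log.info(
        "worker pid=%s forked; os.cpu_count()=%s; POLARS_MAX_THREADS=%s",
        worker.pid,
        os.cpu_count(),
        os.environ.get("POLARS_MAX_THREADS"),
    )
```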
I've tried all sorts of combinations of worker types (including gevent) and thread counts from this forum and SO, and the results are the same. I'm sure I'm missing something here.
I understand from the docs and forums that the dev servers aren't configured with workers and just use whatever is available, but changing the -w flag in Gunicorn doesn't increase or decrease performance at all.