Stability issues

We have deployed Dash at https://speakerbench.com and we are experiencing stability issues: 1) web pages do not fully load, and 2) sometimes web pages will not load at all. The ISP claims they cannot see any issues on the server side, i.e. there is plenty of CPU and memory available.
For example, the third app (appc.py) would at some point not load at all, even after I refreshed the browser many times. I eventually gave up.
What could possibly create these problems? Please feel free to propose ideas.
It doesn't help to restart the client (the web browser, or even the entire computer).

The app has been code-optimized and uses a minimal number of global variables.
I found this link; I am not sure whether it is related to Plotly Dash, but some of the tips might be relevant:

… Much of what is proposed in the above how-to is client-related.
I look forward to hearing your proposal(s) as to what could be wrong on the server side - maybe something to be aware of in the code?

Best regards,
Claus

Hello @cfuttrup,

How many users does it take before you start seeing these problems? On one of your pages, I noticed that it called _dash-update-component several times (5 times for Speakerbench). Depending on how your load balancer is set up, this may cause issues, because loading a single page triggers multiple calls to the server.

Do you have several backend servers running Dash? Are you using Gunicorn or Uvicorn to create several workers to handle multiple simultaneous calls?
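For reference, this is the typical way to run a Dash app under Gunicorn with several worker processes. This is only a sketch: the module and attribute names in `app:server`, the worker count, and the port are assumptions, not Speakerbench's actual configuration.

```shell
# Serve the Dash app's underlying Flask/WSGI object with 4 workers.
# "app:server" assumes a module app.py that contains something like:
#     app = dash.Dash(__name__)
#     server = app.server   # the WSGI object Gunicorn needs
gunicorn app:server --workers 4 --bind 0.0.0.0:8000 --timeout 120
```

With only one worker, a single slow callback blocks every other visitor, which would look exactly like pages that stall or never load.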

Hi jinnyzor

Thank you for your input, and sorry for the long wait.
I had a Zoom chat with my Speakerbench partner today, and we now understand a bit more of what is going on, or at least feel we have been pointed in a direction where we can start digging.

Generally speaking, you're focusing on the server being (too) busy for one reason or another, which makes sense.
It looks like the web server is using Gunicorn as a WSGI server, but I don't know how it is set up (number of workers) and will ask my ISP.
Thanks for the tip to look into how we load _dash-update-component. We'll look into that.

Our server is set up with:
Entry processes: 1/20 (supposedly only 5% loaded when I looked)
Number of processes: 1/80 (also 'chilling')

I don't know what the status was when I experienced the trouble, because I was at work without my home computer (and had no access to the server side).

With your input I will investigate a bit more. Thank you.

With kind regards,
Claus

By the way, we looked into stderr.log on the server and found the following at the tail end (last 15 lines):

[UID:1324][3694872] Reached max children process limit: 6, extra: 2, current: 8, busy: 8, please increase LSAPI_CHILDREN.
[UID:1324][3694872] Reached max children process limit: 6, extra: 2, current: 8, busy: 8, please increase LSAPI_CHILDREN.
[UID:1324][3694872] Reached max children process limit: 6, extra: 2, current: 8, busy: 8, please increase LSAPI_CHILDREN.
[UID:1324][3694872] Reached max children process limit: 6, extra: 2, current: 8, busy: 8, please increase LSAPI_CHILDREN.
[UID:1324][3694872] Reached max children process limit: 6, extra: 2, current: 8, busy: 8, please increase LSAPI_CHILDREN.
[UID:1324][32015] Child process with pid: 46714 was killed by signal: 15, core dump: 0
[UID:1324][32015] Child process with pid: 46722 was killed by signal: 15, core dump: 0
[UID:1324][830272] Child process with pid: 855853 was killed by signal: 15, core dump: 0
[UID:1324][830272] Child process with pid: 855851 was killed by signal: 15, core dump: 0
[UID:1324][830272] Child process with pid: 855867 was killed by signal: 15, core dump: 0
[UID:1324][830272] Child process with pid: 855852 was killed by signal: 15, core dump: 0
[UID:1324][1182474] Child process with pid: 1193501 was killed by signal: 15, core dump: 0
[UID:1324][1182474] Child process with pid: 1193500 was killed by signal: 15, core dump: 0
[UID:1324][1182474] Child process with pid: 1193499 was killed by signal: 15, core dump: 0
[UID:1324][1182474] Child process with pid: 1193505 was killed by signal: 15, core dump: 0

Does this provide any tips as to what could be our problem?

Best regards,
Claus

It does seem interesting.

For LiteSpeed, it seems that killing the process is normal:

The LSAPI process limit looks to be the issue. Only 6 children can be busy serving requests at once? I'm not sure, because I've not used LiteSpeed.
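The log itself suggests the fix ("please increase LSAPI_CHILDREN"). On LiteSpeed hosts this is an environment variable on the external LSAPI application; a sketch follows, where the value 20 is an arbitrary example and the exact place to set it (the LiteSpeed admin console's Environment field, or the hosting panel's Python app settings) depends on the host.

```shell
# LiteSpeed LSAPI: allow up to 20 concurrent child processes instead of 6.
# Set this in the external application's environment, not in the app code.
LSAPI_CHILDREN=20
```

With the limit at 6 and "busy: 8" in the log, two requests were already queuing with nothing to serve them, which matches pages that never finish loading.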

Hi all and jinnyzor

To whoever might be interested: I finally had the web hotel look at things; they adjusted a few parameters
(unknown to me which ones), and shortly afterwards they announced the end of support for Python apps at the hotel.
I was unable to find another web hotel with this feature, so we instead opted for a Virtual Private Server
running Gunicorn and Apache. The migration finished in early January.
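For anyone repeating this migration, the usual shape of a Gunicorn-behind-Apache setup looks like the sketch below. All names, ports, and paths are assumptions for illustration, not the actual Speakerbench configuration.

```shell
# 1) Run Gunicorn bound to localhost only, so Apache fronts all traffic.
#    "app:server" assumes the Dash app exposes its Flask object as `server`.
gunicorn app:server --workers 4 --bind 127.0.0.1:8000

# 2) Apache virtual host (requires mod_proxy and mod_proxy_http):
#    <VirtualHost *:80>
#        ServerName speakerbench.com
#        ProxyPreserveHost On
#        ProxyPass        / http://127.0.0.1:8000/
#        ProxyPassReverse / http://127.0.0.1:8000/
#    </VirtualHost>
```

On a VPS you control the worker count yourself, so the LSAPI-style hard child limit from the shared host no longer applies.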

Best regards,
Claus

I use gunicorn with nginx myself. :slight_smile:
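The nginx side of that setup is very similar to the Apache one. A minimal sketch, where the server name and upstream port are placeholder assumptions:

```nginx
server {
    listen 80;
    server_name example.com;

    location / {
        # Forward everything to the local Gunicorn instance.
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```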

Glad you got it working, and it stinks that they did that to you.