Dash multi-tab app - stuck on loading on some Windows OS + browser versions

Hi All,

I have a Dash multi-tab app (NOT built on the new pages-based feature) that recently started hanging when used on computers running different Windows OS versions. I confirm that it had previously been working fine across different computers and OSs/browsers.

CONTEXT:

The app automates analysis of various .txt files, eventually producing plots, tables and images that are served to the web app. Additionally, all plots/tables/images are written to disk. I use Anaconda environments to manage dependency versions.

DEPENDENCIES:

Python 3.8.18
pandas==2.0.3
numpy==1.24.4
scikit-learn==1.3.0
scipy==1.10.1
plotly==5.14.1
dash==2.13.0
dash-core-components==2.0.0
dash-html-components==2.0.0
dash-extensions==0.0.58
dash-uploader==0.5.0
dash-bootstrap-components==1.4.2
flask==2.2.3
kaleido==0.1.0post1

VARIOUS SYSTEM DETAILS

Used for work/home/travel working.

PC1 - Windows 10 Pro (latest updates)
Firefox 124.0.2 (64-bit)
Anaconda (2.5.2)

PC2 - Windows 11 Home Edition (latest updates)
Firefox 124.0.2 (64-bit)
Anaconda (2.4.2)

Laptop - Windows 11 Pro (latest updates)
Firefox 124.0.2 (64-bit)
Anaconda (2.5.2)

PROBLEM:

The app only runs fully on PC2. It hangs part of the way through a callback chain on the PC1 and laptop set-ups - see attached image showing a summarised callback chain in my app.

The place where the app hangs is at callback 1 with no crashes or errors generated.

I confirm that the same dependencies are set-up on each system.

WHAT I’VE TRIED:

1 - The code in callback 2 and the parallel callbacks works fine if run manually from a script. Since the app runs fully on PC2, I assume there aren’t issues with layout discrepancies, i.e. a missing object in the layout.

2 - I inserted a small piece of code that writes a dataframe to file at the end of callback 2, just before outputs are returned from the callback function. I confirm that the file writes to disk, therefore execution reaches the end of the callback. Here I assume that something is stuck, such that the next callback in the chain isn’t triggered. However, no crash or errors are reported…it just hangs on loading indefinitely.

3 - I inserted the same piece of file write code in the parallel callbacks that follow callback 2 and nothing happens. It seems that the parallel callbacks aren’t triggered at all.

No outputs from callback 2? Server has comms issue?

4 - I haven’t re-installed Anaconda on PC2, so that versions are the same as other systems, because I don’t want to break my only working version. See further investigation.

5 - Same issue on Chrome and Edge browsers.

6 - Tried MWEs from the Dash examples and they seem to work fine across systems.
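The file-write check in steps 2 and 3 can be generalised with a small logging decorator. This is a minimal sketch, not the code used in the app: the names `traced`, `callback_trace` and `callback_2` are hypothetical, and in a Dash app the decorator is assumed to sit directly beneath `@app.callback(...)` so that Dash registers the wrapped function.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("callback_trace")  # hypothetical logger name


def traced(func):
    """Log when a callback starts, finishes or raises, with elapsed time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        log.info("START %s", func.__name__)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # log.exception records the traceback, then the error is re-raised
            log.exception("FAIL  %s after %.2fs", func.__name__,
                          time.perf_counter() - start)
            raise
        log.info("END   %s after %.2fs", func.__name__,
                 time.perf_counter() - start)
        return result
    return wrapper


# Hypothetical stand-in for a callback body.
@traced
def callback_2(n_clicks):
    return {"rows": n_clicks}
```

An `END` line for callback 2 with no `START` line for the parallel callbacks would point to the hand-over between server and browser rather than to the callback code itself.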

FURTHER INVESTIGATION

The environment paths in Anaconda are different between PC2 and PC1/laptop:

PC2:

(app) C:\Users\me\Desktop\Projects\APP>where python
C:\Users\me\anaconda3\envs\APP\python.exe
C:\Users\me\anaconda3\python.exe
C:\Users\me\AppData\Local\Microsoft\WindowsApps\python.exe

PC1/laptop:

(app) C:\Users\me\AppData\Local\APP>where python
C:\Users\me\AppData\Local\anaconda3\envs\APP\python.exe
C:\Users\me\AppData\Local\anaconda3\python.exe
C:\Users\me\AppData\Local\Microsoft\WindowsAPPs\python.exe

Is this an issue? The app starts and partly works in PC1/laptop, so I assume not.
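To rule out environment drift between machines, the interpreter path and installed versions can be printed on each system and compared. A minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); the function name `environment_report` is hypothetical, and `werkzeug` is included as an assumption because Flask's development server is built on it, although it is not in the dependency list above.

```python
import sys
from importlib import metadata  # standard library from Python 3.8

# Packages to compare; werkzeug is an assumption (not in the list above).
PACKAGES = ["dash", "plotly", "flask", "werkzeug", "pandas", "kaleido"]


def environment_report(packages=PACKAGES):
    """Return the interpreter path and the installed version of each package."""
    report = {"python": sys.version.split()[0], "executable": sys.executable}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key:12} {value}")
```

Running this inside the activated env on PC1, PC2 and the laptop gives three short listings that can be diffed by eye.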

I’ve seen related issues with respect to OS’s. See one of my own where Kaleido version was the issue - https://github.com/plotly/dash/issues/2503.

Here is another hanging-at-loading issue, but the OP didn’t post an update:
https://community.plotly.com/t/dash-app-stuck-at-loading/82309

My best guess is that there is some Windows OS/browser version combination that gives rise to this issue. Pretty stuck as currently I can only fully test/develop code on one of my machines.

Does anyone have experience with this issue?

When you say ‘no errors’ are you also looking in the browser developer tools console window?

Hi David,

Yes, I had a look at the console and the errors relate to CSS styling options passed to the front end via a style sheet. They have the following structure:

Warning: Unsupported style property border-top. Did you mean borderTop?

There are 11 of these for various styling options. I have the same errors on PC2, where the whole callback chain completes without issues. When I tried to adjust my style sheet as specified in the error, the CSS didn’t recognise the adjusted name of the option. I left it as is, since styling is working as expected.

Additionally I also have 3 warnings, that are the same across all systems:

I will edit my question with an update, as I’ve spotted some additional things that could help diagnose what is going on.

I very much appreciate the input.

Hi All,

Clarification: I stated in my OP that app hangs at callback 1. It’s actually at callback 2, just before the parallel callbacks are triggered by completion of callback 2.

Update: so the plot thickens, and it is related to @davidharris’s earlier question.

I ran the app on PC1 (see my description in OP), to see if some additional errors crop-up. The following error arises during the “hanging” phase in callback 2:

Uncaught (in promise) DOMException: The operation was aborted. 

There is no further information in the console, like the other styling related errors.

The strange thing is that my laptop has now got past callback 2 and kicked off the parallel callbacks (no update in code/dependencies/OS/browser). 5 of 6 parallel callbacks trigger their parallel plotting callbacks leading to completion of those chains i.e. plots display on the front-end. However, 1 callback hangs, and in the browser console the same “DOMException” gets raised.

I closed the server and re-started the app, to check it again, and now it hangs as before at callback 2.

Does anyone know how this error is related to a callback?

Thanks

OK, this can’t be much more than guesswork, but it looks like your fastest client is the most reliable and the problem is intermittent so I wonder if it’s something to do with the parallelism. I don’t know whether this sort of parallelism works reliably, and generally avoid it.

So maybe first (apologies if this is obvious) - all your callbacks (except Dash-uploader) should have prevent_initial_call=True

And second, does it work reliably with no parallelism? (Say, if you disable all but one of the plot callbacks)

Hi David,

thanks for the response and suggestions. I will try to disable all but 1 callback chain as suggested.

I’ll also try adding prevent_initial_call=True flag to each callback as suggested. I don’t believe I’ve tried this yet.

Regarding parallelism, I’ve found it to be very robust. Other than the recent issues, previous trouble-shooting (to the best of my knowledge) has been unrelated to compartmentalising in this way.

I use json serialization of dataframes and data store objects to pass data between callbacks. In practice, I’ve seen that if a callback chain fails (normally due to an underlying data issue), the other parts will complete.

I’m actually suspecting something else now. For some time I’ve been using the same test data-set across the different machines for any on-going development. Yesterday, I tried a much smaller data-set (~60mb across 6 files) and that reliably completes all callback chains on PC1 and laptop. My normal test data-set is ~600mb across 6 files. Since I pre-process these files in callback 2, generating smaller dataframes to pass to store objects and hence to parallel callbacks, the data is reduced heavily.
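To check how much data actually ends up in each store object, the serialised string can be measured just before the callback returns it. A minimal sketch using only the standard library; `payload_size_mb` and the sample records are hypothetical, and with pandas the string is assumed to come from `df.to_json()` rather than `json.dumps()`.

```python
import json


def payload_size_mb(json_text):
    """Size in megabytes of a JSON string once encoded as UTF-8."""
    return len(json_text.encode("utf-8")) / 1_000_000


# Hypothetical stand-in for a pre-processed table; with pandas the string
# would come from df.to_json(...) instead of json.dumps(...).
records = [{"sample": i, "value": i * 0.5} for i in range(1000)]
text = json.dumps(records)
print(f"store payload: {payload_size_mb(text):.3f} MB")
```

Logging this figure for each store output makes it possible to compare what the server sends with what the browser network tab reports as received.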

I’ve actually processed much larger data-sets without any issues in the past. Why this is now a problem, I cannot say. Has an update restricted the maximum size of data in store objects? Is there a time-out in the server during the transfer of data to the store object on the server side?

I monitored the network tab in the browser console as I was loading my normal test data (where app hangs) and the moment the “DOMException” error is reported, this also appears in the console network section in red:

Am I right in thinking that this is related to the transfer of a json object to the store object?

Appreciate any thoughts you or anyone else might have.

EDIT: clicking through the various additional windows when highlighting the network error above brings me to a “response” section where I can see that this is related to the first json object created in callback 2, which is then passed to store. The data I can see is truncated, so I’m not sure if it’s just the first json object or the multiple jsons passed to store at this point.

UPDATE: On PC2, which completes all callbacks and displays plots, I monitored the network reports in the browser console. The transfer of the initial json objects completes at approx. 94mb. On the other systems, the error above is raised just 2mb short of completing the transfer of json data. I tried an intermediate-size dataset of approx. 400mb and this falls off too, again a few mb shy of completing, at approx. 63mb. For now I think a time-out has kicked in on the server side, for some systems but not PC2.
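If the size of the JSON passed through the store objects turns out to be the trigger, one commonly used alternative is to keep the large data on the server and pass only a short key through the store. This was not tried in this thread; the sketch below uses only the standard library, and `CACHE_DIR`, `save_payload` and `load_payload` are hypothetical names.

```python
import json
import tempfile
import uuid
from pathlib import Path

# Hypothetical cache location under the system temp directory.
CACHE_DIR = Path(tempfile.gettempdir()) / "dash_cache_sketch"


def save_payload(data):
    """Write JSON-serialisable data to disk and return a short key for it."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    key = uuid.uuid4().hex
    (CACHE_DIR / f"{key}.json").write_text(json.dumps(data), encoding="utf-8")
    return key


def load_payload(key):
    """Read back the data previously stored under the given key."""
    return json.loads((CACHE_DIR / f"{key}.json").read_text(encoding="utf-8"))
```

Callback 2 would then return the key to the store, and each parallel callback would call `load_payload(key)`, so only a 32-character string travels to the browser and back.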

Hi All,

I’ve had some success in sorting this problem.

@davidharris I tried your suggestion regarding adding prevent_initial_call=True flag to callbacks. This didn’t change the behaviour previously described.

I didn’t try disabling all but one of the parallel callbacks, as I started investigating the network communication issues.

Depending on the size of the data-set being uploaded, either the data transfer completes and all parallel callbacks initiate and complete, or it truncates. What is consistent with the failed transfers is that they fall off ~2mb short of complete transfer, suggesting that something now cuts the transfer off just before it finishes. Note that data-sets with this network issue previously completed without problems. It seems like some kind of specific instruction to “kill” the transfer is now in effect in the development server environment.

Is there an issue with AV/Windows defender interfering on PC1 and laptop (these are work machines with distinct AV set-ups to PC2)?

What allowed me to circumvent this issue was running the app on a production server using Waitress.

I installed Waitress through pip, and then specified the following at the end of my app script:

from waitress import serve

if __name__ == "__main__":
    # Run the app with Waitress on port 8080.
    serve(server, host='0.0.0.0', port=8080, threads=8)

Now all data-sets complete their callback chains, including much larger data-sets than the ones I was testing.

This prompted me to try ports other than the default (8050) whilst running in the development server. The same issue with hanging data transfers occurs again…this seems to be a specific issue with this server set-up.

The major issue going forward is debugging during on-going development of the app. As I understand it, running on Waitress doesn’t provide debugging by default. I believe I can force logging with some additional Python code, but ideally I’d like to develop as intended.
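One way to keep both options open is to choose the server from an environment variable and switch on logging when Waitress is used. A minimal sketch, assuming `app` is a `dash.Dash` instance; the `APP_SERVER` variable and the function names `use_waitress` and `run` are hypothetical. Waitress writes its messages to the Python logger named `waitress`, so `logging.basicConfig` is enough to send them to the console.

```python
import logging
import os


def use_waitress(environ=os.environ):
    """True when the hypothetical APP_SERVER variable is set to 'waitress'."""
    return environ.get("APP_SERVER", "dev").lower() == "waitress"


def run(app):
    """Start the app on Waitress or on the Dash development server."""
    if use_waitress():
        # Imported lazily so the development path does not need waitress.
        from waitress import serve
        logging.basicConfig(level=logging.INFO)
        logging.getLogger("waitress").setLevel(logging.INFO)
        serve(app.server, host="0.0.0.0", port=8080, threads=8)
    else:
        # Dash development server with hot reload and dev tools.
        app.run(debug=True)
```

With this in place, `set APP_SERVER=waitress` in the Anaconda prompt before starting the script selects Waitress, and leaving the variable unset falls back to the development server.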

I’m going to keep the issue open for now. Perhaps there is enough info here for Dash developers to suggest a way to debug further the problems seen on the development server?

Happy to hear any additional thoughts from anyone who finds themselves reading through my long posts.

UPDATE: switching off Windows Defender Firewall does not stop the aborted data transfer.