Heatmap performance, layout, and related questions

orenbenkiki · August 6, 2019, 4:25pm

I’m new to Dash. I’m trying to set up an interactive system for analyzing large amounts of data, and I ran into trouble using heatmaps.

Some of the heatmaps I want to display are large (say, ~6000 * 6000 data points). Sure, this is beyond the resolution of the (4K) screen, but it allows zooming into areas of interest. The data is available in a Pandas data frame (data type is float32).

Issues:

Low Performance

Trying to simply display the data using Heatmap(x=frame.columns, y=frame.index, z=frame.values) takes a long time. At first, it took >10 minutes, after which I gave up on waiting. I reduced this to “only” around half a minute when I specified zsmooth=False.

Does the backend try to smooth/resample the data to match the displayed size in the browser? That sounds strange, since the full data still needs to be sent to the browser to allow for zooming and panning. Also, even if the code does some smoothing, it seems to be extremely inefficient. Applying a convolution to a 6000 * 6000 matrix shouldn’t take even a second on a CPU that can execute a few billion instructions per second.
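
For a sense of scale, here is a rough estimate of how much data a full 6000 * 6000 heatmap involves. It assumes the figure reaches the browser as JSON text, and the bytes-per-value figure is measured on random sample values rather than on the real data, so treat the JSON number as an order of magnitude only.

```python
import json
import random

ROWS, COLS = 6000, 6000
n_values = ROWS * COLS          # number of heatmap cells
raw_bytes = n_values * 4        # float32 takes 4 bytes per value

# Measure the average JSON text size per value on a random sample
# (this includes the ", " separator json.dumps emits between items).
# Assumption: the real values serialize to a similar number of digits.
random.seed(0)
sample = [random.random() for _ in range(10_000)]
json_bytes_per_value = len(json.dumps(sample)) / len(sample)

est_json_bytes = n_values * json_bytes_per_value
print(f"cells:         {n_values:,}")
print(f"raw float32:   {raw_bytes / 1e6:.0f} MB")
print(f"JSON estimate: {est_json_bytes / 1e6:.0f} MB")
```

The first two numbers are exact (36,000,000 cells, 144 MB as float32). The JSON estimate comes out several times larger, which suggests that serialization and transfer, rather than any smoothing, may account for much of the delay.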

I am using the built-in “debug” server, but that shouldn’t matter that much. I would expect that it would make a big difference when handling a lot of parallel requests etc., but for a single request single user scenario, using a “debug” server shouldn’t matter… right?

I tried using Heatmapgl and that didn’t solve any of the issues. It did, however, lose the axis labels. I never had a good experience with Heatmapgl even in offline mode, so I gave up on it.

Is a 6000 * 6000 entries heatmap simply a lost cause in Dash+Plotly? If so, what is the alternative for showing large raw matrix data?

I Can’t trust what I see

Regardless of the heatmap issues, there will always be some requests for data that take many seconds to compute, so it is impossible to tell whether the currently visible data is up to date. It would be extremely useful to have some sort of visible indicator informing the user whether the backend is still working.

I tried to look for a feature like this in React. It seems they have a “Suspense” component that might do the trick (and it seems Dash uses it when loading the initial page), but I’m not sure how I can wrap it around my slow-to-compute elements using Python. Does anyone have any idea whether this could be done and how, or can anyone suggest an alternative?

Broken Layout (Edit: not solved)

Before I moved to Dash, I generated offline (HTML) heatmaps. These took a few seconds to load, but once loaded, they were well-behaved and responsive. Specifically, they automatically expanded to fill the available space. For some reason, the same heatmap, when going through Dash, only expands to fill the available horizontal space; the displayed height is always 450 pixels. This seems to be the fault of the JavaScript side: inspecting the element in the browser, I see that some code automatically recomputes an explicit width and height for the SVG element whenever I resize the window. But for some reason this computation keeps the height at 450 pixels.

I tried specifying autosize=True in the layout dictionary and adding config=dict(autosizable=True, responsive=True) to the dcc.Graph element, but all that achieved was making it impossibly slow to send even a 100 * 100 heatmap; the layout remained broken.

Is there some other setting that may fix this?

Edit: Searching further, I discovered that for some reason the default height is 450px unless one specifies an explicit height for the graph (div). That works well if you set the height of the div (or its relevant ancestor) to something “absolute”, such as 800px or 100vh (full height). However, I don’t think it is possible to fill the height that remains after some other HTML elements have taken up part of it. The CSS way to do this is to use height=100%, and when that is used, Dash sets the graph height to 100px (!). This seems like a bug, so I have reported it as such in https://github.com/plotly/dash/issues/857

Aspect ratio

I tried to use xaxis=dict(scaleanchor='y') to force the pixels to stay square, regardless of zooming. This had no effect whatsoever. Are there additional settings required to make this work?

Thanks,

Oren Ben-Kiki

alexcjohnson · August 8, 2019, 2:10pm

Whew, that’s quite a post! Let’s see what we can do with it.

(1) Generally for datasets this size it’s best to downsample on the back end and update on zoom. There are various ways to handle this, check out for example https://github.com/plotly/dash-datashader. Your observations about heatmapgl are correct, it exists for legacy purposes but regular heatmap is likely better in every way at this point. And yes, zsmooth=False is best for your purposes, that option is mainly for the opposite case: smooth interpolation between widely-spaced points.

(2) Check out https://dash.plot.ly/dash-core-components/loading_component

(3) Thanks for the report! I suspect there’s a simple (if not intuitive) solution that does work right now (maybe someone else can chime in about it?) but what you observed is clearly strange, we’ll take a look.

(4) That should work… here’s a simple example: https://codepen.io/alexcjohnson/pen/gVzOdZ?editors=1010. We can dig in if you want to post some code.
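
For (4), a minimal Python sketch of the axis settings involved, written as a plain figure dict in the style used elsewhere in this thread. This is not a copy of the CodePen above. The scaleratio and constrain attributes are additions beyond the scaleanchor setting in the question; whether they are needed in a given layout is an assumption, and the tiny z matrix is placeholder data.

```python
# Placeholder data standing in for the real matrix.
z = [[1, 2, 3],
     [4, 5, 6]]

figure = dict(
    data=[dict(type='heatmap', z=z, zsmooth=False)],
    layout=dict(
        # Tie the x axis scale to the y axis so one x unit takes up as many
        # screen pixels as one y unit (square cells for unit-spaced data).
        xaxis=dict(scaleanchor='y', scaleratio=1, constrain='domain'),
        # constrain='domain' shrinks the plotting area instead of
        # extending the axis range when the aspect ratio is enforced.
        yaxis=dict(constrain='domain'),
    ),
)
```

A dict like this can be passed as the figure argument of dcc.Graph in the same way as the dict(layout=dict(autosize=True)) figures elsewhere in the thread.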

Hope that helps!

orenbenkiki · August 8, 2019, 2:58pm

(1) Will try datashader. It seems to be addressing exactly my sort of problem.
(2) I don’t know how I managed to miss that one… my bad. And it even works when wrapped around a dropdown. Sweet!
(3) I guess a possible workaround would be to use Javascript… I don’t think that would count as “simple” though
(4) Not sure why it works there and not for me. But if I switch to datashader, I suspect the whole issue would be different there.

Many thanks!

Emil · August 9, 2019, 9:21am

(1) As suggested by alexcjohnson, the simplest way to address this issue would be (manual) server-side downsampling. However, depending on the desired map interactivity, I am not sure that this solution would enable sufficiently fast updates during pan/zoom operations. Another option would be to use a map component that supports client-side resampling operations. I am currently working on a project that includes a Dash Leaflet component, which I hope to be able to open source in the near future. Hence, if you could find a Leaflet plugin (or even better, a React Leaflet plugin) that fulfills your requirements, it should be relatively easy to write a wrapper that would make it usable in Dash.
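
A minimal sketch of what the manual server-side approach could look like, assuming NumPy and assuming the zoom window is available as row/column index ranges (for example, derived from the graph's relayoutData in a callback). The function name downsample_view and the max_cells parameter are made up for illustration, and block averaging is only one possible reduction; this is not what dash-datashader does internally.

```python
import numpy as np

def downsample_view(z, x_range, y_range, max_cells=500):
    """Crop z to the visible index ranges, then block-average it so that
    neither dimension of the result exceeds max_cells."""
    rows, cols = z.shape
    # Clamp the requested (start, stop) index ranges to the matrix bounds.
    r0, r1 = max(0, int(y_range[0])), min(rows, int(y_range[1]))
    c0, c1 = max(0, int(x_range[0])), min(cols, int(x_range[1]))
    view = z[r0:r1, c0:c1]

    # Block size per axis: ceil(visible_size / max_cells), at least 1.
    fr = max(1, -(-view.shape[0] // max_cells))
    fc = max(1, -(-view.shape[1] // max_cells))

    # Trim trailing rows/columns so both sizes divide evenly by the block
    # size (this drops at most fr - 1 rows and fc - 1 columns).
    h = (view.shape[0] // fr) * fr
    w = (view.shape[1] // fc) * fc
    blocks = view[:h, :w].reshape(h // fr, fr, w // fc, fc)

    # Average each fr x fc block into a single output cell.
    return blocks.mean(axis=(1, 3))

# Tiny demonstration: a 4 x 4 matrix reduced to at most 2 x 2 cells.
demo = np.arange(16, dtype=np.float32).reshape(4, 4)
full = downsample_view(demo, (0, 4), (0, 4), max_cells=2)
zoomed = downsample_view(demo, (2, 4), (0, 2), max_cells=2)
```

The returned matrix would become the z of the heatmap each time the zoom window changes. The x and y labels would need the same cropping and striding, which is omitted here.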

(3) Could you provide a (broken) MWE? As I understand it, you would just like to fill the parent container (please correct me if I am wrong). That can be achieved like this:

import dash
import dash_core_components as dcc
import dash_html_components as html

app = dash.Dash()
app.layout = html.Div(
    # Set figure to autosize, i.e. fill parent container
    dcc.Graph(figure=dict(layout=dict(autosize=True))),  
    # Make parent container fill the whole page
    style=dict(width="100%", height="100%", top=0, left=0, position="absolute", display="grid", padding=0, margin=0)
)

if __name__ == '__main__':
    app.run_server(debug=True, host='0.0.0.0', port=5002, threaded=True)

orenbenkiki · August 11, 2019, 8:27am

Just remove the position="absolute" and you’ll see it stops working. To be fair, if you do so, the parent won’t expand to fill the whole screen (rant about CSS layout elided). Here is an example that shows the problem:

import dash
import dash_core_components as dcc
import dash_html_components as html

app = dash.Dash()
app.layout = html.Div(
    id='body',
    style={
        'display': 'flex',
        'flex-flow': 'column',
        'background-color': 'green',
        'height': '100vh',
        'width': '100%'
    },
    children=[
        html.Div(
            id='header',
            children=[
                html.H1('MWE'),
            ]
        ),
        html.Div(
            id='expand',
            style={
                'flex-grow': 1,
                'height': '100%',
                'width': '100%',
                'background-color': 'blue'
            },
            children=[
                dcc.Graph(figure=dict(layout=dict(autosize=True))),
            ]
        ),
    ]
)
app.run_server(debug=True)

I’m also adding this to the bug report.
