Dash scatter lagging

Hello,
My Dash application lags significantly when I hover over points. I am plotting a large number of points (over 100k), and the lag is worse when hovering over a region with high point density. This happens even after switching from scatter to scattergl.

I am trying to implement the suggestion found here:
Github link

You can try different values of the HOVERMINTIME constant we use to determine the delay between calls to the pick-on-hover routine. Note that this constant isn’t configurable via the API; you’ll have to build your own plotly.js bundle in order to use custom values.

How do I go about building my own plotly.js bundle? I made a copy of the constants.js file that’s located here: link, except that I changed HOVERMINTIME to 1000.

I then tried to pass this to dash with the following approach but the hover time did not change:

external_scripts = [
    'filepath/customjava.js'
]
app = dash.Dash(__name__,
                external_scripts=external_scripts)

How do I go about building my own plotly.js bundle?

Hi @jmillz, welcome to the forum! I don't know how to change HOVERMINTIME in the JavaScript, but here are a couple of thoughts more related to Dash.

  • Hovering over a scattergl trace with 100k points should not lag. I just made a toy app and checked that the hover updates very quickly:
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output, State
import plotly.graph_objects as go

app = dash.Dash(__name__)

import json
import numpy as np
N = 100000
fig = go.Figure(go.Scattergl(x=np.random.random(N), y=np.random.random(N),
                mode='markers'))


app.layout = html.Div(
    [
        dcc.Graph(id='graph', figure=fig),
        html.Div(id='text')
    ]
)


@app.callback(
    Output('text', 'children'),
    [Input('graph', 'hoverData')],
)
def display_data(data):
    if not data:
        return dash.no_update
    return json.dumps(data)


if __name__ == '__main__':
    app.run_server(debug=True)

However, it is possible that hoverData triggers a callback that takes some time to execute, so the app lags when the callback is fired many times. One thing you can do is impose a minimum time between consecutive executions of the callback, as in the following example:

import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output, State
import plotly.graph_objects as go

app = dash.Dash(__name__)

import json
import time
import numpy as np
N = 100000
fig = go.Figure(go.Scattergl(x=np.random.random(N), y=np.random.random(N),
                mode='markers'))


app.layout = html.Div(
    [
        dcc.Graph(id='graph', figure=fig),
        html.Div(id='text'),
        dcc.Store(id='store', data=time.time())
    ]
)


@app.callback(
    [Output('text', 'children'),
     Output('store', 'data')],
    [Input('graph', 'hoverData')],
    [State('store', 'data')]
)
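def display_data(data, t):
    # Skip the update when there is no hover data or when fewer than
    # 2 seconds have passed since the last accepted update.
    if not data or not t or time.time() < t + 2:
        print(data, t, time.time())
        return dash.no_update, dash.no_update
    # Simulate a slow callback body.
    time.sleep(2)
    # Store the current time so the next call can compare against it.
    return json.dumps(data), time.time()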


if __name__ == '__main__':
    app.run_server(debug=True)

I know this is not exactly what you’re looking for, but maybe it can still help you. If you disagree about the origin of the lag (I think it’s a callback, not the hover itself), please post more code, ideally a standalone app with dummy data so that we can test.

Thank you very much for your prompt response. I spent yesterday and today exploring your feedback. Also I need to offer a correction - the number of points that causes the lag is more like 300k - sorry I had my deployment and local development sets mixed up. Based on your example documentation for scattergl, even 300k points should not be an issue.

One thing to note is that I’m not feeding the hover data into any callback. I’m just displaying the x,y coordinates as per the default.

Here’s what I’ve found:
The scattergl plot only lags in areas where the point density is high. Our data set is similar to a long-tail distribution where there is a large range of values but the vast majority of points fall in a small range. The graph will hover just fine until your cursor approaches the point where the high density of points is.


The lag only happens on our deployment server, which is a dedicated EC2 instance running on Amazon. When I run Dash locally on my computer, it handles data sets with 650k points without problems, while the EC2 server lags severely at 300k (to the point of freezing and being unworkable).

Interestingly, if I zoom in on the area where the hover starts to lag, I am then able to go further into the dense area before the lagging starts. So there is an interaction with not just the density of the points but also how densely they appear on the plot.

Even more interesting is that if I overwrite my data to be np.random.random and then make the same plots, there is no lagging on the EC2 instance.

So this seems to be due to an interaction between the number of points and, more importantly, the density of points. Unfortunately, I can’t create a standalone app or sample code to demonstrate my issue because it lies with the data I’m plotting. Given that, is there a way I can help you debug?
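One way to get a shareable test case without the real data might be to generate synthetic points with a similar shape. The sketch below assumes that a log-normal distribution is a reasonable stand-in for the long-tailed data described above; the function name, seed, and sigma value are made up for illustration and would need tuning to match the real data.

```python
import random


def make_long_tail_points(n, seed=0, sigma=1.5):
    """Return (x, y) lists of n points drawn from a log-normal distribution.

    Most values land in a narrow band near the origin while a few stretch
    far out, which packs many markers into a small area of the plot.
    """
    rng = random.Random(seed)  # seeded so the data set is reproducible
    x = [rng.lognormvariate(0.0, sigma) for _ in range(n)]
    y = [rng.lognormvariate(0.0, sigma) for _ in range(n)]
    return x, y


# 300k points, matching the size at which the lag was reported.
x, y = make_long_tail_points(300_000)
```

The resulting lists can be passed as the x and y of a scattergl trace in place of np.random.random data.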

The fact that the issue occurs only in regions of high point density fits well with the hypothesis of @Emmanuelle; in these regions, the callback will be triggered more often. Did you try out her solution with a custom callback?

An alternative approach would be to skip the tooltip on the original data and instead add it to an invisible data overlay, which is plotted at a lower resolution. The lower-resolution dataset could be calculated either on a regular grid or by clustering. If the clustering/gridding is chosen appropriately (possibly combined with updates on zoom), it should be more or less impossible for the user to notice the difference.
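As a minimal sketch of the regular-grid variant, the function below keeps one point per occupied grid cell. The name grid_resample and the n_cells parameter are hypothetical, and "first point in each cell wins" is just one possible choice of representative.

```python
def grid_resample(x, y, n_cells=200):
    """Keep one point per occupied cell of an n_cells x n_cells grid."""
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    # Fall back to 1.0 when all values are equal, to avoid dividing by zero.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0
    first_in_cell = {}
    for i, (xi, yi) in enumerate(zip(x, y)):
        # Clamp so that the maximum value falls into the last cell.
        col = min(int((xi - x_min) / x_span * n_cells), n_cells - 1)
        row = min(int((yi - y_min) / y_span * n_cells), n_cells - 1)
        first_in_cell.setdefault((row, col), i)  # first point in a cell wins
    keep = sorted(first_in_cell.values())
    return [x[i] for i in keep], [y[i] for i in keep]
```

The returned lists could serve as the x and y of the overlay trace. A finer grid keeps more points and follows the data more closely, at the cost of more hover candidates in dense regions.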

Hey @Emil - I did not incorporate the custom callback because I am not using a callback at all for the hover. I’m just using the hover to display details about the points in the text box that appears when the user’s cursor is over a point. If I did that callback, would that impact the text box?

How do I make an invisible data overlay?

I just tried implementing my own suggestion with the invisible data overlay. For reference, here is the code:

import json
import numpy as np

import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output, State

# Create random data.
n_data = int(2e5)
x, y = np.random.random(n_data), np.random.random(n_data)
# Shift a few data points, so that the density (on the screen) of the remaining ones increases.
x[-100:] += 100
y[-100:] += 100
# Create resampled data for tooltip. For simplicity, we just use every 100th datapoint. In most real applications, a more sophisticated algorithm must be used.
resample_ratio = 100
x_resampled, y_resampled = x[::resample_ratio], y[::resample_ratio]
# Styling, which is shared between data and tooltip overlay
default_kwargs = {'mode': 'markers', 'type': 'scattergl', 'hoverinfo': 'x+y', 'marker': {'color': 'blue'}}

def get_raw_trace(**kwargs):
    # kwargs are merged last so that callers can override the defaults (e.g. hoverinfo='none').
    return [{**{'x': x, 'y': y}, **default_kwargs, **kwargs}]

def make_figure(data, title):
    return {'data': data, 'layout': {'title': title, 'showlegend': False}}

def default_figure():
    return make_figure(get_raw_trace(), "Raw data")

def tooltip_figure():
    # Invisible (opacity 0) low-resolution trace that carries the tooltip.
    tool_tip_data = [{**{'x': x_resampled, 'y': y_resampled, 'opacity': 0, 'showlegend': False}, **default_kwargs}]
    data = get_raw_trace(hoverinfo='none') + tool_tip_data
    return make_figure(data, "Invisible tool tip overlay")

app = dash.Dash(__name__)
app.layout = html.Div(
    [
        dcc.Graph(id='graph', figure=make_figure(get_raw_trace(), "Raw data")),
        html.Button(id='toggle', children="Toggle Tooltip")
    ]
)

@app.callback(
    Output('graph', 'figure'), 
    [Input('toggle', 'n_clicks')],
    [State('graph', 'figure')],
)
def toggle_traces(n_clicks, figure):
    if not n_clicks:
        return dash.no_update
    # Toggle between plot of raw data and data + tooltip overlay.
    return default_figure() if n_clicks % 2 == 0 else tooltip_figure()

if __name__ == '__main__':
    app.run_server(debug=False)

However, it turns out that it does not solve the issue. After a quick look at the Plotly source code, my guess is that the problem is the loop calculating the nearest point to the cursor position; it loops over all points within some delta (x,y) of the cursor point. If the point density is very high, the execution time of this loop could be long enough to cause the delay. If this is indeed the problem, the only fixes I can think of are to lower the delta (x,y) in the Plotly source code (and then build from source) or to resample the data set, i.e. to plot a visually representative subset of the data rather than the data itself. In the latter case, you might need to update the subset when the user pans/zooms.
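A rough sketch of the resampling idea: pick the points inside the current view and thin them to a fixed budget. It assumes numeric axes and that the ranges arrive in the 'xaxis.range[0]' style keys that dcc.Graph reports in relayoutData after a zoom or pan; the function name and the max_points budget are hypothetical.

```python
def visible_subset(x, y, relayout_data, max_points=5000):
    """Return the points inside the current axis ranges, thinned to about max_points."""
    rd = relayout_data or {}
    # Missing keys (e.g. before the first zoom) mean "no limit" on that side.
    x0 = rd.get('xaxis.range[0]', float('-inf'))
    x1 = rd.get('xaxis.range[1]', float('inf'))
    y0 = rd.get('yaxis.range[0]', float('-inf'))
    y1 = rd.get('yaxis.range[1]', float('inf'))
    # Sort the bounds so that reversed axes are handled as well.
    x_lo, x_hi = min(x0, x1), max(x0, x1)
    y_lo, y_hi = min(y0, y1), max(y0, y1)
    inside = [i for i, (xi, yi) in enumerate(zip(x, y))
              if x_lo <= xi <= x_hi and y_lo <= yi <= y_hi]
    # Ceiling division: take every step-th point so at most max_points remain.
    step = max(1, -(-len(inside) // max_points))
    keep = inside[::step]
    return [x[i] for i in keep], [y[i] for i in keep]
```

In a Dash app this could be called from a callback that takes the graph's relayoutData as input and returns a new figure. Taking every step-th point is the same simple thinning used in the overlay example above; a density-aware scheme may represent the data better.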

I know this is an old question, but I wanted to share what I found.
After reading @Emil’s post, I looked through the source code of plotly.js from the link. A little browsing showed that this delta (x, y) is determined by the figure layout’s hoverdistance value (documentation here). So, for instance, you can call

# fig is the figure being plotted
fig.update_layout(hoverdistance=1)

if you want the hover to only pick up points within 1 pixel of the cursor position.
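For figures built as plain dicts, as in the overlay example earlier in this thread, the same setting lives under the layout key. The helper below is a small sketch (its name is made up) that returns a copy of a figure dict with hoverdistance set, leaving the original untouched. Per the Plotly layout reference, the default hoverdistance is 20 pixels.

```python
def with_hoverdistance(figure, pixels):
    """Return a shallow copy of a figure dict with layout.hoverdistance set."""
    layout = {**figure.get('layout', {}), 'hoverdistance': pixels}
    # The data list is shared with the original figure, not copied.
    return {**figure, 'layout': layout}


figure = {
    'data': [{'type': 'scattergl', 'mode': 'markers', 'x': [1, 2, 3], 'y': [4, 5, 6]}],
    'layout': {'title': 'Raw data'},
}
tight = with_hoverdistance(figure, 1)
```

The result can be passed to dcc.Graph as its figure, just like the original dict.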