
Plotly Express: performance with large amounts of data

I have a question about the performance of some Plotly Express figures. If I use parallel coordinates or a density heatmap with a dataframe of 100k rows and 4 columns, it is not possible to show the figure: Jupyter and JupyterLab freeze. Is there any way to use method arguments to disable some interactivity, bin some points, or do anything else that would make this possible?


Hi @Varlor, could you help us narrow down the diagnosis by benchmarking on dummy data? For example, the code below (one million rows) executes correctly on my Ubuntu laptop, in Firefox. How is it for you? At what data size does Jupyter/JupyterLab start to freeze?

import plotly.express as px
import numpy as np
N = 1000000
x, y = np.random.randn(2, N)
fig = px.density_heatmap(x=x, y=y)
fig.show()

As a rule of thumb, the browser has a hard time when the data is on the order of 100 MB (here, a one-million-element float64 array corresponds to 8 MB, I believe; the total then depends on how many other arrays the JavaScript has to create in order to build the figure).
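To compare your own data against that rule of thumb, you can check how much memory the NumPy arrays take before plotting. This is only a rough sketch: `size_mb` is a hypothetical helper, not part of Plotly, and it measures the raw arrays only, not the extra copies the browser creates.

```python
import numpy as np

N = 1_000_000
x, y = np.random.randn(2, N)  # two float64 arrays of N values each

def size_mb(*arrays):
    """Total size of the given arrays in megabytes (1 MB = 1e6 bytes)."""
    return sum(a.nbytes for a in arrays) / 1e6

print(size_mb(x))     # one array: N values * 8 bytes
print(size_mb(x, y))  # both arrays passed to the figure
```

For a dataframe, `df.memory_usage(deep=True).sum()` gives a comparable estimate in bytes.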

In order to downsample your data, you can either slice it (x[::5]) or take random samples from your data:

import plotly.express as px
import numpy as np
N = 1000000
x, y = np.random.randn(2, N)
mask = np.random.random(N) > 0.9  # keep roughly 1/10th of the data
fig = px.density_heatmap(x=x[mask], y=y[mask])
fig.show()

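or bin the data, for example by averaging consecutive pairs of points (x = 0.5 * (x[::2] + x[1::2]), which halves the number of points when the length is even), if it makes sense to average the data together.
The best method depends on the type of data you have :-).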
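For the density heatmap case specifically, another option is to do the binning in Python and send only the bin counts to the browser. The sketch below uses `np.histogram2d`; the bin count of 50 and the variable names are arbitrary choices for illustration, and I have not benchmarked it against your data.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000
x, y = rng.standard_normal((2, N))

# Aggregate the points into a 50 x 50 grid on the Python side,
# so the figure only needs 2500 values instead of 2 * N.
nbins = 50
counts, xedges, yedges = np.histogram2d(x, y, bins=nbins)

# Bin centers, used as axis coordinates for the heatmap.
xcenters = 0.5 * (xedges[1:] + xedges[:-1])
ycenters = 0.5 * (yedges[1:] + yedges[:-1])

print(counts.shape, int(counts.sum()))

# To display it (counts is indexed [x, y], so transpose for image rows = y):
# import plotly.express as px
# fig = px.imshow(counts.T, x=xcenters, y=ycenters, origin="lower")
# fig.show()
```

The size of what reaches the browser then depends on the number of bins rather than on the number of rows.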