I’m trying to make a plot that can handle large amounts of data. The following example is based on the datashader case study, but instead of plotting an image I update the points being shown based on the datashader pixel mapping.
The issue is that when I use Scattergl the points turn invisible, although they are still hoverable. Paste the code below into a notebook cell to see for yourself.
import plotly.offline as py # 3.7.1
import plotly.graph_objs as go
import datashader as ds # 0.6.9
import numpy as np
import pandas as pd
def ds_image_to_data(x_range, y_range, plot_width, plot_height):
    if x_range is None or y_range is None or plot_width is None or plot_height is None:
        return None
    cvs = ds.Canvas(x_range=x_range, y_range=y_range, plot_height=plot_height, plot_width=plot_width)
    agg_scatter = cvs.points(df, 'x', 'y', ds.any())
    # get a boolean pixel mapping with index x and columns y
    agg_scatter = agg_scatter.to_pandas().transpose()
    # get a dataframe with columns x, y and boolean for pixel state
    agg_scatter = agg_scatter.stack().reset_index()
    # get only values with pixel set to True
    agg_scatter = agg_scatter.loc[agg_scatter[agg_scatter.columns[2]]]
    print(f'Plotting {len(agg_scatter)} points')
    return agg_scatter['x'], agg_scatter['y']
def update_layout(layout, x_range, y_range, plot_width, plot_height):
    # Update with batch_update so all updates happen simultaneously
    with fig.batch_update():
        fig.layout.xaxis.range = (x_range[0], x_range[-1])
        fig.layout.yaxis.range = (y_range[0], y_range[-1])
        data = ds_image_to_data(x_range, y_range, plot_width, plot_height)
        # ds_image_to_data returns None when a range or size is missing
        if data is not None:
            fig.data[0].x, fig.data[0].y = data
size = 20000
df = pd.DataFrame({'x': np.arange(0, size),
                   'y': np.sin(np.arange(0, size))})
x_range = [df.x.min(), df.x.max()]
y_range = [df.y.min(), df.y.max()]
plot_height = 400
plot_width = 800
trace = go.Scattergl(
    x=df['x'],
    y=df['y'],
    mode='markers',
)
layout = {'width': plot_width, 'height': plot_height,
          'xaxis': {'range': x_range},
          'yaxis': {'range': y_range}
          }
fig = go.FigureWidget(data=[trace], layout=layout)
fig.layout.on_change(update_layout, 'xaxis.range', 'yaxis.range', 'width', 'height')
fig
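To make the pixel-mapping step in ds_image_to_data easier to follow, here is a small sketch of the same transpose/stack/filter steps on a hand-made boolean grid. The grid and its coordinates are made up for illustration and stand in for what `agg_scatter.to_pandas()` would give; no datashader is involved.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the datashader aggregate:
# rows are y pixel coordinates, columns are x pixel coordinates.
grid = pd.DataFrame(
    np.array([[True, False, False],
              [False, False, True]]),
    index=pd.Index([0.0, 1.0], name='y'),
    columns=pd.Index([10.0, 20.0, 30.0], name='x'),
)

# Same steps as in ds_image_to_data:
# index x / columns y, then one row per (x, y) pixel, then keep the set pixels.
points = grid.transpose().stack().reset_index()
points = points.loc[points[points.columns[2]]]

xs = points['x'].tolist()
ys = points['y'].tolist()
print(xs, ys)
```

Only the pixels that are set survive the filter, so the number of points sent to the trace is bounded by the canvas size rather than by the size of df.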
If we change size to 5000, the plot works fine and you can even see the points being recomputed. If we keep 20000 points and switch to Scatter (no WebGL), the plot works but is extremely slow. This was the smallest reproducible example I was able to create. I know that 20000 points work fine with plain WebGL and no datashader, but the amount of data I’m trying to plot is much bigger than this.
Here are GIFs showing what happens (I didn’t include the non-WebGL version because it was too slow to record as a GIF):
20k points with ScatterGL
5k points with ScatterGL
I also have a couple of related questions:
- Is there any way to disable the on_change callback, or to add a delay between the update calls? In my implementation, if the first update has not finished and I zoom again, the plot breaks.
- Is there any callback that lets me change what the Reset Axes or Autoscale buttons do?
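To clarify the first question, the kind of delay I have in mind is roughly a debounce wrapper like the sketch below. `debounce`, `wait` and `on_zoom` are names I made up for illustration and are not part of plotly. I have not checked whether updating a FigureWidget from a timer thread is safe, so this only shows the idea with a plain function.

```python
import threading
import time


def debounce(wait):
    """Delay calls to the wrapped function until `wait` seconds pass without a new call."""
    def decorator(fn):
        timer = None
        lock = threading.Lock()

        def debounced(*args, **kwargs):
            nonlocal timer
            with lock:
                # A newer call replaces the pending one.
                if timer is not None:
                    timer.cancel()
                timer = threading.Timer(wait, fn, args, kwargs)
                timer.daemon = True
                timer.start()
        return debounced
    return decorator


calls = []


@debounce(0.1)
def on_zoom(x_range):
    # Hypothetical stand-in for the expensive update callback.
    calls.append(x_range)


# Three zooms in quick succession: only the last one should run.
for x_range in [(0, 10), (2, 8), (4, 6)]:
    on_zoom(x_range)

time.sleep(0.5)  # give the pending timer time to fire
print(calls)
```

With something like this wrapped around update_layout, a burst of zoom events would trigger a single recomputation for the final range instead of one per event.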