Proper way to plot large datasets

Hi there,

I have started experimenting with Dash and I am very happy with it so far. However, I am still not sure how to do the following:

I am using Dash to create plots of large datasets (for instance, sensors sampled at 500 Hz running for a few hours). Plotting a shorter time window is not an option, since the process evolves slowly.

When loading these datasets into Dash, the rendering is unbearably slow. It seems like Dash still draws every single point in the dataset rather than rendering some subset of it, correct?
It would be very nice to have a rapidly rendered overview of the data over the whole duration of the dataset, and only get into details while zooming into the graph. In short: have a fixed number of points that are rendered, and the selection of the points made depending on the rangeslider. The simplest approach that comes to mind is discarding the point at index i+1 if the point at index i is close enough to it (yes, this is quite a high-level description).
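A minimal sketch of that point-dropping idea, assuming uniformly sampled data so that only the y-values need to be compared. `drop_close_points` and its `tol` parameter are hypothetical names, not part of Dash or Plotly. It compares each sample against the last *kept* point rather than the immediately preceding one; otherwise a slow drift in which every single step is smaller than `tol` would be discarded entirely, which matters for a slowly evolving process.

```python
import numpy as np

def drop_close_points(y, tol):
    """Return indices of samples to keep (hypothetical helper).

    A sample is kept only if it differs from the last *kept* sample by more
    than `tol`. The first and last samples are always kept so the plotted
    line still spans the whole time range.
    """
    y = np.asarray(y, dtype=float)
    if y.size == 0:
        return np.array([], dtype=int)
    keep = [0]
    last = y[0]
    for i in range(1, y.size - 1):
        if abs(y[i] - last) > tol:
            keep.append(i)
            last = y[i]
    if y.size > 1:
        keep.append(y.size - 1)
    return np.array(keep, dtype=int)

# Usage: plot x[idx], y[idx] instead of the full arrays.
# idx = drop_close_points(y, tol=0.01)
```

Note that the number of kept points depends on the signal, so this alone does not give the fixed point budget described above, and the pure-Python loop gets slow for millions of samples.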

I have been looking for such a solution, but it doesn't seem like Dash can do this by itself, correct? (I am basing this assumption on the thread "Dash Table Experiments For Large Dataset".) Is this planned for the near future? Filtering the data manually is quite a hassle, since it has to be redone every time the user chooses a different range, and I suspect many users will ask for this in these days of big data/ML/wearables…
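One way to re-filter on every zoom, sketched under assumptions: a Dash callback takes the graph's `relayoutData` as an `Input`, returns a new `figure`, and calls helpers like the hypothetical `visible_range` and `decimate_window` below. The keys handled (`xaxis.range[0]`, `xaxis.range[1]`, `xaxis.range`, `xaxis.autorange`) are the ones Plotly usually emits for a single numeric x-axis; with shared-x subplots the event may name `xaxis2`, `xaxis3`, etc. instead, which this sketch does not handle.

```python
import numpy as np

def visible_range(relayout, default):
    """Extract the x-range from a Plotly relayout dict (hypothetical helper).

    Falls back to `default` (the full data range) on autorange/reset, or when
    the event carries no x-range at all (e.g. a pure y-axis zoom).
    """
    if not relayout or relayout.get("xaxis.autorange"):
        return default
    if "xaxis.range[0]" in relayout and "xaxis.range[1]" in relayout:
        return relayout["xaxis.range[0]"], relayout["xaxis.range[1]"]
    if "xaxis.range" in relayout:
        lo, hi = relayout["xaxis.range"]
        return lo, hi
    return default

def decimate_window(x, y, x0, x1, n_max=2000):
    """Return at most n_max samples of (x, y) with x0 <= x <= x1.

    Assumes x is numeric and sorted ascending. Uses a plain stride, so the
    point count stays bounded whatever the zoom level.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    lo = np.searchsorted(x, x0, side="left")
    hi = np.searchsorted(x, x1, side="right")
    step = max(1, int(np.ceil((hi - lo) / n_max)))
    return x[lo:hi:step], y[lo:hi:step]
```

Decimating by a plain stride keeps the point count fixed but can skip narrow spikes. When the callback returns the new figure, setting a constant `layout.uirevision` should keep Plotly from resetting the user's zoom on each update.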

Please let me know if I am missing any feature that already exists or a straightforward way to tackle this problem. Thanks for the good work!


Did you try Scattergl? It’s specifically made for large datasets.
Here are examples showing 100k and 1M points!

Full reference of this plot type is here:


Thank you for your answer @eliasdabbas
I did try Scattergl, but I am not fully satisfied with it, for the following reason: with Scattergl, some traces are rendered only partially (segments go missing) at certain zoom levels. This does not happen with exactly the same code and data when I use Scatter instead of Scattergl.
Example:

  • With Scatter: [screenshot]
  • With Scattergl: [screenshot]

Another example (see the red trace): [screenshot]

I guess that is the price to pay for using WebGL, but I was wondering whether there is a better solution around…

Strange… Not sure what’s causing that.
I’m actually using Scattergl on a chart with more than 15k points, and you can keep zooming in and still see all the data. It’s online and working fine.
See the bottom chart here: https://www.dashboardom.com/boxofficemojo and keep zooming. (I set the opacity at a low level).

What’s the code that is causing the data to disappear?
My hunch is that the price we pay for using Scattergl is a slight loss in resolution, but we shouldn’t lose data points. I’m not sure about that, though.


I have no idea either why this happens, and it only occurs at certain zoom levels.
My code is very basic, something like:

```python
import plotly.graph_objects as go
from plotly.subplots import make_subplots

def build_figure(df1, df2, df3, title):
    # Three stacked subplots that share the same x-axis
    fig = make_subplots(rows=3, cols=1, shared_xaxes=True)
    traces = [(df1, 'df1'), (df2, 'df2'), (df3, 'df3')]
    for row, (df, name) in enumerate(traces, start=1):
        # One WebGL line trace per subplot, indexed by the DataFrame index
        fig.add_trace(go.Scattergl(
            x=df.index,
            y=df['column'].values,
            mode='lines',
            name=name,
        ), row=row, col=1)
    # `title` is the value that was passed in as `children` in the callback
    fig.update_layout(height=1200, title=title)
    return fig
```

I am not handling resizing or anything like that myself.
The only differences from your example are that I am using lines (not markers) and that I have many more points (on the order of 10^6).

The code seems straightforward.

Sorry I’m not sure what is causing the loss! Would be great to know though.

Hope you can solve it. Good luck :)


Hm, this looks like a bug. Can you share a URL of the graph? See 📣 Sharing Graphs with Chart Studio for instructions. We’ll need the full graph in order to investigate and debug.


Thanks for jumping in @chriddyp
I have spent quite some time investigating the issue.
The main conclusion is that this is linked to the browser I am using. I am observing this issue on Chrome (MacOS Sierra 10.12.6 - Chrome Version 66.0.3359.139 (Official Build) (64-bit)) but I haven’t been able to reproduce it with Safari.
I also observed that it appears only at certain window widths, and that clicking ‘Reset axes’ does not fix the problem. See the GIF below (the resolution is not great, but good enough to spot the problem; watch the green trace):
[animated GIF]

Is this a known problem? I can’t reproduce the issue if I use Scatter instead of Scattergl. Interestingly, displaying the plot through Plotly Chart Studio (even in Chrome) does not trigger the issue either.


I’ve experienced something similar to this with heatmapgl. I described the rendering problem here. I still haven’t been able to resolve it. I believe it’s a WebGL bug or fault with certain browsers. @lamourj Have you been able to fix this rendering issue?


Hi @lamourj,

When dealing with large sequential data, this repo is a suitable solution:

It performs back-end data aggregation, ensuring front-end snappiness when scaling to large datasets!
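To illustrate the kind of aggregation meant here, a minimal min/max-per-bucket sketch: split the samples into equal-size buckets and keep only each bucket's minimum and maximum, so the output size is bounded by `n_out` while spikes survive. This is a simplified stand-in written for illustration, not plotly-resampler's own implementation, which offers more refined downsamplers (LTTB-based ones, for example). `minmax_downsample` is a hypothetical name.

```python
import numpy as np

def minmax_downsample(x, y, n_out):
    """Reduce (x, y) to at most n_out points (hypothetical helper).

    The index range is split into n_out // 2 equal-size buckets; for each
    bucket the samples holding its min and max y-value are kept. Unlike a
    plain stride, this never drops a spike.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    if y.size <= n_out:
        return x, y
    n_buckets = max(1, n_out // 2)
    edges = np.linspace(0, y.size, n_buckets + 1).astype(int)
    idx = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = y[lo:hi]
        idx.append(lo + int(np.argmin(seg)))
        idx.append(lo + int(np.argmax(seg)))
    # np.unique sorts (restoring time order) and drops duplicates, e.g. when
    # a flat bucket's min and max are the same sample.
    idx = np.unique(idx)
    return x[idx], y[idx]
```

Called with the visible slice of the data on every zoom, this yields a fixed point budget regardless of how many raw samples lie in view.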

To quote your text:

It would be very nice to have a rapidly rendered overview of the data over the whole duration of the dataset, and only get into details while zooming into the graph. In short: have a fixed number of points that are rendered, and the selection of the points made depending on the rangeslider.

Plotly-resampler does exactly this. You can (dynamically) configure the number of samples shown in the front-end.

Our examples and documentation should help you wrap your existing Plotly-based graphs with this functionality.

Cheers,
Jonas


I have 3-dimensional binary label data and am trying to make a volume plot with Plotly. However, the plot renders empty, apparently because there are too many voxels. Does anyone have a good solution for this? Thank you
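One possible workaround, assuming the empty render really is caused by the grid size: coarsen the binary label volume before plotting by taking the maximum over blocks of factor x factor x factor voxels, so a coarse voxel is set whenever any fine voxel inside it is set. `block_reduce_max` is a hypothetical helper; scikit-image's `skimage.measure.block_reduce` does something similar if it is available.

```python
import numpy as np

def block_reduce_max(vol, factor):
    """Downsample a 3-D array by the max over factor^3 blocks (hypothetical helper).

    For binary labels, a coarse voxel is 1 if any fine voxel in its block is 1.
    Trailing voxels that do not fill a whole block are cropped.
    """
    vol = np.asarray(vol)
    f = int(factor)
    nx, ny, nz = (s // f for s in vol.shape)
    v = vol[: nx * f, : ny * f, : nz * f]
    # Split each axis into (coarse index, offset within block), then reduce the offsets.
    return v.reshape(nx, f, ny, f, nz, f).max(axis=(1, 3, 5))
```

The reduced grid can then be passed to `go.Volume` with flattened coordinates, e.g. `X, Y, Z = np.mgrid[:nx, :ny, :nz]` and `go.Volume(x=X.flatten(), y=Y.flatten(), z=Z.flatten(), value=reduced.flatten(), isomin=0.5, isomax=1)`.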