Browser Crashing When Displaying Large CSV Interactive Line Plot

Hi All,
I'm trying to create a tool for displaying large, interactive line charts for some massive data log files. I really like that Plotly supports zooming into segments of the data in its interactive plots, which really helps for rapid data analysis. Sharing these easily through HTML with teammates is important to me, so I've been using Chrome, Jupyter notebooks, and cufflinks. However, the browser seems to crash when using iplot with large CSV files, e.g. 50-150 MB, 150,000-370,000+ rows. Sometimes I see "Aw snap!" or "Out of memory". It crashes in Firefox as well. After searching around, I realize there may be a limit on the file/data size, and there may be better big-data methods for visualizing this. What would be the best way to do this?

For reference, I did see this: https://plot.ly/python/big-data-analytics-with-pandas-and-sqlite/ but I didn't think I needed it, since the dataframe usually loads with no problem, and the chart even plots most of the time (albeit slowly). However, after doing nothing for a few seconds, or when zooming, it crashes. Task Manager's memory usage doesn't appear to exceed my system's RAM either...

EDIT: the code I use to plot with cufflinks is:

cf.subplots(graph_list,shape=((len(category_dict)),1),shared_xaxes=True).iplot()

Any guidance is much appreciated.

Thanks!

Looks like you're using the Python API.

Probably best to ask the folks in #api:python for help.

Hi etienne,
Thanks for the feedback. I've moved the thread to #api:python.

Hi All in the Python API forum. Can you please help me with my request in the first post?
Your help is greatly appreciated.
Thanks!

Hi @RBD10100,

I expect you'll have better luck using the FigureWidget class in version 3. When you set FigureWidget properties to numpy arrays in version 3, and then let the FigureWidget display itself, the arrays are passed to the front-end as binary buffers. This is much more efficient than the iplot approach, which essentially serializes the arrays into a string representation of a standard Python list.

Here's an example of a 1.2 million point scatter plot: https://github.com/jonmmease/plotly_ipywidget_notebooks/blob/master/notebooks/nyc_taxi_selection.ipynb

Unfortunately, cufflinks hasn't been updated to version 3 yet, so for now you would need to create the plot using standard plotly commands.

Hope that helps!

-Jon

Hi Jon,
Thank you very much for the feedback and suggestion. Since I'm mostly familiar with cufflinks, I'll have to spend some time trying the FigureWidget with the standard commands.

Do you by chance know where I can find out when cufflinks will be updated to v3?

Thanks!

Hi Jon @jmmease

I was able to generate my graphs with the standard commands, and I'm happy to say that so far the new FigureWidget handles all this data much better. I am able to plot and interact with multiple traces (24) of 370,000 data points. I got a couple of crashes here and there, but it's definitely leaps and bounds more usable than the previous version, which choked almost immediately when doing anything.

The one thing I am trying to do now is generate subplots with a shared x-axis for a couple of my plots. When I run the following as per plotly's site, with fig0, fig1, and fig2 being FigureWidgets with scattergl items:

fig = tools.make_subplots(rows=3, cols=1, shared_xaxes=True)
fig.append_trace(fig0, 1, 1)
fig.append_trace(fig1, 2, 1)
fig.append_trace(fig2, 3, 1)

I end up with an error that I think may be related to the new FigureWidget, but I'm not quite sure. Could you please take a look below and let me know?

Thanks,
Brett

ValueError: 
    Invalid element(s) received for the 'data' property of 
        Invalid elements include: [FigureWidget({
    'data': [{'name': 'Data_Core_0_0_C0Residency',
              'type': 'scattergl',
              'uid': '680c00b4-95c4-11e8-8f75-1c1b0de41d65',
              'y': array([0.01152432, 0.        , 0.        , ..., 0.        , 0.        ,
                          0.        ])},
             {'name': 'Data_Core_0_1_C0Residency',
              'type': 'scattergl',
              'uid': '683782c2-95c4-11e8-b14c-1c1b0de41d65',
              'y': array([0., 0., 0., ..., 0., 0., 0.])}],
    'layout': {}
})]

    The 'data' property is a tuple of trace instances
    that may be specified as:
      - A list or tuple of trace instances
        (e.g. [Scatter(...), Bar(...)])
      - A list or tuple of dicts of string/value properties where:
        - The 'type' property specifies the trace type
            One of: ['area', 'bar', 'box', 'candlestick', 'carpet',
                     'choropleth', 'cone', 'contour',
                     'contourcarpet', 'heatmap', 'heatmapgl',
                     'histogram', 'histogram2d',
                     'histogram2dcontour', 'mesh3d', 'ohlc',
                     'parcoords', 'pie', 'pointcloud', 'sankey',
                     'scatter', 'scatter3d', 'scattercarpet',
                     'scattergeo', 'scattergl', 'scattermapbox',
                     'scatterpolar', 'scatterpolargl',
                     'scatterternary', 'splom', 'streamtube',
                     'surface', 'table', 'violin']

        - All remaining properties are passed to the constructor of
          the specified trace type

        (e.g. [{'type': 'scatter', ...}, {'type': 'bar, ...}])

Hi @RBD10100,

The problem is that fig.append_trace expects a trace object (e.g. go.Scatter(), go.Bar(), etc.), not a figure. You can get a tuple of the traces in a figure using fig.data.

Hope that helps,

-Jon

Hi @jmmease,

Thanks for the guidance. I got the subplots to work now.

While I was doing all of this, I realized I was using the plotly online mode, and I eventually kept getting a "PlotlyRequestError: No message" error. This only started happening when I added more data to plot (I went from 6 lines of 370,000 data points each to 24 lines). The 6 lines of data showed up with no issue using this method. However, once I went into offline mode (which I need, since I can't put our data on the server), I was back to my original issue of the browser crashing. It readily happens right after I start zooming into my interactive plots, which is what I started this thread to solve in the first place (but I had to take a detour to rework all my code for the new plotly version). Task Manager showed the earlier crashes occurring when I hit 4 GB of RAM used, but in my latest experiment with 8 lines, it crashed with only 1.5 GB used...

My question is now: Is offline mode more susceptible to browser/memory crashes? Again, it's a 150 MB CSV file I'm loading in here, but I don't display all the data. It seems to crash quickest when I zoom into my plots, and I've never seen the RAM usage survive past 4 GB before it crashed (my system has 16 GB).

Any advice would be great. I feel like maybe I'm pushing plotly and Jupyter to their limits here, so a suggestion for another interactive plotting solution would also be very welcome.

Thanks,
Brett

Hi @RBD10100,

Could you describe exactly what you mean by going into offline mode? You may already be doing this, but just to clarify: to take advantage of the memory improvements in version 3, you need to construct the figure using the graph_objs.FigureWidget class, and you need to let it display itself. You should not use the offline.iplot function, because it displays the figure using the legacy rendering path.

So, if you're using tools.make_subplots as your starting point, make sure you pass the figure it returns into a FigureWidget at some point (probably best to do it after you've added all of the traces you want). Something like:

fig = tools.make_subplots(rows=3, cols=1, shared_xaxes=True)
fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 2, 1)
fig.append_trace(trace3, 3, 1)
...
fig_widget = go.FigureWidget(fig)
fig_widget # Not offline.iplot(fig)

That said, 24 lines with 370,000 points each is 8.8 million points, which is definitely pushing plotly and Jupyter pretty far! BTW, are you using the classic notebook or JupyterLab? I'm not sure that it would matter, but it might be worth trying the other if you haven't.
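For a sense of scale, the arithmetic can be checked with a few lines. The 8 bytes per value is an assumption (float64 storage), and this counts only the raw y arrays, not whatever the browser allocates on top of them.

```python
n_traces = 24
points_per_trace = 370_000
total_points = n_traces * points_per_trace  # 8,880,000 points

bytes_per_value = 8  # assumption: each value is stored as a float64
y_bytes = total_points * bytes_per_value
y_megabytes = y_bytes / 1e6

print(f"{total_points:,} points, about {y_megabytes:.0f} MB of raw y values")
# prints "8,880,000 points, about 71 MB of raw y values"
```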

-Jon

Hi @jmmease,

Thank you very much! It appears that I was indeed using the legacy rendering path in my subplot code. That happened when I was trying to get around the "No message" error and went back to the "Offline plotting with plotly" documentation, which I guess now needs to be updated for plotly 3 and the FigureWidget.

I always wondered whether I was over-stressing plotly and Jupyter... I've been using the classic notebook up to this point (I liked the browser and HTML functions for sharing), and I hadn't heard of JupyterLab until now. Thanks for the suggestion! I'll give it a shot and see how it works.

Cheers!
Brett

Thank you! This widget really solved my similar problem. In my case, I used heatmap and fig.show() to display four figures, each with 1000*1200 pixels. Showing two figures or fewer ran fine. But with four figures, it took a minute to display them. After two minutes or so, memory usage suddenly jumped from 50% to almost 80%, and the browser crashed. I still don't know why this happens, and I'm really curious about it. Finally, thanks again for this advice :smile:

Now I replace the fig.show() call
with
f=go.FigureWidget(fig)
f
All works ok!