Why does data in a Scatter trace get converted from list to tuple?

sytelus · February 25, 2019, 7:09am

Consider this:

>>> import plotly.graph_objs as go
>>> t = go.Scatter(x=[0,1,2,3], y=[0,3,3,3], mode='markers')
>>> t.x
(0, 1, 2, 3)

This is very problematic if you want to build live graphs, where data arrives over time and you want to continuously append new data to the existing data in the graph.

The above design causes O(n^2) performance for live graphs, because for each new data point Plotly needs to rebuild the entire list. Is there a reason it is done this way? Is there any way to avoid this so that trace.x remains an appendable list?
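To make the O(n^2) claim concrete, here is a pure-Python sketch (no plotly calls; the function names are made up for illustration) that counts how many elements get copied when each new point forces the whole tuple to be rebuilt, versus appending to a list in place.

```python
# Illustration only: compare streaming points into an immutable tuple
# (rebuilt on every update) against appending to a mutable list.

def stream_into_tuple(n):
    """Add n points by rebuilding a tuple each time; count element copies."""
    data = ()
    copies = 0
    for i in range(n):
        copies += len(data) + 1  # each rebuild copies all old items plus the new one
        data = data + (i,)
    return data, copies


def stream_into_list(n):
    """Add n points to a list in place; each append writes one new item."""
    data = []
    writes = 0
    for i in range(n):
        data.append(i)  # amortized O(1)
        writes += 1
    return data, writes


if __name__ == "__main__":
    for n in (100, 1000):
        _, tuple_copies = stream_into_tuple(n)
        _, list_writes = stream_into_list(n)
        print(n, tuple_copies, list_writes)
```

The tuple version copies n(n+1)/2 elements in total, while the list version writes only n.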

I also want to point out that Matplotlib doesn’t have this performance issue.

jmmease · February 25, 2019, 11:34am

Hi @sytelus,

The reason this was changed in version 3 was to support the improved validation logic and the FigureWidget class. In both cases, plotly has to be aware of the state of each property, and this can’t be done if the properties are presented back to the user as mutable objects.
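A rough sketch of why immutability matters here. TrackedTrace is a hypothetical class, not plotly's implementation: it validates on assignment and hands back a tuple, so every change has to go through the setter, where it can be checked and recorded for syncing to the front end. An in-place list.append would bypass both steps.

```python
# Hypothetical illustration (not plotly code): validation and change
# tracking only work if every modification goes through the setter.

class TrackedTrace:
    def __init__(self, x=()):
        self._x = ()
        self.changes = []  # hypothetical log of changes to push to the front end
        self.x = x         # route the initial value through the setter

    @property
    def x(self):
        return self._x     # a tuple, so t.x.append(...) raises AttributeError

    @x.setter
    def x(self, value):
        value = tuple(value)
        if not all(isinstance(v, (int, float)) for v in value):
            raise ValueError("x must contain only numbers")
        self._x = value
        self.changes.append(("x", value))


t = TrackedTrace([0, 1, 2, 3])
print(t.x)          # (0, 1, 2, 3)
t.x = t.x + (4,)    # validated and recorded, but the tuple is rebuilt
```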

When performance is a concern, you should fall back on constructing figures from standard dict and list instances. For example:

>>> t = dict(type='scatter', x=[0,1,2,3], y=[0,3,3,3], mode='markers')
>>> t['x']
[0, 1, 2, 3]

If you don’t want to pay the validation cost even once at the end, you can pass the validate=False argument to the plot/iplot methods.
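A minimal sketch of the dict-and-list approach for a growing trace. The helpers make_figure and append_point are hypothetical names for illustration. Rendering is left out; plotly.offline.plot(fig, validate=False) would be the step that skips validation, but it is only mentioned in a comment here.

```python
# Sketch: keep the figure as plain dicts and lists so new points can be
# appended in place (amortized O(1)), with no tuple rebuilding.
# Rendering would be a separate step, e.g. plotly.offline.plot(fig, validate=False).

def make_figure():
    trace = dict(type="scatter", x=[0, 1, 2, 3], y=[0, 3, 3, 3], mode="markers")
    return dict(data=[trace], layout=dict())


def append_point(fig, x, y, trace_index=0):
    trace = fig["data"][trace_index]
    trace["x"].append(x)  # plain list: the existing data is not copied
    trace["y"].append(y)


fig = make_figure()
append_point(fig, 4, 5)
print(fig["data"][0]["x"])  # [0, 1, 2, 3, 4]
```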

Hope that helps clear things up a bit.
-Jon

sytelus · February 26, 2019, 2:53am

Thanks for your response. I’m using FigureWidget with imperative code to create live graphs in offline mode. One big issue is that whatever trace I pass to FigureWidget, it recreates the object, so FigureWidget.data[0] is no longer the same as the trace instance I originally passed in. To modify the data I must therefore use FigureWidget.data[0], which has again converted the entire list to a tuple. So each time I add a new data point to the graph, FigureWidget.data[0].x and FigureWidget.data[0].y must be recreated, which is a huge issue because of the O(n^2) performance.
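The identity problem described above can be reproduced with a small stand-in. CopyingFigure is hypothetical and only mimics the behavior being described: the container copies each trace it receives, so later edits to the original trace object are not seen by the figure.

```python
# Hypothetical illustration of a container that copies its input traces,
# so the caller's original object and the stored one are no longer linked.
import copy


class CopyingFigure:
    def __init__(self, data):
        self.data = tuple(copy.deepcopy(t) for t in data)


trace = {"x": [0, 1, 2, 3], "y": [0, 3, 3, 3]}
fig = CopyingFigure([trace])
print(fig.data[0] is trace)  # False: the figure holds its own copy
trace["x"].append(4)
print(fig.data[0]["x"])      # [0, 1, 2, 3]: the edit to the original is not seen
```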

jmmease · February 26, 2019, 11:19am

Hi @sytelus,

Yeah, this is a current limitation of streaming data with a FigureWidget. Out of curiosity, for your use case, how many points can be displayed before performance degrades significantly? Also, have you tried using a go.Scattergl trace? This won’t change the O(n^2) nature of the trace data construction, but it will improve the render time of the figure itself.

Toward better support for streaming, I think the way forward would be to add an operation to the Python API that wraps the Plotly.js Plotly.extendTraces function:

fig.data[0].extend(x=[next_x], y=[next_y])

Under the hood, the x and y values are stored as lists (unless they are numpy arrays), so this could be implemented as an efficient list.extend operation on the Python side.
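Here is a rough sketch of how such an extend operation could work. ExtendableTrace and its pending queue are hypothetical, not an existing plotly API. The values live in internal lists, extend() appends in place, and only the new elements are recorded, which is what an extendTraces-style message to the front end would need.

```python
# Hypothetical sketch of the proposed extend() operation (not plotly code).

class ExtendableTrace:
    def __init__(self, x=(), y=()):
        self._x = list(x)
        self._y = list(y)
        self.pending = []  # hypothetical queue of deltas for the front end

    @property
    def x(self):
        return tuple(self._x)  # read-only view; building it is O(n) per access

    @property
    def y(self):
        return tuple(self._y)

    def extend(self, x, y):
        if len(x) != len(y):
            raise ValueError("x and y must have the same length")
        self._x.extend(x)  # O(k) for k new points, no full rebuild
        self._y.extend(y)
        self.pending.append({"x": list(x), "y": list(y)})


trace = ExtendableTrace([0, 1, 2, 3], [0, 3, 3, 3])
for i in range(4, 7):
    trace.extend(x=[i], y=[i * i])
print(trace.x)           # (0, 1, 2, 3, 4, 5, 6)
print(trace.pending[-1]) # {'x': [6], 'y': [36]}
```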

Would this kind of API meet your needs?

-Jon

sytelus · February 26, 2019, 4:06pm

It might also be a good idea to rethink this architecture. I think it would confuse many users if you give an instance of class A to class B, but class B makes a deep copy instead of simply storing the reference. It is obviously not great for performance either. Maybe a more Pythonic approach is to have everything duck-typed and not make deep copies. You could then simply add a redraw() or update() method that accepts flags indicating whether the traces or the layout should be redrawn:

fig.redraw(update_data:bool, update_layout:bool)

Internally, fig shouldn’t care whether trace data is a tuple or a list, as long as it is array-like. Callers can then update their array-like objects however they prefer and occasionally call redraw on fig.
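A minimal sketch of the suggested design, assuming a hypothetical ReferenceFigure class (not a plotly API). The figure keeps references to caller-owned objects and reads them only when redraw() is called; the sent list stands in for whatever would actually be pushed to the renderer.

```python
# Hypothetical sketch of the duck-typed, reference-keeping design.
import copy


class ReferenceFigure:
    def __init__(self, data, layout):
        self.data = data      # stored by reference, no copy
        self.layout = layout
        self.sent = []        # stand-in for messages sent to the renderer

    def redraw(self, update_data, update_layout):
        message = {}
        if update_data:
            message["data"] = copy.deepcopy(self.data)  # snapshot of the full arrays
        if update_layout:
            message["layout"] = copy.deepcopy(self.layout)
        self.sent.append(message)


trace = {"type": "scatter", "x": [0, 1, 2, 3], "y": [0, 3, 3, 3]}
fig = ReferenceFigure([trace], {"title": "live"})
trace["x"].append(4)  # the caller mutates its own list in place
trace["y"].append(5)
fig.redraw(update_data=True, update_layout=False)
print(fig.sent[-1]["data"][0]["x"])  # [0, 1, 2, 3, 4]
```

Note that each redraw snapshots the whole arrays, and that nothing validates the mutated values.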

jmmease · February 27, 2019, 11:14am

Thanks for taking the time to write up the suggestion. Here are some thoughts off the top of my head:

One tradeoff with this approach is validation: it wouldn’t be possible to give the user property validation feedback when a property is mutated to contain invalid values. There would also be an inconsistency (or what I imagine would seem like an inconsistency to some users) in that the figure would auto-update on assignment but not on mutation. Finally, this redraw approach wouldn’t have as much potential to improve efficiency on the JavaScript side when appending data to existing arrays, because the entire array would need to be serialized on each redraw call, rather than serializing only the new elements in an extend call.
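To put rough numbers on the serialization point, this sketch uses only the standard json module (the function names are made up for illustration). It totals the JSON bytes produced when the whole x array is re-serialized after every new point, versus serializing only the new point each time.

```python
# Illustration only: cumulative JSON payload for a full redraw per point
# versus an extendTraces-style update carrying just the new element.
import json


def full_redraw_bytes(n_points):
    """Total JSON bytes if the full x array is re-serialized after each new point."""
    x = []
    total = 0
    for i in range(n_points):
        x.append(i)
        total += len(json.dumps({"x": x}))
    return total


def extend_bytes(n_points):
    """Total JSON bytes if only the new point is serialized each time."""
    total = 0
    for i in range(n_points):
        total += len(json.dumps({"x": [i]}))
    return total


for n in (100, 1000):
    print(n, full_redraw_bytes(n), extend_bytes(n))
```

The full-redraw total grows roughly quadratically with the number of points, while the extend total grows roughly linearly.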

-Jon
