Why does data in a Scatter trace get converted from list to tuple?

Consider this:

>>> import plotly.graph_objs as go
>>> t = go.Scatter(x=[0, 1, 2, 3], y=[0, 3, 3, 3], mode='markers')
>>> t.x
(0, 1, 2, 3)

This is very problematic if you want to build live graphs, where data arrives over time and you continuously append new points to the data already in the graph.

This design causes an O(n^2) performance issue for live graphs, because for each new data point the entire sequence has to be rebuilt. Is there a reason it is done this way? Is there any way to avoid it so that trace.x remains an appendable list?
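To make the cost concrete, here is a minimal sketch in plain Python (no Plotly involved) that contrasts rebuilding a tuple for every new point with appending to a list in place. The function names are made up for illustration, and the timings will vary by machine; the point is only that the tuple version copies all existing items on every step.

```python
import timeit


def append_via_tuple(n):
    """Rebuild an immutable tuple for every new point (O(n^2) total copying)."""
    xs = ()
    for i in range(n):
        xs = xs + (i,)  # allocates a new tuple and copies every existing item
    return xs


def append_via_list(n):
    """Append in place to a list (amortized O(1) per point)."""
    xs = []
    for i in range(n):
        xs.append(i)
    return xs


if __name__ == "__main__":
    for n in (1_000, 4_000):
        t_tuple = timeit.timeit(lambda: append_via_tuple(n), number=3)
        t_list = timeit.timeit(lambda: append_via_list(n), number=3)
        print(f"n={n}: tuple rebuild {t_tuple:.4f}s, list append {t_list:.4f}s")
```

Quadrupling n should roughly quadruple the list time, while the tuple time should grow much faster.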

I also want to point out that Matplotlib doesn’t have this performance issue.

Hi @sytelus,

The reason this was changed in version 3 was to support the improved validation logic and the FigureWidget class. In both cases, plotly has to be aware of the state of each property, and this can’t be done if the properties are presented back to the user as mutable objects.

When performance is a concern, you should fall back on constructing figures from standard dict and list instances. For example:

>>> t = dict(type='scatter', x=[0, 1, 2, 3], y=[0, 3, 3, 3], mode='markers')
>>> t['x']
[0, 1, 2, 3]

If you don’t want to pay the validation cost even once at the end, you can pass validate=False to the plot/iplot functions.
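As a rough sketch of that dict-and-list approach: the figure below is described with plain Python containers, so appending a point is an ordinary in-place list operation. The add_point helper and the sample values are made up for illustration, and the rendering call is left commented out because it assumes plotly is installed.

```python
# Plain dict/list figure description: nothing is validated or copied here.
trace = dict(type='scatter', x=[0, 1, 2, 3], y=[0, 3, 3, 3], mode='markers')
fig = dict(data=[trace])


def add_point(trace, x, y):
    """Illustrative helper: append one point in place to a dict trace."""
    trace['x'].append(x)
    trace['y'].append(y)


# Made-up incoming samples standing in for live data.
for i in range(4, 8):
    add_point(trace, i, 3)

print(trace['x'])

# Rendering step (assumes plotly is installed; not executed here):
# from plotly.offline import plot
# plot(fig, validate=False)
```

Because fig holds a reference to the same trace dict, the appended points are visible through fig['data'][0] without any copying.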

Hope that helps clear things up a bit.
-Jon

Thanks for your response. I’m using FigureWidget with imperative code to create live graphs in offline mode. One big issue is that whatever trace I pass to FigureWidget, it re-creates the object, so FigureWidget.data[0] is no longer the same instance as the trace I originally passed in. To modify the data I must go through FigureWidget.data[0], which again has converted the entire list to a tuple. So each time I add a new data point to the graph, FigureWidget.data[0].x and FigureWidget.data[0].y must be re-created, which is a huge issue because of the O(n^2) cost.

Hi @sytelus,

Yeah, this is a current limitation of streaming data with a FigureWidget. Out of curiosity, for your use case how many points are displayed before the performance degrades significantly? Also, have you tried using a go.Scattergl trace? This won’t change the n^2 nature of the trace data construction, but it will improve the render time of the figure itself.

To provide better support for streaming, I think the way forward would be to add an operation to the Python API that wraps the Plotly.js Plotly.extendTraces function. Something like:

fig.data[0].extend(x=[next_x], y=[next_y])

Underneath, the x and y values are stored as lists (unless they are numpy arrays), so this could involve an efficient list.extend operation on the Python side.
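The proposed extend operation is not an existing API. Purely as an illustration of the Python-side idea, here is a minimal sketch of a hypothetical extend_trace helper that works on a plain dict trace and uses list.extend in place. The helper name and its behavior are assumptions, not part of plotly.

```python
def extend_trace(trace, **new_values):
    """Hypothetical helper: extend list-valued properties of a dict trace in place."""
    for prop, values in new_values.items():
        existing = trace.setdefault(prop, [])
        if not isinstance(existing, list):
            # Tuples and other immutable sequences cannot be extended in place.
            raise TypeError(f"{prop!r} is {type(existing).__name__}, expected list")
        existing.extend(values)  # amortized O(k) for k new values
    return trace


trace = dict(type='scatter', x=[0, 1], y=[0, 3], mode='markers')
extend_trace(trace, x=[2, 3], y=[3, 3])
print(trace['x'], trace['y'])
```

Only the new values are touched on each call, which is what avoids the quadratic rebuild.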

Would this kind of API meet your needs?

-Jon

It might also be a good idea to rethink this architecture. I think it would confuse many users that when you give an instance of class A to class B, class B makes a deep copy instead of simply saving the reference. It is obviously not great for performance either. Maybe a more Pythonic approach is to have everything duck-typed and not make deep copies. You could then simply add a redraw() or update() method that accepts flags indicating whether the trace data or the layout should be redrawn:

fig.redraw(update_data:bool, update_layout:bool)

Internally, fig shouldn’t care whether trace data is a tuple or a list, as long as it is array-like. The caller can then update their array-like objects however they prefer and occasionally call redraw on fig.

Thanks for taking the time to write up the suggestion. Here are some thoughts off the top of my head:

One tradeoff with this approach is validation: it wouldn’t be possible to give the user property validation feedback when a property is mutated to contain invalid values. There would also be an inconsistency (or what I imagine would seem like an inconsistency to some users) in that the figure would auto-update on assignment but not on mutation. Finally, this redraw approach wouldn’t have as much potential to improve efficiency on the JavaScript side when appending data to existing arrays, because the entire array would need to be serialized on each redraw call, rather than serializing only the new elements as in an extend call.
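The serialization point can be illustrated with a small sketch. It uses json.dumps as a stand-in for whatever is actually sent to the JavaScript side, which is an assumption; the real message format is not described here. It compares the payload for a full redraw with the payload for an extend that carries only the new elements.

```python
import json

# Existing trace data plus two newly arrived points (made-up values).
x = list(range(10_000))
new_points = [10_000, 10_001]
x.extend(new_points)

# Redraw-style update: the whole array is serialized again.
full_payload = json.dumps({"x": x})

# Extend-style update: only the new elements are serialized.
delta_payload = json.dumps({"x": new_points})

print(len(full_payload), len(delta_payload))
```

The full payload grows with the total number of points on every update, while the delta payload depends only on how many points arrived.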

-Jon