Adding go.Marker to a pandas data frame

I have a Dash application that displays a scatter plot built from a pandas data frame. I’m trying to set things up so that I can simply feed different data frame columns to go.Scatter(), which requires adding a column of markers to my data frame. Since the marker depends on a condition, I have code like this:

my_data['marker'] = np.where(my_data.hasTimestamp == True, go.Marker(color=has_color, size=5, symbol='square'), np.nan)

The problem is, this is generating an error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-82-6b19cddf03fe>", line 1, in <module>
    my_data['marker'] = np.where(my_data.hasTimestamp == True, go.Marker(color=has_color, size=5, symbol='square'), np.nan)
ValueError: invalid __array_struct__

I can add other custom objects and dictionaries (instead of the markers) to the data frame using the code above without a problem, but when I add the markers in this way I get the above error. Any ideas on how to get around this?

Thanks in advance.

In version 3.0 of the plotly library we had to change go.Marker and all other go objects to be immutable types on assignment. This allows us to do better Jupyter integration. See https://github.com/plotly/plotly.py/blob/master/migration-guide.md#property-immutability for more details.

So, I’d recommend just using dict if you want to reference the item after assignment. There isn’t any functional difference between dict and the go objects besides instant type validation.
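For anyone who still wants a per-row column, here is a minimal sketch of the dict approach. The three-row data frame and the value of has_color are made-up stand-ins for the question's data, and the column is built with a list comprehension rather than np.where so that each dict is stored as a single object:

```python
import pandas as pd

# Hypothetical stand-ins for the question's data frame and has_color variable.
my_data = pd.DataFrame({"hasTimestamp": [True, False, True]})
has_color = "green"

# A plain dict with the same keys that were passed to go.Marker.
timestamp_marker = dict(color=has_color, size=5, symbol="square")

# Build the column row by row; rows without a timestamp get None.
# dict(...) makes a copy so the rows do not share one mutable object.
my_data["marker"] = [
    dict(timestamp_marker) if flag else None
    for flag in my_data["hasTimestamp"]
]
```

Each non-empty cell is an ordinary dict, so it can be read back and modified after assignment.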

Thanks. I realized that the marker property of go.Scatter only accepts a single marker definition, not a list as I had initially thought, so there is no need for a marker per row of the data frame.
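If the points still need to look different, one option is a single marker whose color and symbol are lists with one entry per point, since those marker attributes accept arrays. A rough sketch follows; build_marker, the fallback color, and the fallback symbol are hypothetical names and values chosen for illustration, not part of the original code:

```python
def build_marker(has_timestamp, has_color="green", missing_color="lightgray"):
    """Return one marker dict with per-point color and symbol lists.

    has_timestamp is any iterable of booleans, for example the
    hasTimestamp column of the data frame. The default colors and the
    "circle" fallback symbol are assumptions made for this sketch.
    """
    flags = [bool(flag) for flag in has_timestamp]
    return dict(
        color=[has_color if flag else missing_color for flag in flags],
        symbol=["square" if flag else "circle" for flag in flags],
        size=5,  # a single size applied to every point
    )


marker = build_marker([True, False, True])
```

The result could then be passed as marker=build_marker(my_data.hasTimestamp) to go.Scatter, alongside whichever columns hold the x and y values.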