Bug using bar chart categoryorder and customdata

I have a pandas DataFrame, df, which I am using to populate a Plotly bar chart. For the sake of example, let’s define df as follows:

import pandas, numpy
import plotly.express as px

df = pandas.DataFrame.from_dict(
    {
        "x": ["John Cleese", "Eric Idle", "Michael Palin", "Eric Idle"],
        "y": [7, 10, 3, 8],
        "colour": ["0", "0", "0", "1"],
        "a": [1, 2, 3, 4],
        "b": [1, 4, 9, 16],
        "c": [1, 8, 27, 64]
    }
)

I then create a bar chart derived from these data:

fig = px.bar(df, x="x", y="y", color="colour", barmode="stack")

my_customdata = numpy.transpose(numpy.array([df["a"], df["b"], df["c"]]))

fig = fig.update_traces(
    patch={
        "customdata": my_customdata,
        "hovertemplate": "x: %{x}, y: %{y}, a: %{customdata[0]}, b: %{customdata[1]}, c: %{customdata[2]}<extra></extra>"
    },
    overwrite=True
)
fig.update_layout(
    xaxis={"categoryorder": "total ascending"}
)
fig.show()

The bug arises in the hover text for the red stacked bar. You’ll notice that the x and y data in the hover text are correct, but the data arising from the customdata are not!

Intriguingly, this error only occurs when the pandas.Series object passed to the color argument of px.bar() consists of string data (i.e. discrete colour data). If in the code above I instead set df.colour = [0, 0, 0, 1] (using integers for continuous colour data; notice the colorbar), the following graph is created:

My project requires the use of discrete colour data. Is there a workaround for this bug?

Additionally posted at https://stackoverflow.com/questions/63472524/plotly-py-bug-using-discrete-colour-data-on-stacked-bar-chart-with-customdata-in

and https://github.com/plotly/plotly.py/issues/2716

It turns out the solution was relatively simple, and was my fault rather than being an issue with the source code itself (oh, the hubris of thinking that it wasn’t my fault!)

The reason my code breaks

On running px.bar(), plotly.express creates a plotly.graph_objs.Figure object which contains two Bar objects, not one: one for each distinct value of colour. Then, when fig.update_traces() is called, the same full customdata array is applied to every Bar object which is a child of fig. Red Eric Idle is the 4th row of the original DataFrame, but fig.update_traces() doesn’t care where Red Eric Idle used to be; it only understands that he is now part of a second Bar object. In fact, Red Eric Idle is the first datapoint of this Bar object, and so Plotly merrily creates hover text using my_customdata[n][0], my_customdata[n][1], my_customdata[n][2] with n=0 (the first row), rather than with n=3 (the fourth row) as I expected.

This is why the hover text for Red Eric Idle contains "a: 1, b: 1, c: 1", rather than "a: 4, b: 16, c: 64".

The solution

The solution is very straightforward. Since the Plotly figure “forgets” where in the original DataFrame each point’s data was supposed to go after it finishes running px.bar(), we simply have to assign custom data to each datapoint before it forgets (i.e. during the px.bar() call). Simply replace

fig = px.bar(df, x="x", y="y", color="colour", barmode="stack")
my_customdata = numpy.transpose(numpy.array([df["a"], df["b"], df["c"]]))
fig = fig.update_traces(
    patch={
        "customdata": my_customdata,
        "hovertemplate": "x: %{x}, y: %{y}, a: %{customdata[0]}, b: %{customdata[1]}, c: %{customdata[2]}<extra></extra>"
    },
    overwrite=True
)

with

fig = px.bar(
    df, x="x", y="y", color="colour", barmode="stack", custom_data=["a", "b", "c"]
)
fig = fig.update_traces(
    patch={
        "hovertemplate": "x: %{x}, y: %{y}, a: %{customdata[0]}, b: %{customdata[1]}, c: %{customdata[2]}"
    },
    overwrite=True
)

Hey presto, Red Eric Idle suddenly has hover text with 4, 16, 64 rather than 1, 1, 1:

This is also neater code in the first place. It wasn’t obvious to me in the original Plotly example documentation that custom_data could be assigned like this within the plotly.express module.

Why this didn’t happen for continuous colour data

But why does this strange behaviour go away when we set df.colour = [0, 0, 0, 1]?

The reason is that when px.bar() is passed continuous colour data, it creates only one Bar object for the whole figure. In this case, the datapoints in that Bar are in the same order as the rows of the DataFrame, and so the row-by-row assignment of customdata works like a charm.