Setting multiple error bars with new plotly express 'Wide data' feature

Hello,

I’ve been enjoying the Plotly update, especially the wide-form support for Plotly Express. My question relates to this new functionality. If I define a plot with:

px.scatter(dataframe, x='xaxis_column_name', y=['yaxis_col_1', 'y_axis_col_2'])

It doesn’t appear to be possible to add error bars in a similar, columnar fashion, i.e.

px.scatter(dataframe, x='xaxis_column_name', y=['yaxis_col_1', 'y_axis_col_2'], error_y=['yaxis_col_1_error', 'y_axis_col_2_error'])

This fails, since error_y must have the same length as the x axis.
If I do set error_y to a single column of errors, it is applied to all traces simultaneously, which is not always appropriate.

Are there any plans to add the ability to define columns of errors in the wide-form input? I think this would be very convenient. What are the suggested workarounds at present? My workflow is such that I’d prefer to call the figure first, inspect it, and then add error bars to the traces post declaration.

I understand that traces could be added one by one with .add_trace(), but I like the px API as it seems much more elegant and less verbose.

Thanks,
Rory


Unfortunately there’s no way to do this kind of “correlated wide-form” at the moment… Your best bet would be to try to use a long-form input here.

It’s a good idea though! I’ve logged an issue here for further thought and discussion: https://github.com/plotly/plotly.py/issues/2522

Hi Nicolas! Thanks for taking a look. I’ve tried a bunch of different data transformations to get this working, but the most elegant solution I could find for now was to simply create the go.Scatter objects with the correct link between data and error bars.

I’m glad to hear you like the suggestion however. I’d love to see this feature added as it would simplify the whole process greatly. Anyway, the new plotly express wide data format has been hugely helpful, great update!

I’ll see if I can come up with a little recipe for you… Just to confirm, your data is in a Pandas data frame and has something like x, y1, error_y1, y2, error_y2, ... ?

That’s very kind. Yes, that’s exactly right

It feels like something like this would work: https://stackoverflow.com/questions/55403008/pandas-partial-melt-or-group-melt

Maybe I am missing something, but I don’t understand how the long form data format gets me closer to plotting the data. Here’s an MWE I’ve been playing with to understand your SO link:

import numpy as np
import pandas as pd
import plotly.express as px

def f(x, m, c):
    return m*x + c

x = np.arange(10)
y = f(x, 3, 1)
y2 = f(x, 5, 2)


y_err, y2_err = [np.random.random(len(x)) for _ in [y, y2]]


df=pd.DataFrame(data=[x,y,y2,y_err, y2_err]).T
df.columns= ['x','y','y2','y_err','y2_err']


# this is basically the graph I want, just with the correct error bars
# setting error_y to a single column applies it to both traces
f1 = px.scatter(df, x='x', y=['y', 'y2'], error_y='y_err')

df_long = df.stack().reset_index().rename(columns = {'level_1': 'Variable'})

#how best to achieve the plot from here is still unclear 

Here’s how I would approach this: first you melt() just the y data, the usual way. Then you unstack the error values into a single column, which should still be well-aligned with the main dataset. Then you add the unstacked error values to the main dataset, and you plot the data in long-form, so you don’t pass any lists to y.

import pandas as pd
import numpy as np
import plotly.express as px

df = pd.DataFrame(dict(
    x=range(20),
    y1=np.random.rand(20).cumsum(),
    y_error1=np.random.rand(20),
    y2=np.random.rand(20).cumsum(),
    y_error2=np.random.rand(20)
))

print(df.head())

long_df = df.melt(id_vars="x", value_vars=["y1", "y2"], value_name="y", var_name="y_name")
long_df["y_error"] = df[["y_error1", "y_error2"]].unstack().values

print(long_df.head())

px.scatter(long_df, x="x", y="y", error_y="y_error", color="y_name")

Thanks for this solution. I’m sure this will work perfectly for my data

Appreciate you taking the time to have a look at this. I was struggling to reformat the data correctly, but this is a great example!

Cheers!