I was trying to use px to create a box plot for all the columns in a dataset (890k x 85), since df.plot(kind="box") doesn't give me interactivity.
The dataset has not been cleaned, so it has empty columns as well as ordinal and categorical columns, but it looks similar to this:
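|   | A | B | C | D | E | F | G | H | I | J | … | K | L | M | N | O | P | Q | R | S | T |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | 2.0 | 1 | 2.0 | 3 | 4 | 3 | 5 | 5 | 3 | … | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | NaN | 1.0 | 2 | 5.0 | 1 | 5 | 2 | 5 | 4 | 5 | … | 2.0 | 3.0 | 2.0 | 1.0 | 1.0 | 5.0 | 4.0 | 3.0 | 5.0 | 4.0 |
| 2 | NaN | 3.0 | 2 | 3.0 | 1 | 4 | 1 | 2 | 3 | 5 | … | 3.0 | 3.0 | 1.0 | 0.0 | 1.0 | 4.0 | 4.0 | 3.0 | 5.0 | 2.0 |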
I was using px.box(df), and it returns "ValueError: Plotly Express cannot process wide-form data with columns of different type.", but I think my data is in tidy form, so I suppose this is an error? I could fall back to traces, but I wanted to know whether I was doing something wrong (because the data needs a transformation) or whether it is a bug in plotly.
This is intentional: as the error message says, wide-form data is only accepted as is so long as all the columns are the same type, so you must select, say, only the numerical columns, as would make sense for a box plot.
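For a frame with 85 columns, typing the numeric column names by hand is impractical, and pandas can pick them by dtype instead. A minimal sketch, assuming a toy frame modelled on columns A to C of the first post (the `label` column and the `box_figure` helper are hypothetical, added only for illustration):

```python
import numpy as np
import pandas as pd

# Toy data modelled on columns A-C of the first post; "label" is a
# hypothetical categorical column standing in for the uncleaned ones.
df = pd.DataFrame(
    {
        "A": [np.nan, np.nan, np.nan],
        "B": [2.0, 1.0, 3.0],
        "C": [1, 2, 2],
        "label": ["x", "y", "z"],
    }
)

# Keep only numeric columns (int and float); non-numeric columns
# such as strings are dropped.
numeric_df = df.select_dtypes(include="number")


def box_figure(frame):
    """Hypothetical helper: one box per column of a wide-form frame."""
    import plotly.express as px  # imported lazily; requires plotly

    return px.box(frame)
```

Calling `box_figure(numeric_df).show()` would then draw the boxes. Note that `numeric_df` still mixes float64 and int64 columns, so this selection only removes the non-numeric ones.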
Follow-up: I do see that Pandas' built-in backend does this filtering automatically for kind="box", and this is something we can consider baking in to our backend, but at the moment you'll have to explicitly provide either a dataframe with only numerical columns, i.e. df[["A","B","C"]].plot(kind="box"), or a mixed dataframe with a specified list of columns to plot, i.e. df.plot(kind="box", y=["A","B","C"]).
But like they say, you learn something new every day. So thanks for taking the time to answer and investigate, and for all the hard work you do answering all of us, when most of the time these are basic questions.
I think this is a bug. In pandas I saw what you described: it drops the categorical columns but uses the rest of the columns, mainly float64 and int64.
With plotly, on the other hand, plotting the box plot of A and B works, but px.box(aux.iloc[:,:3]) again returns "ValueError: Plotly Express cannot process wide-form data with columns of different type.". Okay, it is true that there are different types, but it works in pandas, and I suppose it should work here too, since both are numeric types.
I used the data from the first post, where A and B are float64 and C is int64.
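One possible workaround, assuming the wide-form check compares exact dtypes as the error message suggests, is to cast the numeric columns to a single dtype before plotting. The sketch below rebuilds a toy `aux` frame from columns A to C of the first post; `box_figure` is a hypothetical helper, not part of plotly:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the first three columns of the real dataset.
aux = pd.DataFrame(
    {
        "A": [np.nan, np.nan, np.nan],
        "B": [2.0, 1.0, 3.0],
        "C": [1, 2, 2],
    }
)

# Inspect the dtypes first: A and B are float64, C is int64.
dtype_names = aux.dtypes.astype(str).to_dict()

# Cast everything to float64 so that all columns share one dtype.
uniform = aux.iloc[:, :3].astype("float64")


def box_figure(frame):
    """Hypothetical helper: one box per column of a wide-form frame."""
    import plotly.express as px  # imported lazily; requires plotly

    return px.box(frame)
```

`box_figure(uniform).show()` would then plot all three columns. The values themselves are unchanged; only the integer column is stored as floats.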