I am trying to make a box chart with categories ordered by median. I expected the median lines drawn inside the boxes to appear in sorted order, but this does not happen:
import plotly.express as px

# df_apps_ok is assumed to be an existing DataFrame with Type, Category and Rev_Estimate columns
df_apps_ok_paid = df_apps_ok[df_apps_ok.Type == 'Paid']
fig = px.box(df_apps_ok_paid, y='Rev_Estimate', x='Category', title="How much can paid apps earn?")
fig.update_yaxes(type='log', title="Paid app ballpark value")
fig.update_xaxes(categoryorder='median descending')
fig.update_layout(height=700)
As you can see, the median lines are all over the place, up and down, while I expected them to run in order from top-left to bottom-right.
I guess it is because the box chart excludes the outliers and the code behind the categoryorder property does not? Is this by design and will it remain this way, or are there workarounds? I can't seem to find anything in the docs on this: Layout.xaxis in Python.
Here is an example of ordering the box chart based on the medians with a default plotly dataset:
import plotly.express as px

df = px.data.tips().drop(columns=['sex', 'smoker', 'day', 'time'])  # Drop non-numerical columns
df = (df - df.mean()) / df.std()  # Standardize each column to make them comparable
df = df.reindex(df.median().sort_values().index, axis=1)  # Reorder the columns based on their median
fig = px.box(df)
fig.show()
Thank you for the response.
In my case the dataset structure required a grouping approach, but I see the answer is really either:
supply the data with the categories already in the desired order, or
specify the order explicitly.
I am still not sure exactly why category ordering by median does not work in my example, but for the sake of completeness here is the same code with explicit ordering:
# assuming df_apps_ok_paid is already selected
# Categories sorted by their median Rev_Estimate, highest first, as a plain list
ordering = df_apps_ok_paid.groupby('Category')['Rev_Estimate'].median().sort_values(ascending=False).index.tolist()
fig = px.box(df_apps_ok_paid, y='Rev_Estimate', x='Category', title="How much can paid apps earn?")
fig.update_yaxes(type='log', title="Paid app ballpark value")
fig.update_xaxes(categoryorder='array', categoryarray=ordering)
fig.update_layout(height=700)
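For anyone who wants to see what the groupby/median/sort step boils down to, here is a minimal sketch in plain Python with no pandas. The helper name categories_by_median and the sample rows are made up for illustration and are not taken from my dataset. It only shows how the list passed to categoryarray is built, and says nothing about how Plotly computes 'median descending' internally.

```python
from collections import defaultdict
from statistics import median


def categories_by_median(rows, descending=True):
    """Return category names sorted by the median of their values."""
    grouped = defaultdict(list)
    for category, value in rows:
        grouped[category].append(value)
    # One median per category, computed over all values (outliers included)
    medians = {cat: median(vals) for cat, vals in grouped.items()}
    return sorted(medians, key=medians.get, reverse=descending)


# Hypothetical (category, revenue) pairs, for illustration only
rows = [
    ("GAME", 10), ("GAME", 1000), ("GAME", 50),
    ("TOOLS", 5), ("TOOLS", 7),
    ("MEDICAL", 200), ("MEDICAL", 300), ("MEDICAL", 100000),
]

ordering = categories_by_median(rows)
print(ordering)  # ['MEDICAL', 'GAME', 'TOOLS']
```

The resulting list plays the same role as ordering in the snippet above: it can be passed as categoryarray together with categoryorder='array'.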