I have this dataframe:
import random
import pandas as pd
random.seed(42)
so = pd.DataFrame({'x': random.sample(range(1, 100), 40),
                   'group': ['a']*10 + ['b']*10 + ['c']*10 + ['d']*10})
I then calculate the per-group percentiles:
so_percentiles = so.groupby('group')['x'].describe().reset_index()
which gives me:
  group  count  mean        std  min    25%   50%    75%   max
0     a   10.0  41.2  33.773099  4.0  15.75  30.5  70.50  95.0
1     b   10.0  52.6  32.083918  5.0  28.50  60.0  74.50  97.0
2     c   10.0  65.1  31.067847  1.0  55.00  75.0  87.75  96.0
3     d   10.0  48.1  31.014154  7.0  21.25  46.5  68.75  99.0
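Since only the 25% and 75% columns are needed for the plot, the same values can also be obtained directly with a grouped quantile call. This is a minimal sketch; the name quartiles is just an illustrative choice and is not used elsewhere in the question.

```python
import random
import pandas as pd

random.seed(42)
so = pd.DataFrame({'x': random.sample(range(1, 100), 40),
                   'group': ['a']*10 + ['b']*10 + ['c']*10 + ['d']*10})

# One row per group, one column per requested quantile (0.25 and 0.75).
quartiles = so.groupby('group')['x'].quantile([0.25, 0.75]).unstack()
print(quartiles)
```

The columns of quartiles are the floats 0.25 and 0.75, so a single value is read with, for example, quartiles.loc['d', 0.25].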
I want to plot a histogram of x, faceted by group, and for each group plot a vertical line at the 25% and 75% percentiles.
So I am using the following code:
import plotly.express as px
fig_so = px.histogram(so, x='x', facet_row='group', cumulative=False, histnorm='percent',
                      nbins=100,
                      category_orders={
                          'cluster': ['a', 'b', 'c', 'd']
                      })
for r, g in enumerate(['a', 'b', 'c', 'd']):
    val_25 = so_percentiles.query('group == @g')['25%'].values[0]
    val_75 = so_percentiles.query('group == @g')['75%'].values[0]
    fig_so.add_vline(x=val_25, line_dash="dot", row=r, col="all",
                     annotation_text="25th percentile",
                     annotation_position="bottom left"
                     )
    fig_so.add_vline(x=val_75, line_dash="dot", row=r, col="all",
                     annotation_text="75th percentile",
                     annotation_position="bottom right"
                     )
fig_so.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
fig_so.update_yaxes(matches=None)
fig_so.show()
However, in the resulting plot the vertical lines do not seem to be matched to the correct groups: the 25th percentile line for group d is drawn at around 28, whereas it should be at 21.25.
I am using plotly version 5.5.0.
Any idea why this is happening?
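One possible cause, stated here as an assumption to verify rather than a confirmed diagnosis: in figures built by plotly.express, the row argument of add_vline is 1-based and facet rows are counted from the bottom of the figure. If so, passing row=r from enumerate (which starts at 0) would draw nothing for group a and attach the other lines to the wrong facets. That would be consistent with the line near 28 in group d's facet being group b's value of 28.50. The sketch below only computes the row each group would need under that assumption; row_for_group is a hypothetical name.

```python
groups = ['a', 'b', 'c', 'd']  # facet order as displayed, top to bottom

# Assumption: rows are numbered 1..n starting from the BOTTOM facet,
# so the top facet ('a') is row n and the bottom facet ('d') is row 1.
row_for_group = {g: len(groups) - i for i, g in enumerate(groups)}

for g, row in row_for_group.items():
    print(g, row)
```

With this mapping, the loop above would pass row=row_for_group[g] to fig_so.add_vline instead of row=r.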