Plotly Histogram nbinsx does not provide the right number of bins in Python

With the following code I get 15 bins instead of 20 as expected.

import pandas as pd
from plotly.offline import plot
from plotly.graph_objs import Histogram

df = pd.read_csv('https://raw.githubusercontent.com/AntoineGautier/Data/master/tmp.txt')

plot(dict(data=[Histogram(x=df.ws, nbinsx=20)],
          layout=dict(xaxis=dict(dtick=1), bargap=0.25)
         ))

I also noticed this. Did you manage to find a solution?

Setting nbinsx specifies the maximum number of desired bins, not the exact number of bins to show. With nbinsx set, Plotly still rounds the bin size to a "nice" number, so you can end up with fewer bins than you asked for. See the full attribute reference for Histogram traces for more info.

To set the exact number of bins, you'll need to use a combination of xbins.start, xbins.end and xbins.size.

I tried to set xbins.start, xbins.end and xbins.size, but it didn't work. Do you have an example?

This finally worked for me:

import plotly.graph_objects as go

# penetration and df_test88 are my own data (not shown here)
# Use the same explicit bins for both traces so the bars line up
bins = dict(start=0, end=0.4, size=0.05)

fig = go.Figure()
fig.add_trace(go.Histogram(x=penetration.p_member, name='population',
                           histnorm='percent', xbins=bins))
fig.add_trace(go.Histogram(x=df_test88.toPandas().p_member, name='sample_88',
                           histnorm='percent', xbins=bins))
fig.update_layout(barmode='overlay')

# Reduce opacity to see both histograms
fig.update_traces(opacity=0.75)
fig.show()
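
The snippet above hard-codes start, end and size. If what you want is a specific number of bins (say 20, as in the original question), one possible approach is to derive the three values from the data. This is only a sketch: exact_xbins is a made-up helper, not part of Plotly, and the sample list is invented data standing in for a column such as df.ws. The range is widened by a tiny amount so that the largest value does not sit exactly on the last bin edge, which is a precaution rather than documented Plotly behaviour.

```python
def exact_xbins(values, n_bins, pad=1e-9):
    """Return dict(start, end, size) covering `values` with n_bins equal bins.

    Hypothetical helper, not part of Plotly. The range is widened by a tiny
    relative `pad` so the maximum does not sit exactly on the last bin edge.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:  # all values identical: fall back to a unit-wide range
        hi = lo + 1.0
    size = (hi - lo) * (1.0 + pad) / n_bins
    return dict(start=lo, end=lo + n_bins * size, size=size)


# Invented sample standing in for a real column such as df.ws
sample = [0.0, 1.5, 3.2, 4.4, 7.8, 10.0]
xbins = exact_xbins(sample, 20)
print(xbins)
```

The resulting dict would be passed as Histogram(x=df.ws, xbins=xbins) in place of nbinsx=20. How Plotly handles floating-point rounding at the last edge is not verified here, so it is worth checking the bar count on your own data.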

Does anybody know more about how the algorithm chooses the bin size? I have a multi-plot figure in which I want to standardize the y-axis ranges, which means that I need to be able to count the data points that fall within each bin.
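
This does not explain the automatic algorithm, but if every trace is given the same explicit xbins, the per-bin counts can be reproduced outside Plotly and used to pick a shared y-axis range. A minimal sketch, assuming bins are closed on the left and open on the right; bin_counts and the two data lists are invented for illustration.

```python
import math


def bin_counts(values, start, size, n_bins):
    """Count values in n_bins half-open bins [edge, edge + size)."""
    counts = [0] * n_bins
    for v in values:
        i = math.floor((v - start) / size)
        if 0 <= i < n_bins:  # values outside the bin range are dropped
            counts[i] += 1
    return counts


# Invented data standing in for the series shown in two subplots
series = {
    "population": [0.01, 0.02, 0.07, 0.12, 0.13, 0.14, 0.31],
    "sample_88": [0.03, 0.04, 0.06, 0.22, 0.38],
}
start, size, n_bins = 0.0, 0.05, 8

counts = {name: bin_counts(vals, start, size, n_bins)
          for name, vals in series.items()}
# Tallest bar across all series: a candidate upper limit for a shared y-axis
y_max = max(max(c) for c in counts.values())
print(counts, y_max)
```

With raw counts on the y-axis, fig.update_yaxes(range=[0, y_max]) would then apply the same range to every subplot. With histnorm='percent', the counts would first need converting to percentages of each series' total.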

The automatic behavior that ignores nbins is extremely misleading for overlapping distributions.

I feel like I ran into this previously and had to create a custom dataframe that binned things manually.

The best workaround that I have found for this is to specify a higher number of bins than you actually want (e.g. if you want 7, then use a value of 9).