ff.create_distplot keeps giving non-normalized histograms/KDEs

This is either a simple bug or some behavior that I do not understand, and I would like help with it.

I am using ff.create_distplot() in my Dash app, and I noticed that the y-values sometimes fall outside the range [0, 1].

The following MWE demonstrates this. Run it once, then un-comment the second x1 and x2 and run it again to see the problem. I believe this happens when the numbers are too bunched together, but that should not cause such behavior either way.

from plotly.offline import iplot
import plotly.figure_factory as ff

import numpy as np

x1 = np.random.randn(10) - 2 
x2 = np.random.randn(10)

# x1 = [0.0, 0.0, 0.0, 0.0, 0.5, -0.12500000000000003, 0.0, -0.4, 0.0, 0.25]
# x2 = [-0.5, 0.0, 0.8375, 0.75, -0.8, 0.6, -1.0, -0.5999999999999999, 0.7, 0.85]

hist_data = [x1, x2]

group_labels = ['Group 1', 'Group 2']
colors = ['#A56CC1', '#A6ACEC']

# Create distplot (curve_type is left at its default, 'kde')
fig = ff.create_distplot(hist_data, group_labels, colors=colors,
                         bin_size=.2, show_rug=False)

# Add title
fig['layout'].update(title='Hist and Curve Plot')

# Plot!
iplot(fig, filename='Hist and Curve')


Hi @Mike3,

The histnorm parameter to create_distplot() controls how the histograms are normalized. The default is 'probability density', which normalizes the bars so that the total area of all of the bars is 1. If you switch this to 'probability', then the sum of the heights of all of the bars will be 1.
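To make the difference concrete, here is a minimal NumPy sketch (not Plotly's own code) that computes both normalizations by hand for the second x1 from the question. The bin edges are an assumption chosen for illustration; Plotly places its own bins. Six of the ten values land in one 0.2-wide bin, so the density bar there is 0.6 / 0.2 = 3, while the probability bar is 0.6.

```python
import numpy as np

# Second x1 from the question: six of the ten values are exactly 0.0
x1 = np.array([0.0, 0.0, 0.0, 0.0, 0.5, -0.125, 0.0, -0.4, 0.0, 0.25])

# Seven edges give six bins of width 0.2 (assumed edges, for illustration only)
edges = np.linspace(-0.5, 0.7, 7)
counts, _ = np.histogram(x1, bins=edges)
widths = np.diff(edges)

probability = counts / counts.sum()  # like histnorm='probability': heights sum to 1
density = probability / widths       # like 'probability density': areas sum to 1

print("probability: sum of heights =", probability.sum(), "tallest bar =", probability.max())
print("density: total area =", (density * widths).sum(), "tallest bar =", density.max())
```

With a bin width below 1, density bars are taller than the corresponding probability bars, which is why tightly bunched data pushes them above 1.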

Hope that helps,


jmmease, I set the histnorm parameter to 'probability' on all the traces that create_distplot() returns, and the bars are indeed fixed now. But what about the KDE lines?

This is what I get:

Is there any way to also “normalize” the KDE lines, so they lie below 1, matching the bars?

Hi @Mike3,

Is there any way to also “normalize” the KDE lines, so they lie below 1, matching the bars?

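No, the KDE is an estimate of the probability density function (PDF) of the distribution. By definition, the integral (area under the curve) should be 1. Unlike a histogram, since a PDF is continuous it’s not really meaningful to talk about the sum of the heights of the distribution. A density value can therefore exceed 1 wherever the data are tightly clustered; for example, a uniform distribution on [0, 0.5] has a density of 2 everywhere on that interval.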
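As a rough check of this, the sketch below builds a Gaussian KDE by hand with NumPy for the second x1 from the question and integrates it numerically. The bandwidth uses Scott's rule, which is an assumption here and may not match what create_distplot() does internally. The last step shows one possible workaround if you only want the curve on the same visual scale as 'probability' bars: multiply the density by the bin width. The result approximates the probability of a bin centered at each x, and it is no longer a density.

```python
import numpy as np

# Second x1 from the question
x1 = np.array([0.0, 0.0, 0.0, 0.0, 0.5, -0.125, 0.0, -0.4, 0.0, 0.25])
bin_size = 0.2  # same bin width as in the question

n = x1.size
h = x1.std(ddof=1) * n ** (-1 / 5)  # Scott's rule bandwidth (assumed)

# Evaluate the KDE on a grid that extends 4 bandwidths past the data
grid = np.linspace(x1.min() - 4 * h, x1.max() + 4 * h, 2001)
z = (grid[:, None] - x1[None, :]) / h
kde = np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

# Trapezoidal rule: the area under the curve is close to 1
area = np.sum((kde[1:] + kde[:-1]) / 2 * np.diff(grid))

# Rescale by the bin width to compare against 'probability' bars
scaled = kde * bin_size

print("area under KDE:", area)
print("KDE peak:", kde.max())
print("peak after scaling by bin_size:", scaled.max())
```

The area stays near 1 even though the peak is above 1, and the scaled curve sits below 1 alongside the probability bars.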

Hope that makes sense,