Distplot histogram count

Hi all,

I’m just discovering distplot and love it; it makes awesome plots for my kind of data ! However, there is still something I’m missing and I can’t figure out how to do it…

My data looks like this, with neuron types as column names and activity times in each column. As a result, the columns don’t all have the same length.

Distplot does an awesome job of making bins for each neuron, but the bins start and end at different values for each of them. What I’m interested in, though, is the number of activity timepoints (lower plot) in each bin (upper plot). I guess having it as the Y values of the upper plot would work, but since the bins are so different I’m getting stuck…

Would you have any ideas on how to get that ?

Thanks a lot ! :slight_smile:

@lguerard I didn’t understand how the plot in the posted image was created, because the bars in the histogram all have a single color, whereas the legend shows many colors. Is it the histogram and probability density function (curve) of just one column, and did you define a subplot with many rows, with the distplot of one column on each row?
If this is not the case, please write down here your call for ff.create_distplot.

If you want to illustrate in distplot the count for each bin, then follow this example:

import plotly.figure_factory as ff
import numpy as np
np.random.seed(2020)


group_labels = ['distplot'] # name of the dataset

fig = ff.create_distplot([np.random.randn(1000)],  group_labels, histnorm= '', bin_size=0.5)
fig.update_layout(width=700, bargap=0.01)

By default, histnorm in ff.create_distplot is set to 'probability density', i.e. each histogram bar has an area equal to the fraction of the data falling in the corresponding bin (so its height is that fraction divided by the bin width). Setting histnorm='' gives a histogram in which each bar’s height equals the number of data points in that bin.
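The relation between the two normalizations can be checked outside Plotly. The sketch below uses np.histogram as a stand-in (an assumption for illustration, not necessarily what ff.create_distplot does internally) to show that a density bar is the count divided by N times the bin width, so the bar areas sum to 1:

```python
import numpy as np

rng = np.random.default_rng(2020)
data = rng.standard_normal(1000)

bin_size = 0.5
# Bin edges aligned on multiples of bin_size and covering the whole sample.
start = np.floor(data.min() / bin_size) * bin_size
stop = np.ceil(data.max() / bin_size) * bin_size
edges = np.arange(start, stop + bin_size / 2, bin_size)

# Raw counts per bin (what histnorm='' displays).
counts, _ = np.histogram(data, bins=edges)
# Density per bin (what 'probability density' displays).
density, _ = np.histogram(data, bins=edges, density=True)

# density = counts / (N * bin_size), so the bar areas add up to 1.
print(np.allclose(density, counts / (data.size * bin_size)))
print((density * np.diff(edges)).sum())
```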

Hi,

Indeed, I just selected a few populations to display on the graph since you can just click on the pop name to show/hide it. I thought this would make visibility easier !

Super cool ! This worked flawlessly, thank you very much ! :smiley: However, as you can see below, the curve is not displayed anymore. Is there a way to still show it ?

Thanks again ! :slight_smile:

@lguerard When histnorm='', the graphs of the estimated pdfs cannot be seen because their max values are very small compared to the max count in a bin, i.e. their graphs lie very close to the x-axis.

For example if the estimated pdf is almost equal to the normal pdf:

f(x) = (1/(sigma*sqrt(2*pi))) * e^{-(x-mean)^2/(2*sigma^2)}

then the max value of this pdf is f(mean) = 1/(sigma*sqrt(2*pi)). For sigma >= 1 this max value is less than 1 (it is about 0.4/sigma). If the max count is 200, the graph of the pdf has all its y-coordinates below 1, which is very small compared to 200.
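As a quick numeric check of this bound, here is a small sketch that evaluates the peak of the normal pdf for a few values of sigma and compares it with a max count of 200. The helper name normal_pdf_peak is only for illustration:

```python
import math


def normal_pdf_peak(sigma):
    """Maximum of the normal pdf, reached at x = mean."""
    return 1.0 / (sigma * math.sqrt(2.0 * math.pi))


max_count = 200  # tallest bar of the count histogram in the example

for sigma in (1.0, 1.3, 2.0):
    peak = normal_pdf_peak(sigma)
    # The ratio shows how flat the curve looks on the count axis.
    print(f"sigma={sigma}: peak={peak:.4f}, peak/max_count={peak / max_count:.5f}")
```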

The graph of the estimated pdf is included in the plot generated by ff.create_distplot so that it can be compared with the histogram corresponding to histnorm='probability density'.


It makes sense. Thanks for clarifying this :slight_smile:.

I have another question, not linked directly to distplot but to plotly in general. Is it possible to interactively extract the displayed values when zoomed in ?

For example, if I zoom in on a specific time range, can I then use all the values in that range to make a different plot ? I know I can just filter the dataframe using the values, but our users are not very familiar with python and plotly…

Thank you very much again !

@lguerard Where is your Figure plotted? Is it in a jupyter notebook or inserted in a web page to be accessed by users?

@empet Jupyter Notebook !

If you define your figure as a go.FigureWidget, you can add some interaction to your plot, but not generate a new plot from the data displayed in the window after a zoom-in.

Some examples:


Awesome ! Thanks a lot, I’ll look into that, but it could already be interesting to have a slider for the bin size !

Coming back to the curve, I should be able to plot an extrapolated version of the curve with number of bins on top of the figure, is that correct ?

@lguerard Here is the visual argument for the pdf graphs:

import plotly.figure_factory as ff
import numpy as np
np.random.seed(2020)


group_labels = ['data1', 'data2'] # name of the datasets
data1 = np.random.randn(1000)
data2 = 2+1.3*np.random.randn(500)

fig1 = ff.create_distplot([data1, data2],  
                          group_labels, histnorm= '', bin_size=0.5, show_curve=True )
fig1.update_layout(width=700, bargap=0.01)
fig1.show()

fig2 = ff.create_distplot([data1, data2],  
                          group_labels, histnorm= '', bin_size=0.5, show_hist=False, show_curve=True, show_rug=False )
fig2.update_layout(fig1.layout)

Now I change yaxis_range in the second plot to show where the curves sit in the first one, and why they cannot be seen:

fig2.update_layout(yaxis_range= [-1,205])
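If the goal is to see the curve at the scale of the counts, one possible approach (a sketch, not an option of ff.create_distplot) is to multiply the density by N * bin_size, the factor that converts a density into an expected count per bin. Here a normal pdf built from the sample mean and standard deviation stands in for the kernel density estimate, which is an assumption made to keep the example short:

```python
import numpy as np

rng = np.random.default_rng(2020)
data = rng.standard_normal(1000)
bin_size = 0.5

x = np.linspace(data.min(), data.max(), 200)
mean, sigma = data.mean(), data.std()

# Normal pdf with the sample mean and standard deviation (stand-in for the KDE).
pdf = np.exp(-((x - mean) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# Expected count per bin of width bin_size for data.size observations.
scaled_curve = pdf * data.size * bin_size

# The raw pdf stays below 1, the scaled curve is comparable to the bar heights.
print(pdf.max(), scaled_curve.max())
```

The scaled curve could then be drawn on top of the count histogram, for instance with fig.add_scatter(x=x, y=scaled_curve).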


Yes, of course, I understand that the axis range is what causes the issue !

Sorry, I should have been clearer: I want to create a new plot extrapolating on the histogram values and then overlaying it on top of the distplot.

But for that, is there any way to extract the histogram values ?

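Also, I tried using FigureWidget to have some interaction, but this code gives me a weird result…

import plotly.figure_factory as ff
import plotly.graph_objects as go
from ipywidgets import VBox, interactive

# neuron_data is the DataFrame described above (one column per neuron type)
bin_size_var = 5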


fig = ff.create_distplot([neuron_data[c][neuron_data[c].notnull()] for c in neuron_data.columns], 
                         neuron_data.columns, histnorm='', bin_size=bin_size_var)

# fig.show()
# find the range of the slider.
xmin = neuron_data.min().min()
xmax = neuron_data.max().max()

# create FigureWidget from fig
f = go.FigureWidget(fig)


# our function that will modify the xaxis range
def update_range(start, end):
    f.layout.xaxis.range = [start, end]

# display the FigureWidget and slider with center justification
vb = VBox((f, interactive(update_range,
                          start=(xmin, xmax, (xmax - xmin) / 1000.0),
                          end=(xmin, xmax, (xmax - xmin) / 1000.0))))
vb.layout.align_items = 'center'
vb

So fig works fine and prints the plots that I showed before, but vb just prints VBox(children=(FigureWidget({ 'data': [{'autobinx': False, 'histnorm': '', 'le… somehow…

Any ideas ? Is it not possible to display figure_factory figures in a FigureWidget ?
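On extracting the histogram values: one way that does not depend on the figure object is to compute them directly with np.histogram, using the same bin edges for every column so that all bins share the same starts and ends. This is only a sketch with made-up data; `activity` and its keys are hypothetical stand-ins for the neuron_data columns:

```python
import numpy as np

# Hypothetical stand-in for neuron_data: columns of unequal length.
rng = np.random.default_rng(0)
activity = {
    "type_A": rng.uniform(0, 100, size=300),
    "type_B": rng.uniform(20, 80, size=120),
}

bin_size = 5
all_values = np.concatenate(list(activity.values()))

# Common edges, aligned on multiples of bin_size and covering every column.
start = np.floor(all_values.min() / bin_size) * bin_size
stop = np.ceil(all_values.max() / bin_size) * bin_size
edges = np.arange(start, stop + bin_size / 2, bin_size)

# One array of counts per column, all defined on the same bins.
counts = {name: np.histogram(values, bins=edges)[0]
          for name, values in activity.items()}

for name, c in counts.items():
    print(name, c)
```

With a real DataFrame, the missing values would need to be dropped from each column first, as done with notnull() in the code above.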