How does create_dendrogram define similarity?

kafkaskid415 · October 20, 2016, 6:24pm

I found a really great tutorial on the website about making dendrogram plots with heatmaps:

However, the create_dendrogram function is very much a black box, and the documentation doesn’t describe how distances between samples are actually computed (e.g., Euclidean distance?). It works both when I plug in a correlation matrix and when I plug in the data itself (see below, where df is a pandas dataframe of my data). Is it computing the pairwise Euclidean distance for all rows, and using that as the distances in the dendrogram?

Correlation matrix:
FF.create_dendrogram(df.corr(), orientation='bottom', labels=labels)

Just the data itself:
FF.create_dendrogram(df.T, orientation='bottom', labels=labels)

Thanks!
chris

gblackshields · January 10, 2017, 10:07am

Hi Chris,

I’m aware this is a little late. I wanted to know the same things (actually, I wanted to know how to change them), so I dug through the plotly code to find out. For reference, the create_dendrogram function is in tools.py. As you might expect, the code uses scipy to calculate the underlying clustering used in the dendrogram, with the relevant bits being:

import scipy.cluster.hierarchy as sch
import scipy.spatial as scs

and

d = scs.distance.pdist(X)
Z = sch.linkage(d, method='complete')
P = sch.dendrogram(Z, orientation=self.orientation, labels=self.labels, no_plot=True)

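What’s clear from this is that when creating d (the condensed distance matrix), no distance metric is specified, so pdist falls back to scipy’s default (Euclidean), and the linkage method used to calculate Z (the clustering) is hard-coded to ‘complete’. Note that pdist computes distances between the rows of X, so if you pass in df.corr(), each row of the correlation matrix is treated as an observation and the Euclidean distances between those rows are clustered; the correlations themselves are not used as similarities. So far as I can tell, there is currently no way to change these parameters without actually changing the plotly base code, which I’ve admittedly done before, as I needed a different metric (Pearson correlation).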
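To make the defaults concrete, here is a minimal sketch that reproduces the two scipy calls from the snippet above outside of plotly, and then swaps in a correlation-based distance. The toy array X and the choice of 'average' linkage for the alternative are assumptions made purely for illustration; only pdist, linkage and dendrogram come from the code quoted above.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

# Hypothetical toy data: 4 samples (rows) x 3 features (columns).
X = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 2.0, 1.0],
    [1.0, 2.0, 4.0],
])

# With no metric argument, pdist computes Euclidean distances between rows.
d_default = pdist(X)
d_euclid = pdist(X, metric="euclidean")

# Same clustering step as in the quoted plotly code: complete linkage.
Z_euclid = sch.linkage(d_default, method="complete")

# Alternative (illustrative): correlation distance, i.e. 1 - Pearson r
# between rows, clustered with average linkage instead of complete.
d_corr = pdist(X, metric="correlation")
Z_corr = sch.linkage(d_corr, method="average")

# no_plot=True returns the dendrogram layout without needing matplotlib.
leaves_euclid = sch.dendrogram(Z_euclid, no_plot=True)["leaves"]
leaves_corr = sch.dendrogram(Z_corr, no_plot=True)["leaves"]
```

The condensed vectors are ordered by row pair (0,1), (0,2), (0,3), (1,2), (1,3), (2,3). Rows 0 and 1 are perfectly correlated, so their correlation distance is 0 even though their Euclidean distance is not, which is exactly the kind of difference that changes the resulting tree. Newer plotly releases also appear to expose distfun and linkagefun arguments on plotly.figure_factory.create_dendrogram for this purpose, so it is worth checking the signature in the version you have installed before patching the library.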
