# Boxplot quartiles seem wrong

I realised that the quartiles calculated by the Plotly box plot were not what I expected. Here is my Python code:

``````
# Imports were not shown in the original post; these are the assumed ones.
import numpy as np
from plotly.graph_objs import Box, Figure, Layout
from plotly.offline import iplot

array = [0, 2, 3, 5, 8, 9, 10]
a = np.array(array)
print("Q1: " + str(np.percentile(a, 25)))
print("median: " + str(np.percentile(a, 50)))
print("Q3: " + str(np.percentile(a, 75)))

trace = Box(
    y=array,
    boxpoints='all',
    jitter=0.3,
    pointpos=-1.8,
    name="test"
)
layout = Layout(
    title="title",
    yaxis=dict(
        title='Error in %',
        nticks=20
    )
)
data = [trace]
fig = Figure(data=data, layout=layout)

iplot(fig)
``````

I declared a simple array of data and used numpy to calculate Q1 and Q3, then used a Plotly box plot to see the results. My question now is: why don't the Plotly values correspond to the results calculated with numpy?

This is a rather contentious topic, as it turns out. See e.g. http://www.amstat.org/publications/jse/v14n3/langford.html: there are at least 15 different ways you might choose to calculate quartiles!

They all generally give quartiles that are close to each other, and I don’t think it’s possible to say that one is right and the others are wrong. The one we chose is based on the idea that statistics are often computed from a sample of a much larger distribution. For a sample such as (1, 2, 3, 4, 5), the larger distribution would be a uniform spread from 0.5 to 5.5. The quartiles for that would then be at 0.5 + 5.0/4 = 1.75 and 0.5 + 5.0*3/4 = 4.25.

• citing Alex J.
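The reasoning quoted above can be sketched in a few lines. This is only an illustration, not Plotly's actual source: the helper name `sample_based_quantile` is made up here, and the position rule `n*p + 0.5` is taken from the 0.5 to 5.5 example, which is consistent with a sample of (1, 2, 3, 4, 5). Applied to the array from the question, it gives values that differ from numpy's default.

```python
import numpy as np


def sample_based_quantile(values, p):
    """Quantile at fraction p, treating each sorted value as the centre of
    one of n equal slices of a larger distribution (hypothetical helper)."""
    ordered = np.sort(np.asarray(values, dtype=float))
    n = len(ordered)
    # 1-based position n*p + 0.5; subtract 1 for 0-based indexing.
    position = n * p + 0.5 - 1
    # np.interp interpolates linearly and clamps at both ends.
    return float(np.interp(position, np.arange(n), ordered))


# The (1, 2, 3, 4, 5) sample: quartiles at 1.75 and 4.25.
print(sample_based_quantile([1, 2, 3, 4, 5], 0.25))
print(sample_based_quantile([1, 2, 3, 4, 5], 0.75))

# The array from the question, next to numpy's default percentiles.
data = [0, 2, 3, 5, 8, 9, 10]
print(sample_based_quantile(data, 0.25), np.percentile(data, 25))
print(sample_based_quantile(data, 0.75), np.percentile(data, 75))
```

Under these assumptions the question's array gives Q1 = 2.25 and Q3 = 8.75, while numpy's default gives 2.5 and 8.5. Both approaches agree on the median of 5.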

Is it possible to provide a parameter to choose the quartile calculation formula?

This is an old thread, but I am having the exact same issue. Were you able to find either a solution or a workaround? I have to recreate someone else’s work in Plotly, and they used a different quartile calculation.

Hi, I didn’t really look further, sorry. I just saw this other post that might help you: "Box Plots - manually supply median and quartiles (performance for large sample sizes)".

I have been able to recreate the shapes themselves with lines, rectangles and points, so it basically looks and works the way I want. However, it lacks the box plot's ability to display all the values when the mouse hovers over it, which I would really like to have (see "How to have hover text show all like with Boxplot").

I understand the various methods to calculate the quartiles - but how can one extract the quartiles computed by Plotly? Perhaps another way around the same question: what existing python functions will calculate quartiles in the exact same way as Plotly does?

I tried all possible interpolation methods in `np.percentile()`, `scipy.stats.iqr()`, and the pandas `quantile()` method. None seemed to match what Plotly plots.

Thanks!

To give an updated answer to @OliverBrace, the `box` trace now supports manually providing parameters: https://plotly.com/python/box-plots/#box-plot-with-precomputed-quartiles

To answer @ahmedhosny, check out this part of the docs for a description of the default box-plot quartile calculation method, as well as the two built-in variants: https://plotly.com/python/box-plots/#choosing-the-algorithm-for-computing-quartiles

Many thanks @nicolaskruchten, things are clear now. I guess my confusion came from the naming. The "linear" in scipy/numpy refers to the interpolation method, while the “linear” in Plotly seems to refer to the higher-level quantile calculation method (which itself also uses linear interpolation, despite producing a different result from scipy/numpy). Anyway, I have included a snippet below to highlight this for anyone in the future. Great work on this library, it really makes data viz much more enjoyable!

``````
import numpy as np
from scipy.stats import iqr


def plotly_linear_quantiles(y, quantile):
    """
    Based on #10 here: http://jse.amstat.org/v14n3/langford.html
    METHOD 10 (“H&L-2”): The Pth percentile value is found by taking that
    value with #(np + 0.5). If this is not an integer, take the interpolated
    value between 'the floor' and 'the ceiling of that value'. As an example,
    if S5 = (1, 2, 3, 4, 5) and p = 0.25, then #(np + 0.5) = #(1.75) and so Q1 = 1.75.

    args:
        y: list to calculate quantile for
        quantile: requested quantile value between 0 and 1
    """
    # -1 because we count starting at 0
    interp_val_x = len(y)*quantile + 0.5 - 1
    if interp_val_x.is_integer():
        # int() to remove decimal
        return sorted(y)[int(interp_val_x)]
    else:
        return np.interp(interp_val_x, [x for x in range(len(y))], sorted(y))


def plotly_linear_IQR(y):
    return plotly_linear_quantiles(y, 0.75) - plotly_linear_quantiles(y, 0.25)


# linear by default in numpy and scipy, but included for clarity
l = 'linear'

# Note: np.percentile takes `method=` from NumPy 1.22 onwards; the keyword
# was called `interpolation=` in earlier versions (scipy's iqr still uses it).

y = [1, 2, 3, 4]
plotly_linear_IQR(y)  # 2.0
iqr(y, interpolation=l)  # 1.5
np.percentile(y, 75, method=l) - np.percentile(y, 25, method=l)  # 1.5

y = [1, 2, 3, 4, 5]
plotly_linear_IQR(y)  # 2.5
iqr(y, interpolation=l)  # 2.0
np.percentile(y, 75, method=l) - np.percentile(y, 25, method=l)  # 2.0

y = [1, 2, 3, 4, 5, 6]
plotly_linear_IQR(y)  # 3
iqr(y, interpolation=l)  # 2.5
np.percentile(y, 75, method=l) - np.percentile(y, 25, method=l)  # 2.5

y = [1, 2, 3, 4, 5, 6, 7]
plotly_linear_IQR(y)  # 3.5
iqr(y, interpolation=l)  # 3.0
np.percentile(y, 75, method=l) - np.percentile(y, 25, method=l)  # 3.0
``````
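For anyone looking for an existing function: NumPy 1.22 and later accept a `method` argument in `np.quantile` and `np.percentile`, and its `"hazen"` option uses the same `n*p + 0.5` position rule as the snippet above. The check below is a sketch under that assumption. The helper `position_based_quantile` is a hypothetical re-statement of the snippet's formula, and nothing in this thread confirms that Plotly and `"hazen"` agree in every edge case.

```python
import numpy as np


def position_based_quantile(y, quantile):
    """Same n*p + 0.5 position rule as the snippet above (hypothetical name)."""
    ordered = sorted(y)
    # 1-based position n*p + 0.5, shifted to 0-based indexing.
    position = len(ordered) * quantile + 0.5 - 1
    return float(np.interp(position, np.arange(len(ordered)), ordered))


samples = ([1, 2, 3, 4], [1, 2, 3, 4, 5], [0, 2, 3, 5, 8, 9, 10])
for y in samples:
    for q in (0.25, 0.5, 0.75):
        ours = position_based_quantile(y, q)
        # Requires NumPy >= 1.22 for the `method` keyword.
        hazen = float(np.quantile(y, q, method="hazen"))
        print(y, q, ours, hazen)
```

If the two columns agree on your data, `np.quantile(..., method="hazen")` may be a convenient stand-in for the hand-written function.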