Boxplot quartiles seem wrong

Steve · September 28, 2016, 10:14am

I realised that the quartiles calculated by the Plotly box plot were not correct. Here is my Python code:
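import numpy as np
from plotly.graph_objs import Box, Layout, Figure
from plotly.offline import iplot

array = [0,2,3,5,8,9,10]
a = np.array(array)
# NumPy's default (linear) percentiles
print("Q1: " + str(np.percentile(a,25)))
print("median: " + str(np.percentile(a,50)))
print("Q3: " + str(np.percentile(a,75)))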

trace = Box (
            y= array,
            boxpoints='all',
            jitter=0.3,
            pointpos=-1.8,
            name = "test"
            )
layout = Layout(
    title="title",
    yaxis=dict(
        title='Error in %',
        nticks=20
    )
)
data = [trace]
fig = Figure(data=data, layout=layout)

iplot(fig)

I declared a simple array of data and used NumPy to calculate Q1 and Q3, then I used a Plotly box plot to see the results. Here is the output:

My question is: why don't the Plotly values correspond to the results calculated with NumPy?

etienne · September 28, 2016, 3:52pm

This is a rather contentious topic, as it turns out; see e.g. http://www.amstat.org/publications/jse/v14n3/langford.html. There are at least 15 different ways you might choose to calculate quartiles!
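As a quick illustration of how much the choice matters, here is Steve's array run through two of the definitions built into Python's standard `statistics.quantiles` (Python 3.8+). Its `'inclusive'` method matches NumPy's default linear `np.percentile`, while `'exclusive'` places the k-th quartile at position k(n+1)/4 in the sorted data. Neither is claimed here to be the method Plotly uses.

```python
import statistics

# Steve's sample from the question above
data = [0, 2, 3, 5, 8, 9, 10]

# 'inclusive' treats the min/max as the 0th/100th percentiles (same as
# NumPy's default linear percentile); 'exclusive' uses positions k*(n+1)/4
for method in ("inclusive", "exclusive"):
    q1, median, q3 = statistics.quantiles(data, n=4, method=method)
    print(f"{method:>9}: Q1={q1}, median={median}, Q3={q3}")
```

Both agree on the median (5.0), but 'inclusive' gives Q1 = 2.5 and Q3 = 8.5 while 'exclusive' gives Q1 = 2.0 and Q3 = 9.0.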

They all generally give quartiles that are close to each other, and I don't think it's possible to say that one is right and the others are wrong. The one we chose is based on the idea that statistics are often computed from a sample of a much larger distribution. For five sorted points at ranks 1 to 5, for example, that larger distribution would be a uniform spread from 0.5 to 5.5, and its quartiles would then sit at ranks 0.5 + 5.0/4 = 1.75 and 0.5 + 5.0*3/4 = 4.25.
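A sketch of that reasoning as formulas, assuming each sorted value sits at the midpoint of one of n equal slices of the spread (this also matches method #10 in the linked paper). The symbols r_p and Q_p are introduced here for illustration, and ranks below 1 or above n are assumed to be clamped to the smallest or largest value:

```latex
% Sorted sample x_{(1)} \le \dots \le x_{(n)}, each value taken as the midpoint
% of one of n equal slices of a uniform spread over the ranks [0.5, n + 0.5].
\begin{aligned}
r_p &= 0.5 + n\,p, \\
Q_p &= x_{(\lfloor r_p \rfloor)}
       + \bigl(r_p - \lfloor r_p \rfloor\bigr)
         \bigl(x_{(\lfloor r_p \rfloor + 1)} - x_{(\lfloor r_p \rfloor)}\bigr).
\end{aligned}

% Worked example: n = 5 with values 1, 2, 3, 4, 5 (each value equals its rank).
\begin{aligned}
r_{0.25} &= 0.5 + 5/4 = 1.75 \quad\Rightarrow\quad Q_1 = 1 + 0.75\,(2 - 1) = 1.75, \\
r_{0.75} &= 0.5 + 15/4 = 4.25 \quad\Rightarrow\quad Q_3 = 4 + 0.25\,(5 - 4) = 4.25.
\end{aligned}
```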

citing Alex J.

preritg · October 18, 2016, 3:02pm

Would it be possible to provide a parameter for choosing the quartile calculation formula?

OliverBrace · July 13, 2018, 8:06am

This is an old thread, but I am having the exact same issue. Were you able to find either a solution or a workaround? I have to recreate someone else's work in Plotly, and they used a different quartile calculation.

Steve · July 16, 2018, 6:29am

Hi, I didn't really look into it further, sorry. I just saw this other post that might help you: "Box Plots - manually supply median and quartiles (performance for alrge sample sizes)".

OliverBrace · July 18, 2018, 1:30pm

I have been able to recreate the shapes themselves using lines, rectangles and points, so it basically looks and works the way I want. What it lacks is the box plot's hover behaviour, where all the values are displayed when the mouse hovers over the box, and I would really like to find out how to get hover text that shows everything the way a box plot does.

ahmedhosny · July 2, 2020, 6:42pm

I understand that there are various methods to calculate quartiles, but how can one extract the quartiles computed by Plotly? Or, to put the same question another way: which existing Python functions calculate quartiles in exactly the same way as Plotly does?

I tried all possible interpolation methods in np.percentile(), scipy.stats.iqr(), and pandas.quantile(). None seemed to match what Plotly plots.

Thanks!

nicolaskruchten · July 3, 2020, 1:22am

To give an updated answer to @OliverBrace, the box trace now supports supplying precomputed statistics (quartiles, median, fences and so on) directly: https://plotly.com/python/box-plots/#box-plot-with-precomputed-quartiles
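A minimal sketch of that route for the "recreate someone else's work" case, assuming the other analysis used NumPy's default linear percentiles (Python's `statistics.quantiles(..., method="inclusive")` gives the same values) and Tukey-style whiskers at 1.5 × IQR. `box_stats` is a hypothetical helper; it only builds the one-item lists that the box trace's precomputed-statistics properties (`q1`, `median`, `q3`, `lowerfence`, `upperfence`) take.

```python
import statistics


def box_stats(values, whisker=1.5):
    """Hypothetical helper: one box's statistics, computed with NumPy-style
    (linear, i.e. statistics' 'inclusive') quartiles and Tukey whiskers,
    packed as the one-item lists a box trace with precomputed stats takes."""
    xs = sorted(values)
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    spread = whisker * (q3 - q1)
    # whiskers stop at the most extreme data points within whisker * IQR
    lowerfence = min(x for x in xs if x >= q1 - spread)
    upperfence = max(x for x in xs if x <= q3 + spread)
    return {
        "q1": [q1],
        "median": [median],
        "q3": [q3],
        "lowerfence": [lowerfence],
        "upperfence": [upperfence],
    }


print(box_stats([0, 2, 3, 5, 8, 9, 10]))
```

With Plotly installed, the dictionary can be passed on as `go.Box(**box_stats(data))` (using `import plotly.graph_objects as go`). Since the result is still a regular box trace rather than hand-drawn shapes, it should keep the usual box hover labels.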

To answer @ahmedhosny, check out this part of the docs for a description of the default box-plot quartile calculation method, as well as the two built-in variants: https://plotly.com/python/box-plots/#choosing-the-algorithm-for-computing-quartiles
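A pure-Python sketch of the three options, written from the descriptions in that docs section rather than from Plotly's source. "linear" (the default) reads each quartile at the 1-based position n·p + 0.5 (method #10 of the Langford paper) with linear interpolation; "exclusive" and "inclusive" take the medians of the lower and upper halves and differ only for odd n, where the median is left out of both halves or included in both. The helper names are hypothetical; in Plotly itself the choice is made with the box trace's `quartilemethod` attribute.

```python
import math


def _median(xs):
    """Median of an already-sorted, non-empty list."""
    mid = len(xs) // 2
    return float(xs[mid]) if len(xs) % 2 else (xs[mid - 1] + xs[mid]) / 2


def _linear_quantile(xs, p):
    """Method #10: 1-based rank n*p + 0.5, clamped to [1, n], then linear
    interpolation between the neighbouring sorted values."""
    n = len(xs)
    rank = min(max(n * p + 0.5, 1.0), float(n))
    lo = math.floor(rank)
    frac = rank - lo
    if frac == 0:
        return float(xs[lo - 1])
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])


def box_quartiles(values, method="linear"):
    """Return (Q1, Q3); method is 'linear', 'exclusive' or 'inclusive',
    named after Plotly's quartilemethod values."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("this sketch needs at least two values")
    if method == "linear":
        return _linear_quantile(xs, 0.25), _linear_quantile(xs, 0.75)
    if method not in ("exclusive", "inclusive"):
        raise ValueError(f"unknown method: {method!r}")
    half = n // 2
    if n % 2 == 1 and method == "inclusive":
        # odd n, 'inclusive': the median belongs to both halves
        lower, upper = xs[:half + 1], xs[half:]
    else:
        # even n (both methods), or odd n 'exclusive': median left out
        lower, upper = xs[:half], xs[n - half:]
    return _median(lower), _median(upper)


for method in ("linear", "exclusive", "inclusive"):
    print(method, box_quartiles([0, 2, 3, 5, 8, 9, 10], method))
```

For Steve's array this gives (2.25, 8.75) for "linear", (2.0, 9.0) for "exclusive" and (2.5, 8.5) for "inclusive"; only "inclusive" reproduces the NumPy values from the original question for this sample.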

ahmedhosny · July 6, 2020, 5:45pm

Many thanks @nicolaskruchten, things are clear now. I think my confusion came from the naming: “linear” in SciPy/NumPy refers to the interpolation method, while “linear” in Plotly seems to refer to the higher-level quantile calculation method (which itself also uses linear interpolation, yet produces a different result from SciPy/NumPy). Anyway, I have included a snippet below to highlight this for anyone in the future. Great work on this library, it really makes data viz much more enjoyable!

import numpy as np
from scipy.stats import iqr

def plotly_linear_quantiles(y, quantile):
    """
        Based on #10 here: http://jse.amstat.org/v14n3/langford.html
        METHOD 10 (“H&L-2”): The Pth percentile value is found by taking that
        value with #(np + 0.5). If this is not an integer, take the interpolated
        value between 'the floor' and 'the ceiling of that value'. As an example,
        if S5 = (1, 2, 3, 4, 5) and p = 0.25, then #(np + 0.5) = #(1.75) and so Q1 = 1.75.

        args:
        y: list to calculate quantile for
        quantile: requested quantile value between 0 and 1
    """
    ys = sorted(y)
    # method 10 gives the 1-based position n*p + 0.5; subtract 1 because
    # Python counts from 0
    interp_val_x = len(ys) * quantile + 0.5 - 1
    if interp_val_x.is_integer():
        # exact position: int() removes the decimal
        return float(ys[int(interp_val_x)])
    # otherwise interpolate linearly between neighbouring sorted values
    # (np.interp clamps positions outside the sample to the min/max)
    return np.interp(interp_val_x, list(range(len(ys))), ys)


def plotly_linear_IQR(y):
    return plotly_linear_quantiles(y, 0.75) - plotly_linear_quantiles(y, 0.25)


# linear by default in numpy and scipy, but included for clarity
l = 'linear'
# NumPy >= 1.22 calls np.percentile's keyword `method` (`interpolation` is
# deprecated there); scipy.stats.iqr still uses `interpolation`

y = [1,2,3,4]
print(plotly_linear_IQR(y))  # 2.0
print(iqr(y, interpolation=l))  # 1.5
print(np.percentile(y, 75, method=l) - np.percentile(y, 25, method=l))  # 1.5

y = [1,2,3,4,5]
print(plotly_linear_IQR(y))  # 2.5
print(iqr(y, interpolation=l))  # 2.0
print(np.percentile(y, 75, method=l) - np.percentile(y, 25, method=l))  # 2.0

y = [1,2,3,4,5,6]
print(plotly_linear_IQR(y))  # 3.0
print(iqr(y, interpolation=l))  # 2.5
print(np.percentile(y, 75, method=l) - np.percentile(y, 25, method=l))  # 2.5

y = [1,2,3,4,5,6,7]
print(plotly_linear_IQR(y))  # 3.5
print(iqr(y, interpolation=l))  # 3.0
print(np.percentile(y, 75, method=l) - np.percentile(y, 25, method=l))  # 3.0
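For anyone landing here with NumPy 1.22 or later: `np.percentile` now takes a `method` argument, and its `'hazen'` option places the p-quantile at the 1-based position n·p + 0.5 with linear interpolation, which is the same formula as `plotly_linear_quantiles` in the snippet above. A minimal sketch, assuming Plotly's default still follows the method #10 description in its docs:

```python
import numpy as np

# NumPy >= 1.22: method="hazen" uses the 1-based position n*p + 0.5,
# interpolating linearly between neighbouring sorted values
for y in ([1, 2, 3, 4], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6, 7]):
    q1, q3 = np.percentile(y, [25, 75], method="hazen")
    print(y, "Q1:", q1, "Q3:", q3, "IQR:", q3 - q1)
```

The IQRs come out as 2.0, 2.5, 3.0 and 3.5, matching the Plotly-style column of the comparison above; for Steve's original array the same call gives Q1 = 2.25 and Q3 = 8.75.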
