Extracting box plot statistics

Hello, I am using go.Box to generate box plots for some data. In addition to the figures, I would like to save the statistical information that shows up on the hover, including the quartiles, median, mean, number of outliers, etc. I've looked but have been unable to locate this information. Any help in resolving this issue would be greatly appreciated.

Thanks!


Welcome to the forum @arajan
What do you mean you would like to SAVE the statistical information that shows up on the hover? Where do you want to save it?


Hi @arajan, the quartiles are computed by plotly.js in the browser, so you cannot have JavaScript communicate their values back to Python. You can, however, precompute them using the various functions in scipy.stats (scoreatpercentile, iqr, etc.) and then pass them to go.Box as in this example: https://plotly.com/python/box-plots/#box-plot-with-precomputed-quartiles.
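
A minimal sketch of that approach, using NumPy instead of scipy.stats and assuming NumPy 1.22 or later (for the `method=` argument of `np.percentile`). NumPy's `"hazen"` method places the p-th quantile at 1-based position n*p + 0.5 and interpolates linearly between neighbours, which is the same rule the `get_percentile` function further down this thread uses to reproduce Plotly's default quartiles. The helper names `box_stats` and `precomputed_box` are hypothetical; the keyword arguments `q1`, `median`, `q3`, `lowerfence`, `upperfence` and `mean` follow the linked precomputed-quartiles example, with one list entry per box. Clamping the fences so a whisker never ends inside the box is an assumption about how Plotly treats very skewed samples, not something stated in this thread.

```python
import numpy as np


def box_stats(values, whisker=1.5):
    """Box plot statistics using NumPy's 'hazen' quantiles (1-based position n*p + 0.5)."""
    y = np.sort(np.asarray(values, dtype=float))
    q1, median, q3 = np.percentile(y, [25, 50, 75], method="hazen")
    iqr = q3 - q1
    lower_limit, upper_limit = q1 - whisker * iqr, q3 + whisker * iqr
    # Data points within the whisker limits (never empty for a non-empty sample)
    inside = y[(y >= lower_limit) & (y <= upper_limit)]
    return {
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        # Whisker ends ("fences" in the hover): most extreme points within the limits,
        # clamped so they never fall inside the box (assumption, see lead-in)
        "lowerfence": float(min(q1, inside.min())),
        "upperfence": float(max(q3, inside.max())),
        "mean": float(y.mean()),
        "n_outliers": int(y.size - inside.size),
    }


def precomputed_box(values, name):
    """Hypothetical helper: a go.Box drawn from precomputed statistics."""
    import plotly.graph_objects as go  # imported here so box_stats() works without Plotly

    s = box_stats(values)
    return go.Box(
        y=[list(values)],  # one row of sample data per box, as in the linked docs example
        name=name,
        q1=[s["q1"]], median=[s["median"]], q3=[s["q3"]],
        lowerfence=[s["lowerfence"]], upperfence=[s["upperfence"]],
        mean=[s["mean"]],
    )
```

The dictionary returned by `box_stats` is also what you would write to a file or table; `go.Figure(precomputed_box(data, "sample"))` then draws a box from exactly those numbers.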


Thanks for the replies 🙂.

@adamschroeder the idea was to save the statistics and display them in a table underneath the plots themselves. In addition to displaying the interactive plots, the workflow I have designed involves printing the plots to PDF.

@Emmanuelle, thanks for this suggestion. I had seen this option but wasn't sure whether there was a built-in solution. I was able to resolve the issue using your approach.
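
A minimal sketch of the table-under-the-plot layout described above, assuming Plotly's `make_subplots` with a `"table"` subplot and static export through `fig.write_image`, which needs the separate kaleido package. The helper names, the column choice and the use of NumPy's `"hazen"` quantiles (1-based position n*p + 0.5 with linear interpolation, NumPy 1.22 or later) are illustrative assumptions, not the poster's actual code.

```python
import numpy as np


def stats_table_columns(groups):
    """Column lists for a statistics table, one row per named group of values."""
    names, counts, q1s, medians, q3s, means = [], [], [], [], [], []
    for name, values in groups.items():
        y = np.asarray(values, dtype=float)
        # 'hazen': quantile at 1-based position n*p + 0.5, linearly interpolated
        q1, med, q3 = np.percentile(y, [25, 50, 75], method="hazen")
        names.append(name)
        counts.append(int(y.size))
        q1s.append(round(float(q1), 2))
        medians.append(round(float(med), 2))
        q3s.append(round(float(q3), 2))
        means.append(round(float(y.mean()), 2))
    return [names, counts, q1s, medians, q3s, means]


def box_report(groups, pdf_path):
    """Hypothetical helper: box plots on top, statistics table underneath, saved to PDF."""
    import plotly.graph_objects as go  # imported here so the table helper works without Plotly
    from plotly.subplots import make_subplots

    fig = make_subplots(
        rows=2, cols=1, row_heights=[0.65, 0.35], vertical_spacing=0.08,
        specs=[[{"type": "xy"}], [{"type": "table"}]],
    )
    for name, values in groups.items():
        fig.add_trace(go.Box(y=list(values), name=name, boxmean=True), row=1, col=1)
    fig.add_trace(
        go.Table(
            header=dict(values=["group", "n", "q1", "median", "q3", "mean"]),
            cells=dict(values=stats_table_columns(groups)),
        ),
        row=2, col=1,
    )
    fig.write_image(pdf_path)  # static export requires the kaleido package
    return fig
```

Calling `box_report({"a": data_a, "b": data_b}, "report.pdf")` would write one page with the boxes on top and the table below, while the same figure can still be shown interactively.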

@arajan
Hi, I am also trying to display the statistics in a table below the box plots.
Could you share how you managed to do it?

@dentaeny we ended up not using the statistics from Plotly. I calculated the statistics with matplotlib, using the logic given here: https://github.com/matplotlib/matplotlib/blob/abe0e3957596c1f455a2f84b97731651f1e5b9cf/lib/matplotlib/cbook/__init__.py#L1142, and then displayed them using JavaScript in the front end. Hope this helps 🙂.
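
For reference, the linked matplotlib logic is exposed as `matplotlib.cbook.boxplot_stats`, which returns one dictionary per dataset with keys such as `q1`, `med`, `q3`, `mean`, `whislo`, `whishi` and `fliers`. A minimal sketch, assuming a reasonably recent matplotlib; the wrapper name `matplotlib_box_stats` is hypothetical. Note that matplotlib takes its quartiles from `np.percentile` with NumPy's default linear method, so for small samples they can differ from what Plotly's hover shows.

```python
import numpy as np
from matplotlib import cbook


def matplotlib_box_stats(values, whis=1.5):
    """Matplotlib's box plot statistics for one dataset, plus an outlier count."""
    # boxplot_stats returns a list with one dict per dataset; 1-D input is one dataset
    stats = cbook.boxplot_stats(np.asarray(values, dtype=float), whis=whis)[0]
    return {
        "q1": float(stats["q1"]),
        "median": float(stats["med"]),
        "q3": float(stats["q3"]),
        "mean": float(stats["mean"]),
        "lower_whisker": float(stats["whislo"]),
        "upper_whisker": float(stats["whishi"]),
        # "fliers" holds the points beyond the whiskers
        "n_outliers": int(len(stats["fliers"])),
    }
```

For `[1, 2, 3, 4, 100]` this gives q1 = 2.0 and q3 = 4.0, whereas the Plotly-style calculation below gives 1.75 and 28.0 for the same values.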

If you want to get the same statistics as Plotly:

```python
import math

import numpy as np

## Calculate quartiles as outlined in the Plotly documentation
## (method #10 in the paper https://jse.amstat.org/v14n3/langford.html)
def get_percentile(data, p):
    data = sorted(data)  # sort a copy so the caller's data is not modified
    n = len(data)
    x = n * p + 0.5  # 1-based position of the p-th quantile

    # Clamp positions that fall outside the data (only happens for very small n)
    if x <= 1:
        return round(data[0], 2)
    if x >= n:
        return round(data[-1], 2)

    # If the position is an integer, return that element
    if x.is_integer():
        return round(data[int(x) - 1], 2)  # account for zero-indexing

    # Otherwise, interpolate between the values at the floor and ceiling positions
    x1, x2 = math.floor(x), math.ceil(x)
    y1, y2 = data[x1 - 1], data[x2 - 1]  # account for zero-indexing
    return round(float(np.interp(x, [x1, x2], [y1, y2])), 2)


## Calculate all box plot statistics
data = np.asarray(data, dtype=float)
q1, median, q3 = (get_percentile(data, p) for p in (0.25, 0.50, 0.75))
iqr = q3 - q1
# Lower fence: the smallest data value greater than or equal to the lower limit
lower_limit = q1 - 1.5 * iqr
lower_fence = round(data[data >= lower_limit].min(), 2)
# Upper fence: the largest data value less than or equal to the upper limit
upper_limit = q3 + 1.5 * iqr
upper_fence = round(data[data <= upper_limit].max(), 2)
```
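
The hover also lists the mean and the outliers, which the code above stops short of. A minimal sketch of those remaining values, assuming the `lower_fence` and `upper_fence` computed above are passed in and that outliers are the points strictly beyond them; the helper name `remaining_hover_stats` is hypothetical.

```python
import numpy as np


def remaining_hover_stats(values, lower_fence, upper_fence):
    """Mean, sample count and outliers, given the fences computed above."""
    y = np.asarray(values, dtype=float)
    # Points strictly beyond the whisker ends are the ones drawn as outliers
    outliers = y[(y < lower_fence) | (y > upper_fence)]
    return {
        "mean": round(float(y.mean()), 2),
        "n": int(y.size),
        "n_outliers": int(outliers.size),
        "outliers": sorted(outliers.tolist()),
    }
```

For `data = [1, 2, 3, 4, 100]`, the code above gives `lower_fence = 1.0` and `upper_fence = 4.0`, and `remaining_hover_stats(data, lower_fence, upper_fence)` then reports a mean of 22.0 and a single outlier, 100.0.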