Automatic generation of multifactorial boxplots with plotly in Python

Hi all,

I am trying to use plotly to create inline boxplots in Jupyter notebooks running on a Python kernel. My problem is that all the example code I was able to find requires defining a trace for each box explicitly, e.g.:

import plotly.plotly as py
import plotly.graph_objs as go
import numpy as np

y0 = np.random.randn(50)-1
y1 = np.random.randn(50)+1

trace0 = go.Box(
    y=y0
)
trace1 = go.Box(
    y=y1
)
data = [trace0, trace1]
py.iplot(data)

This is fine if I only have a few boxplots, but I generally work with multifactorial data. I am looking for a way to use grouping variables to generate boxplots. This is reasonably straightforward with plotly in R:

p <- plot_ly(diamonds, y = ~price, color = I("black"),
             alpha = 0.1, boxpoints = "suspectedoutliers")
p1 <- p %>% add_boxplot(x = "Overall")
p2 <- p %>% add_boxplot(x = ~cut)
subplot(
  p1, p2, shareY = TRUE,
  widths = c(0.2, 0.8), margin = 0
) %>% hide_legend()

However, I am still not sure this offers the same flexibility as base R plotting with the formula syntax, where I can easily plot multiple boxplots arranged by factor levels (y ~ x*u*v etc.). Is there any way to achieve this with plotly in Python without manually defining a trace for each individual box? (I tried running the R code from my Python kernel with rpy2, but the plot doesn't show; it works on an IRkernel, though.)

Thanks
D

@Thriceguy To plot many boxplots, you should first process your data and then define a suitable function that returns a boxplot trace. More precisely, your function should take arguments and kwargs that cover all the features your boxplots need. Here is an example.

Thanks a lot! I somehow was unable to find this example…