Plotly Upset Plot

Hi everyone,
I am new to plotly/dash, but I noticed there is no built-in "upset plot" for showing set intersections (see here: https://upset.app/). I just made a function that builds these plots from the base graph objects. It is still rough (it could use axis ticks and a background for the bar portion of the chart). It takes a dataframe of boolean columns, where 1 means a row belongs to a group and 0 means it does not. I would greatly appreciate it if someone could help add the ticks and background to the top part of the chart. Thank you!

Maybe this could be added to the plotly express or figure factory? Here is the code:

import plotly.graph_objects as go
import pandas as pd
import itertools

def plotly_upset_plot(df):
    # all non-empty combinations of the d columns (2^d - 1 subsets)
    subsets = []
    # number of rows belonging to exactly each subset (one size per subset)
    subset_sizes = []
    d = len(df.columns)
    for i in range(1, d + 1):
        subsets = subsets + [list(x) for x in list(itertools.combinations(df.columns, i))]
        
    for s in subsets:
        curr_bool = [1]*len(df)
        for col in df.columns:
            if col in s: curr_bool = [x and y for x, y in zip(curr_bool, list(df.loc[:, col].copy()))]
            else: curr_bool = [x and not y for x, y in zip(curr_bool, list(df.loc[:, col].copy()))]
        subset_sizes.append(sum(curr_bool))
    
    
    plot_df = pd.DataFrame({'Intersection': subsets, 'Size':subset_sizes})
    plot_df = plot_df.sort_values(by = 'Size', ascending = False)
    max_y = max(plot_df['Size'])+0.1*max(plot_df['Size'])
    
    subsets = list(plot_df['Intersection'])
    scatter_x = []
    scatter_y = []
    for i, s in enumerate(subsets):
        for j in range(d):
            scatter_x.append(i)
            scatter_y.append(-j*max_y/d-0.1*max_y)
            
    fig = go.Figure()
#     fig.add_trace(go.Scatter(x=[-1.2,len(subsets)],y= [max_y+0.1*max_y,max_y+0.1*max_y],fill='tozeroy'))
    template =  ['' for x in scatter_x]
    fig.add_trace(go.Scatter(x = scatter_x, y = scatter_y, mode = 'markers', showlegend=False, marker=dict(size=16,color='#C9C9C9'), hovertemplate = template))
    fig.update_layout(xaxis=dict(showgrid=False, zeroline=False),
                  yaxis=dict(showgrid=True, zeroline=False),
                   plot_bgcolor = "#FFFFFF", margin=dict(t=40, l=150)) 
    
    for i, s in enumerate(subsets):
        scatter_x_has = []
        scatter_y_has = []
        for j in range(d):
            if df.columns[j] in s:
                scatter_x_has.append(i)
                scatter_y_has.append(-j*max_y/d-0.1*max_y)
        # add one trace per intersection, after all of its member dots are collected
        fig.add_trace(go.Scatter(x = scatter_x_has, y = scatter_y_has, mode = 'markers+lines', showlegend=False, marker=dict(size=16,color='#000000',showscale=False), hovertemplate = template))
    fig.update_xaxes(showticklabels=False) # Hide x axis ticks 
    fig.update_yaxes(showticklabels=False) # Hide y axis ticks
    fig.update_traces(hoverinfo=None)
    
    plot_df['Intersection'] = ['+'.join(x) for x in plot_df['Intersection']]
    template =  [f'<extra><br><b>{lab}</b><br><b>N-Count</b>: {n}</extra>' for  lab, n in zip(plot_df['Intersection'], plot_df['Size'])]
    bar = go.Bar(x = list(range(len(subsets))), y = plot_df['Size'], marker = dict(color='#000000'),  text = plot_df['Size'], hovertemplate = template, textposition='outside', hoverinfo='none')
    fig.add_trace(bar)
    
    template =  ['' for x in range(d)]
    max_string_len = max([len(x) for x in df.columns])
    # one label per group: only the first d y positions are needed (one per row of dots)
    fig_lab = go.Scatter(x = [-0.01*max_string_len]*d, y = scatter_y[:d], text = df.columns, mode = 'text', textposition='middle left',showlegend=False, hovertemplate = template)
    fig.add_trace(fig_lab)
    fig.update_layout(title = '<b>Intersections</b>', yaxis_range=[-max_y-0.1*max_y-1,max_y+0.1*max_y], xaxis_range = [-0.13*max_string_len, len(subsets)], showlegend = False, title_x=0.5)
    
    return fig

Hey there.
I’m interested in this plot but could use some more documentation.
Could you provide an example of the data structure that’s meant to go in the function to get the UpSetPlot?

Thanks

A pandas dataframe like:
df = pd.DataFrame({'Group 1': [1, 0, 1, 0, 1], 'Group 2': [1, 0, 0, 0, 1], 'Group 3': [1, 1, 1, 1, 1]})

Each row is an observation, and each column is a group it may belong to. Revisiting this, I think it should also be possible to show the count of observations that are in none of the groups.
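
As a rough illustration of that data structure, here is a minimal sketch of counting how many rows fall into each exact membership pattern, including rows that belong to no group at all. It uses a pandas groupby rather than the nested loops in the function above, and the helper name `exclusive_intersection_sizes` is made up for this example; it is not part of plotly or of the original code.

```python
import pandas as pd

def exclusive_intersection_sizes(df):
    """Count rows per exact membership pattern (hypothetical helper).

    Returns a dict mapping a tuple of group names to the number of rows
    that belong to exactly those groups. The empty tuple () stands for
    rows that belong to no group. Only patterns that occur are returned.
    """
    membership = df.astype(bool)
    cols = list(membership.columns)
    counts = membership.groupby(cols).size()
    sizes = {}
    for key, n in counts.items():
        # With a single column the group key is a scalar, not a tuple.
        if not isinstance(key, tuple):
            key = (key,)
        members = tuple(c for c, flag in zip(cols, key) if flag)
        sizes[members] = int(n)
    return sizes

df = pd.DataFrame({'Group 1': [1, 0, 1, 0, 1],
                   'Group 2': [1, 0, 0, 0, 1],
                   'Group 3': [1, 1, 1, 1, 1]})
sizes = exclusive_intersection_sizes(df)
```

For the example dataframe, every row belongs to Group 3, so no `()` entry appears: two rows are in Group 3 only, one is in Group 1 and Group 3, and two are in all three groups.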

@dashinglat
A good reference is the original article, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4720993/, from IEEE Transactions on Visualization and Computer Graphics.
Free download is available.

Hi @venturellac,
Could you post a Plotly plot for the Upset plot generated with your code? An image is worth a thousand words. :slight_smile:

Yes of course :slight_smile: The output should look like this:

In this one, I added one line to show the individuals belonging to no group (the fourth largest group here of N = 76):

subsets = subsets + [[ ]]

Maybe showing these individuals can be an optional argument? Also it is worth noting that in my own usage, I cap the width of the dataframe to 5 columns because the size of the figure grows exponentially in the number of groups.
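
A minimal sketch of how the optional argument and the column cap could look, assuming the subset enumeration is pulled out into its own helper. The names `enumerate_subsets`, `include_empty` and `max_columns` are hypothetical and do not appear in the function above or in the pull request.

```python
from itertools import combinations

def enumerate_subsets(columns, include_empty=False, max_columns=5):
    """List every combination of group names (hypothetical helper).

    include_empty: also return [] for rows that belong to no group.
    max_columns:   refuse wider inputs, because the number of subsets
                   (and so the width of the figure) doubles per column.
    """
    columns = list(columns)
    if len(columns) > max_columns:
        raise ValueError(
            f"{len(columns)} columns would give {2 ** len(columns)} subsets; "
            f"pass at most {max_columns} columns or raise max_columns"
        )
    smallest = 0 if include_empty else 1
    return [list(combo)
            for size in range(smallest, len(columns) + 1)
            for combo in combinations(columns, size)]

subsets = enumerate_subsets(['Group 1', 'Group 2', 'Group 3'], include_empty=True)
```

With three columns this gives 8 subsets when the empty set is included and 7 otherwise. Raising an error is only one possible choice here; silently keeping the first few columns would also work, but it hides the fact that data was dropped.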

FYI: a pull request has been made to add this to figure_factory: "Venturellac dev ff upset plot" by venturellac · Pull Request #3833 · plotly/plotly.py · GitHub

hi @venturellac
Thank you for opening a PR for this feature request. I’ll make sure our engineers see this.

Kind regards,

Did you ever extend or modify this code for the Upset style of plots? I am considering porting the Upset.js module into Dash as a custom component. Your code is a good generic implementation, although I assume it is missing the horizontal bar plots and interactivity that Upset.js could provide (although its license is prohibitive for the commercial operation I am part of)…

Thank you for your comment - I did not add much by way of styling and I do not plan to as my current work does not require these kinds of analytics. If you are to use my implementation, I’ll warn you that I never devised a good way to add labels to the left margin (or similarly horizontal bars as you suggest) due to the quirky way of aligning the dots and the vertical bars with x coordinates. For example, if you have very wide left margins and/or a small overall chart width, you may have to adjust this decimal so labels do not get cut off. I tooled around with making the sizing automatic, but never got it working in general:

fig_lab = go.Scatter(x = [-0.01*max_string_len]*d, y = scatter_y, text = df.columns, mode = 'text', textposition='middle left', showlegend=False, hovertemplate = template)
fig.add_trace(fig_lab)

Hope this helps!

Really amazing work. Right now I am making an upset plot in matplotlib and putting the image into a page.

Hey @rictuar @adamschroeder @venturellac @thondeboer @empet @dashinglat @alexcjohnson I'm no coder, but I understand an upset plot might be useful when you want to examine a number of intersecting sets (versus a Venn diagram, which doesn't really work beyond 3-4 sets). Have any of you tried using the latest ChatGPT to get input on writing the code for such a plot with the Plotly charting library?

hi @DaveG
Yes, I've tried that with ChatGPT 3.5 but didn't get the results I wanted. It kept creating code for different types of simple bar charts.

Perhaps with ChatGPT 4 it would work better.