Plotly Upset Plot

venturellac · May 8, 2022, 5:34pm

Hi everyone,
I am new to plotly/dash, but I noticed there is no built-in “upset plot” for showing set intersections (see here: https://upset.app/). I just made a function that builds these plots from the base graph objects. It is still rough (the bar portion of the chart could use axis ticks and a background). It takes a dataframe of indicator columns where 1 means a row belongs to a group and 0 means it does not. I would greatly appreciate it if someone could help add the ticks and background to the top part of the chart. Thank you!

Maybe this could be added to plotly express or the figure factory? Here is the code:

import plotly.graph_objects as go
import pandas as pd
import itertools

def plotly_upset_plot(df):
    # every non-empty combination of columns (2^d - 1 subsets, where d is the number of columns)
    subsets = []
    # the number of rows that belong to exactly each subset (one entry per subset)
    subset_sizes = []
    d = len(df.columns)
    for i in range(1, d + 1):
        subsets = subsets + [list(x) for x in list(itertools.combinations(df.columns, i))]
        
    for s in subsets:
        curr_bool = [1]*len(df)
        for col in df.columns:
            if col in s: curr_bool = [x and y for x, y in zip(curr_bool, list(df.loc[:, col].copy()))]
            else: curr_bool = [x and not y for x, y in zip(curr_bool, list(df.loc[:, col].copy()))]
        subset_sizes.append(sum(curr_bool))
    
    
    plot_df = pd.DataFrame({'Intersection': subsets, 'Size':subset_sizes})
    plot_df = plot_df.sort_values(by = 'Size', ascending = False)
    max_y = max(plot_df['Size'])+0.1*max(plot_df['Size'])
    
    subsets = list(plot_df['Intersection'])
    scatter_x = []
    scatter_y = []
    for i, s in enumerate(subsets):
        for j in range(d):
            scatter_x.append(i)
            scatter_y.append(-j*max_y/d-0.1*max_y)
            
    fig = go.Figure()
#     fig.add_trace(go.Scatter(x=[-1.2,len(subsets)],y= [max_y+0.1*max_y,max_y+0.1*max_y],fill='tozeroy'))
    template =  ['' for x in scatter_x]
    fig.add_trace(go.Scatter(x = scatter_x, y = scatter_y, mode = 'markers', showlegend=False, marker=dict(size=16,color='#C9C9C9'), hovertemplate = template))
    fig.update_layout(xaxis=dict(showgrid=False, zeroline=False),
                  yaxis=dict(showgrid=True, zeroline=False),
                   plot_bgcolor = "#FFFFFF", margin=dict(t=40, l=150)) 
    
    # Draw one dark marker-and-line trace per intersection, covering the rows of its member sets.
    for i, s in enumerate(subsets):
        scatter_x_has = []
        scatter_y_has = []
        for j in range(d):
            if df.columns[j] in s:
                scatter_x_has.append(i)
                scatter_y_has.append(-j*max_y/d-0.1*max_y)
        # Add the trace once per intersection, after all of its member rows are collected.
        fig.add_trace(go.Scatter(x = scatter_x_has, y = scatter_y_has, mode = 'markers+lines', showlegend=False, marker=dict(size=16,color='#000000',showscale=False), hovertemplate = template))
    fig.update_xaxes(showticklabels=False) # Hide x axis ticks 
    fig.update_yaxes(showticklabels=False) # Hide y axis ticks
    fig.update_traces(hoverinfo=None)
    
    plot_df['Intersection'] = ['+'.join(x) for x in plot_df['Intersection']]
    template =  [f'<extra><br><b>{lab}</b><br><b>N-Count</b>: {n}</extra>' for  lab, n in zip(plot_df['Intersection'], plot_df['Size'])]
    bar = go.Bar(x = list(range(len(subsets))), y = plot_df['Size'], marker = dict(color='#000000'),  text = plot_df['Size'], hovertemplate = template, textposition='outside', hoverinfo='none')
    fig.add_trace(bar)
    
    template =  ['' for x in range(d)]
    max_string_len = max([len(x) for x in df.columns])
    # One text label per set row; scatter_y repeats the same d row positions for every subset, so take the first d.
    fig_lab = go.Scatter(x = [-0.01*max_string_len]*d, y = scatter_y[:d], text = df.columns, mode = 'text', textposition='middle left',showlegend=False, hovertemplate = template)
    fig.add_trace(fig_lab)
    fig.update_layout(title = '<b>Intersections</b>', yaxis_range=[-max_y-0.1*max_y-1,max_y+0.1*max_y], xaxis_range = [-0.13*max_string_len, len(subsets)], showlegend = False, title_x=0.5)
    
    return fig

dashinglat · May 29, 2022, 1:57am

Hey there.
I’m interested in this plot but could use some more documentation.
Could you provide an example of the data structure that’s meant to go in the function to get the UpSetPlot?

Thanks

venturellac · May 29, 2022, 8:24pm

A pandas dataframe like:
df = pd.DataFrame({'Group 1': [1, 0, 1, 0, 1], 'Group 2': [1, 0, 0, 0, 1], 'Group 3': [1, 1, 1, 1, 1]})

Each row an observation, each column a group it may belong to. Revisiting this I think it should be possible to show the count of observations in none of the groups.
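To make the expected input concrete, here is a minimal sketch that counts how many rows fall into each exact combination of groups for the dataframe above. It is an alternative to the nested loops in the posted function, not the author's method: `intersection_sizes` is a hypothetical helper name, it omits combinations with zero rows, and rows belonging to no group would show up under the empty tuple `()`.

```python
from collections import Counter

import pandas as pd


def intersection_sizes(df):
    """Count rows per exact membership pattern (hypothetical helper).

    A row counts toward a combination only if it belongs to every group in
    the combination and to no other group, matching the "x and not y" logic
    in the posted function.
    """
    cols = list(df.columns)
    # Each row becomes a tuple of booleans, one flag per column.
    patterns = Counter(df.astype(bool).itertuples(index=False, name=None))
    return {
        tuple(c for c, flag in zip(cols, pattern) if flag): n
        for pattern, n in patterns.items()
    }


df = pd.DataFrame({'Group 1': [1, 0, 1, 0, 1],
                   'Group 2': [1, 0, 0, 0, 1],
                   'Group 3': [1, 1, 1, 1, 1]})
sizes = intersection_sizes(df)
for members, n in sorted(sizes.items(), key=lambda item: -item[1]):
    print('+'.join(members) or '(none)', n)
```

In this example two rows belong to all three groups, two belong only to Group 3, and one belongs to Group 1 and Group 3.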

empet · May 30, 2022, 9:43am

@dashinglat
A good reference is this original article https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4720993/ from IEEE Transactions on Visualisation and Computer Graphics.
Free download is available.

empet · May 30, 2022, 9:47am

Hi @venturellac,
Could you post a Plotly plot for the Upset plot generated with your code? An image is worth a thousand words.

venturellac · May 30, 2022, 2:13pm

Yes, of course. The output should look like this:

In this one, I added one line to show the individuals belonging to no group (the fourth largest group here of N = 76):

subsets = subsets + [[ ]]

Maybe showing these individuals can be an optional argument? Also it is worth noting that in my own usage, I cap the width of the dataframe to 5 columns because the size of the figure grows exponentially in the number of groups.
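The optional argument and the column cap mentioned above could be sketched roughly as follows. The helper name `enumerate_subsets` and the parameters `include_empty` and `max_groups` are assumptions for illustration and are not part of the posted function. With d columns there are 2^d - 1 non-empty subsets, so 5 columns already give 31 bars.

```python
from itertools import combinations


def enumerate_subsets(columns, include_empty=False, max_groups=5):
    """List every combination of columns (hypothetical helper).

    include_empty appends [] so rows belonging to no group get their own bar.
    max_groups guards against the exponential growth in the number of bars.
    """
    columns = list(columns)
    if len(columns) > max_groups:
        raise ValueError(
            f"{len(columns)} groups would give {2 ** len(columns) - 1} "
            f"intersections; limit the dataframe to {max_groups} columns"
        )
    subsets = [
        list(combo)
        for size in range(1, len(columns) + 1)
        for combo in combinations(columns, size)
    ]
    if include_empty:
        subsets.append([])
    return subsets


subsets = enumerate_subsets(['Group 1', 'Group 2', 'Group 3'], include_empty=True)
print(len(subsets))
```

Raising an error is only one possible choice here; truncating to the most populated groups would be another.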

venturellac · August 8, 2022, 2:41am

FYI: a pull request has been made to add this to figure_factory: “Venturellac dev ff upset plot” by venturellac, Pull Request #3833 in plotly/plotly.py on GitHub.

adamschroeder · August 9, 2022, 7:40pm

hi @venturellac
Thank you for opening a PR for this feature request. I’ll make sure our engineers see this.

Kind regards,

thondeboer · December 16, 2022, 5:10pm

Did you ever extend or modify this code for the Upset style of plots? I am considering taking the Upset.js module and porting it into Dash as a custom component. Your code is a good generic implementation, although I assume it is missing the horizontal bar plots and interactivity that Upset.js could provide (although its license is prohibitive for commercial operation, which I am part of)…

venturellac · December 16, 2022, 7:09pm

Thank you for your comment - I did not add much by way of styling and I do not plan to as my current work does not require these kinds of analytics. If you are to use my implementation, I’ll warn you that I never devised a good way to add labels to the left margin (or similarly horizontal bars as you suggest) due to the quirky way of aligning the dots and the vertical bars with x coordinates. For example, if you have very wide left margins and/or a small overall chart width, you may have to adjust this decimal so labels do not get cut off. I tooled around with making the sizing automatic, but never got it working in general:

fig_lab = go.Scatter(x = [-0.01*max_string_len]*d, y = scatter_y[:d], text = df.columns, mode = 'text', textposition='middle left',showlegend=False, hovertemplate = template)
fig.add_trace(fig_lab)

Hope this helps!
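One possible way around the hand-tuned decimal, sketched here as an untested alternative rather than as part of the original function, is to let the y-axis carry the set names as tick labels and let Plotly size the left margin with `automargin`. The helper name `set_label_axis_settings` is hypothetical; the row positions reuse the formula from the posted code.

```python
def set_label_axis_settings(columns, max_y):
    """Return y-axis settings that label each dot row with its column name.

    Hypothetical helper, not part of the original function. The row positions
    repeat the formula used for the dot matrix: -j*max_y/d - 0.1*max_y.
    """
    columns = [str(c) for c in columns]
    d = len(columns)
    row_positions = [-j * max_y / d - 0.1 * max_y for j in range(d)]
    return dict(
        tickmode="array",       # use the explicit tick positions below
        tickvals=row_positions,
        ticktext=columns,
        showticklabels=True,
        automargin=True,        # let Plotly widen the margin to fit the labels
    )


settings = set_label_axis_settings(["Group 1", "Group 2"], max_y=10)
print(settings["tickvals"])
```

Inside `plotly_upset_plot`, this would presumably be applied with `fig.update_yaxes(**settings)` after the existing `fig.update_yaxes(showticklabels=False)` call, in place of the `fig_lab` text trace. Grid lines follow tick positions, so `showgrid` may need to be switched off, and the negative padding in `xaxis_range` could likely be reduced.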

rictuar · January 14, 2023, 6:01pm

really amazing work. right now i am making an upset in matplotlib and putting the image into a page.

DaveG · November 22, 2023, 5:42am

Hey @rictuar @adamschroeder @venturellac @thondeboer @empet @dashinglat @alexcjohnson I’m no coder, but understand an upset plot might be useful when you want to examine a number of intersecting sets (versus a venn diagram which doesn’t really work after 3-4 sets). Have any of you tried using latest ChatGPT to get any input on creating the code using the Plotly charting library for such?

adamschroeder · November 22, 2023, 2:41pm

hi @DaveG
Yes, I’ve tried that with ChatGPT 3.5 but didn’t get desirable results. It kept creating code for different types of simple bar charts.

Perhaps with ChatGPT 4 it would work better.
