Ridgeline and Stacked Histogram of Multiple Categories

vis · April 26, 2022, 6:46am

Hi all!

I’m trying to create a ridgeline plot, and histograms stacked row by row, to visualise the distribution of multiple categories over time.

I can plot the ridgeline the way I’d like with the code below, except for the legend. How can I get legend entries for Zone A and Zone B only? I tried leaving out the ‘name’ argument, but then I end up with a separate half violin for every single trace.

import numpy as np
import pandas as pd
import plotly.graph_objects as go

random_data = {'meeting_num':np.random.randint(1, 4, 100), 'account_age':np.round(np.random.uniform(0, 10, 100), 2), 'zone':np.random.choice(['a','b'], 100)}
random_df = pd.DataFrame(random_data)


fig = go.Figure()
for meeting in [1,2,3]:
  zone_a = random_df[(random_df['meeting_num']==meeting)&(random_df['zone']=='a')]['account_age']
  zone_b = random_df[(random_df['meeting_num']==meeting)&(random_df['zone']=='b')]['account_age']
  
  fig.add_trace(go.Violin(x=zone_a, line_color='blue', name=f'Meeting {meeting}'))
  fig.add_trace(go.Violin(x=zone_b, line_color='orange', name=f'Meeting {meeting}'))

fig.update_traces(orientation='h', side='positive', width=2, points=False)
fig.update_layout(xaxis_showgrid=False, xaxis_zeroline=False, xaxis_title='Account Age', xaxis = dict(tickmode='linear', tick0=0, dtick=1), showlegend=True, width=1000, height=600, violinmode='overlay')
fig.show()

I also want to create histograms laid out like the plot above, in place of the half violins. Doing it the same way as the ridgeline, I end up with all traces overlaid on top of each other, with a similar legend issue. How could I get the histograms for both zones plotted in one row per meeting?

fig = go.Figure()
for meeting in [1,2,3]:
  zone_a = random_df[(random_df['meeting_num']==meeting)&(random_df['zone']=='a')]['account_age']
  zone_b = random_df[(random_df['meeting_num']==meeting)&(random_df['zone']=='b')]['account_age']
  
  fig.add_trace(go.Histogram(x=zone_a, histnorm='probability', name=f'Zone A, Meeting Number: {meeting}'))
  fig.add_trace(go.Histogram(x=zone_b, histnorm='probability', name=f'Zone B, Meeting Number: {meeting}'))
  
fig.update_layout(barmode='overlay', xaxis_showgrid=False, xaxis_zeroline=False, width=1000, height=600)
fig.update_traces(opacity=.5)
fig.show()

Been reading and trying a lot but couldn’t solve this. Could you please help me understand how I can achieve these?
Thank you!

jlfsjunior · April 30, 2022, 11:59am

Hi,

Starting from the bottom, you can easily do the histogram you want with plotly express:

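import plotly.express as px

fig = px.histogram(
    random_df,
    x='account_age',
    color='zone',             # one colour and one legend item per zone
    facet_row='meeting_num',  # one row of histograms per meeting
    histnorm="probability",
    barmode="overlay"
)
fig.show()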

Note that there is a unique legend item for each zone, which is what you wanted.

The first question is way trickier and requires some “hacks” to get what you want. There are two reasons why this is the case:

The curves are split by “meeting” and overlapped per “zone” only because you are naming each trace after its meeting. As you correctly noticed, they end up in “6 rows” of violins if name is None. So you can’t rename the curves, or they won’t look the way you want.
You do want the legend to be per zone though, with one single item per zone…

The code is at the bottom. Let me explain the approach:

I used subplots instead, as they give more flexibility for spacing and so on. The plots still share the same x-axis and can be zoomed together.
I set legendgroup to A or B according to the zone, so the traces of a zone are coupled in legend interactions: if you deselect one, you deselect them all. I also hide the legend entry for each of them, otherwise there would be 6 entries.
Then I create two empty bar plots just to add the extra legend items. They have the same color and legendgroup as the violins, but now I can name them differently without breaking the violin grouping, so I name them “Zone A” and “Zone B”. The legend symbol is slightly different (a solid square), but the name is right and each entry controls all traces of the same color.

Here is the code:

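import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots

random_data = {'meeting_num':np.random.randint(1, 4, 100), 'account_age':np.round(np.random.uniform(0, 10, 100), 2), 'zone':np.random.choice(['a','b'], 100)}
random_df = pd.DataFrame(random_data)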

fig = make_subplots(rows=3, cols=1, shared_xaxes=True, vertical_spacing=0.03)
for meeting in [1,2,3]:
  zone_a = random_df[(random_df['meeting_num']==meeting)&(random_df['zone']=='a')]['account_age']
  zone_b = random_df[(random_df['meeting_num']==meeting)&(random_df['zone']=='b')]['account_age']

  fig.add_trace(go.Violin(x=zone_a, line_color='blue', name=f'Meeting {meeting}', showlegend=False, legendgroup="A"), row=meeting, col=1)
  fig.add_trace(go.Violin(x=zone_b, line_color='orange', name=f'Meeting {meeting}', showlegend=False, legendgroup="B"), row=meeting, col=1)

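# Style the violins before the bars are added: Bar traces do not accept violin-only properties such as side
fig.update_traces(orientation='h', side='positive', width=2, points=False)

# Empty bar traces that exist only to provide one legend item per zone
fig.add_trace(go.Bar(x=[np.nan], y=[np.nan], legendgroup="A", marker_color='blue', name="Zone A"), row=1, col=1)
fig.add_trace(go.Bar(x=[np.nan], y=[np.nan], legendgroup="B", marker_color='orange',  name="Zone B"), row=1, col=1)
fig.show()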

Hope this helps!

vis · June 17, 2022, 12:54am

This did help, thank you very much!
