Ridgeline and Stacked Histogram of Multiple Categories

Hi all!

I’m trying to create a ridgeline plot, and histograms stacked row by row, to visualise the distribution of multiple categories over time.

I can plot the ridgeline the way I’d like with the code below, except for the legend. How can I get legend entries for Zone A and Zone B only? I tried it without the ‘name’ input, but then I end up with a separate half-violin plot for every single trace.

import numpy as np
import pandas as pd
import plotly.graph_objects as go

random_data = {'meeting_num': np.random.randint(1, 4, 100),
               'account_age': np.round(np.random.uniform(0, 10, 100), 2),
               'zone': np.random.choice(['a', 'b'], 100)}
random_df = pd.DataFrame(random_data)


fig = go.Figure()
for meeting in [1,2,3]:
  zone_a = random_df[(random_df['meeting_num']==meeting)&(random_df['zone']=='a')]['account_age']
  zone_b = random_df[(random_df['meeting_num']==meeting)&(random_df['zone']=='b')]['account_age']
  
  fig.add_trace(go.Violin(x=zone_a, line_color='blue', name=f'Meeting {meeting}'))
  fig.add_trace(go.Violin(x=zone_b, line_color='orange', name=f'Meeting {meeting}'))

fig.update_traces(orientation='h', side='positive', width=2, points=False)
fig.update_layout(
    xaxis_showgrid=False, xaxis_zeroline=False, xaxis_title='Account Age',
    xaxis=dict(tickmode='linear', tick0=0, dtick=1),
    showlegend=True, width=1000, height=600, violinmode='overlay',
)
fig.show()

I also want to create histograms laid out just like the plot above, in place of the half-violin plots. Trying the same approach as for the ridgeline, I end up with all traces overlaid on top of each other, with a similar legend issue. How can I get the histograms for both zones of each meeting plotted in their own row?

fig = go.Figure()
for meeting in [1,2,3]:
  zone_a = random_df[(random_df['meeting_num']==meeting)&(random_df['zone']=='a')]['account_age']
  zone_b = random_df[(random_df['meeting_num']==meeting)&(random_df['zone']=='b')]['account_age']
  
  fig.add_trace(go.Histogram(x=zone_a, histnorm='probability', name=f'Zone A, Meeting Number: {meeting}'))
  fig.add_trace(go.Histogram(x=zone_b, histnorm='probability', name=f'Zone B, Meeting Number: {meeting}'))
  
fig.update_layout(barmode='overlay', xaxis_showgrid=False, xaxis_zeroline=False, width=1000, height=600)
fig.update_traces(opacity=.5)
fig.show() 

I’ve been reading and trying a lot but couldn’t solve this. Could you please help me understand how I can achieve these?
Thank you!

Hi,

Starting from the bottom: you can easily make the histogram you want with Plotly Express:

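import plotly.express as px

fig = px.histogram(
    random_df,
    x='account_age',
    color='zone',             # one color (and one legend item) per zone
    facet_row='meeting_num',  # one subplot row per meeting
    histnorm="probability",
    barmode="overlay"
)
fig.show()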

Note that there is a unique legend item for each zone, which is what you wanted.


The first question is way trickier and requires some “hacks” to get what you want. There are two reasons for this:

  1. The curves are split by “meeting” and overlapped per “zone” only because you name each trace after its meeting: violins with the same name are drawn in the same row. As you correctly noticed, if name is None you get “6 rows” of violins. So you can’t rename the curves, or they won’t look the way you want.
  2. You do want the legend to be per zone, though, with a single item per zone.

The code is at the bottom. Let me explain the approach:

  • I used subplots instead, as they give more flexibility for spacing and so on. The plots still share the same x-axis and can be zoomed together.
  • I set legendgroup “A” and “B” on the violins of each zone, so they are coupled in legend interactions: if you deselect one, you deselect them all. I also hide the legend entry of every violin (showlegend=False); otherwise there would be 6 of them.
  • Then I create two empty bar traces just to add one legend item per zone. They have the same color and legendgroup as the violins, but now I can name them freely without breaking the violin grouping, so I name them “Zone A” and “Zone B”. The legend symbol is slightly different (a solid square), but the name is right and clicking it toggles all traces of the same color.

Here is the code:

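import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots

random_data = {'meeting_num': np.random.randint(1, 4, 100),
               'account_age': np.round(np.random.uniform(0, 10, 100), 2),
               'zone': np.random.choice(['a', 'b'], 100)}
random_df = pd.DataFrame(random_data)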

fig = make_subplots(rows=3, cols=1, shared_xaxes=True, vertical_spacing=0.03)
for meeting in [1,2,3]:
  zone_a = random_df[(random_df['meeting_num']==meeting)&(random_df['zone']=='a')]['account_age']
  zone_b = random_df[(random_df['meeting_num']==meeting)&(random_df['zone']=='b')]['account_age']

  fig.add_trace(go.Violin(x=zone_a, line_color='blue', name=f'Meeting {meeting}', showlegend=False, legendgroup="A"), row=meeting, col=1)
  fig.add_trace(go.Violin(x=zone_b, line_color='orange', name=f'Meeting {meeting}', showlegend=False, legendgroup="B"), row=meeting, col=1)

fig.update_traces(orientation='h', side='positive', width=2, points=False)

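# Two empty bar traces whose only job is to provide one legend entry per zone
fig.add_trace(go.Bar(x=[np.nan], y=[np.nan], legendgroup="A", marker_color='blue', name="Zone A"), row=1, col=1)
fig.add_trace(go.Bar(x=[np.nan], y=[np.nan], legendgroup="B", marker_color='orange', name="Zone B"), row=1, col=1)

# Overlay the two zones within each row, as in the question (the default violinmode is 'group')
fig.update_layout(violinmode='overlay', width=1000, height=600)
fig.show()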

Hope this helps! :slight_smile: