Figure Friday 2024 - week 46

Hi,
For this week’s data set I decided to add box plots within violins to highlight medians, quartiles, and overall distribution.
For aesthetics and better visual differentiation, I applied a pastel color palette.
I also introduced jittered, semi-transparent points to reduce overlap and show individual data points more clearly.
I formatted the y-axis tick labels to two decimal places so that the Max Yield (hl/ha) values are easy to interpret.
I’d love to hear your feedback on the overall design and if there’s anything else I can improve! :blush:

import plotly.express as px
import pandas as pd


# Load the data, turn non-numeric yield entries into NaN, then drop those rows
df = pd.read_csv("wine_data.csv")
df['Max_yield_hl'] = pd.to_numeric(df['Max_yield_hl'], errors='coerce')
df_cleaned = df.dropna(subset=['Max_yield_hl'])


fig = px.violin(
    df_cleaned, 
    x='Color', 
    y='Max_yield_hl', 
    facet_col='Country',
    color='Color',  
    box=True,       # draw a small box plot inside each violin
    title="Distribution of Max Allowed Yield (Hectoliters per Hectare) by Wine Color and Country",
    labels={"Max_yield_hl": "Max Yield (hl/ha)", "Color": "Wine Color"},
    color_discrete_sequence=px.colors.qualitative.Pastel  # Subtle color palette
)


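# jitter spreads the drawn points sideways; px.violin draws only outlier points unless points="all" is passed.
# opacity is set at trace level here, so it fades the violins and boxes as well as the points.
fig.update_traces(
    jitter=0.3,
    opacity=0.6
)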
fig.update_layout(
    title_font_size=18,
    legend_title="Wine Color",
    xaxis_title="Wine Color",
    yaxis_title="Max Yield (hl/ha)",
    yaxis=dict(tickformat=".2f")   # Format y-axis values
)

fig.show()