Figure Friday 2025 - week 23

> join the Figure Friday session on June 13, at noon Eastern Time, to showcase your creation and receive feedback from the community. I’m travelling back from the Databricks summit on Friday. We will meet next Friday.

Several years ago, FiveThirtyEight surveyed people across the country about their steak preferences, cigarette smoking, and a few other risk-related questions.

What kind of interesting trends can you find with the steak-risk-survey dataset?

Things to consider:

  • what can you improve in the app or sample figure below (bar chart)?
  • would you like to tell a different data story using a different graph?
  • can you create a different Dash app?

Sample figure:

Code for sample figure:
from dash import Dash, dcc
import dash_ag_grid as dag
import plotly.express as px
import pandas as pd


df = pd.read_csv("https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/main/2025/week-23/steak-risk-survey.csv")

# remove rows with empty values from Gender column
df = df[df['Gender'].notna()]

fig = px.bar(df, x='How do you like your steak prepared?', facet_row='Gender')

grid = dag.AgGrid(
    rowData=df.to_dict("records"),
    columnDefs=[{"field": i, 'filter': True, 'sortable': True} for i in df.columns],
    dashGridOptions={"pagination": True},
    columnSize="sizeToFit"
)

app = Dash()
app.layout = [
    grid,
    dcc.Graph(figure=fig)
]


if __name__ == "__main__":
    app.run(debug=False)

Participation Instructions:

  • Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
  • Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
  • Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.

:point_right: If you prefer to collaborate with others on Discord, join the Plotly Discord channel.

Data Source:

Thank you to FiveThirtyEight for the data.

2 Likes

Hello Everyone! For this Figure Friday week 23, I designed an interactive dashboard that analyzes and classifies risk behavior profiles using survey responses. This tool uses Machine Learning techniques to uncover hidden patterns and provide a deeper understanding of different risk groups within a population.


How It Works

  1. Data Preparation: The dashboard/app starts by expertly handling all the survey data, including both behavioral and demographic information. It converts all those categorical responses into a numerical format that the Machine Learning algorithms can easily understand. It also takes care of essential data cleaning, like removing incomplete responses or handling missing demographic details, ensuring the data is ready for accurate analysis.
  2. Grouping Risk Profiles (Clustering): At its core, it employs an Agglomerative Clustering model. Think of this as a smart organizer that groups survey participants into distinct “profiles” or “clusters.” Each cluster contains people who share similar risk behaviors, based on how they answered the survey questions.
  3. Visualizing the “Risk Landscape” (t-SNE): One of the most insightful features is the visualization: a scatter plot generated by t-SNE.
  • Unlike traditional charts that just show averages, t-SNE is fantastic for visualizing the conceptual similarities and differences between people, even when their original data is mostly qualitative. It plots participants in a two-dimensional “landscape,” allowing us to visually see how different risk profiles naturally group together.
  • This plot clearly shows the variability within and between clusters. Areas where many points are densely packed mean many participants have very similar behaviors, confirming a well-defined risk profile.
  • To enhance understanding, you can hover over any point on the t-SNE plot to see detailed demographic information about that participant, along with their assigned risk profile and its percentage of the total population. This lets you connect a person’s position in the “risk landscape” directly to their background, offering rich qualitative insights.
  4. Interactive Classification of New Users: The dashboard is interactive! You can answer the same survey questions, and the system will process your responses using the same models that analyzed the original data. It then classifies you into the closest existing risk profile.
  5. Clear Results Presentation: When you’re classified, the results are presented in a very clear way:
  • You’ll see an interactive profile card that details your assigned cluster’s name, a description of that risk profile, its prevalence in the general population, and a summary of your own answers compared to the profile’s typical responses.
  • Your position is also visually highlighted on the t-SNE plot, right in the center of your assigned cluster. This gives you an instant visual comparison of your profile against the rest of the population.
  6. Methodological Transparency: A dedicated section explains the methodology behind the dashboard (like clustering and t-SNE). This helps you understand how the analysis works and builds confidence in the results.
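
For readers who want to try the steps above, here is a rough sketch. It is not the code behind the dashboard: it assumes scikit-learn, one-hot encoding with pandas, a made-up toy survey, and a nearest-centroid rule for placing a new respondent (AgglomerativeClustering has no predict method, so some rule like this is needed). The column names, the cluster count, and the perplexity are placeholder choices.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.manifold import TSNE

# Toy stand-in for the survey: hypothetical columns and random answers
rng = np.random.default_rng(0)
n = 60
survey = pd.DataFrame({
    "Smokes": rng.choice(["Yes", "No"], size=n),
    "Gamble": rng.choice(["Yes", "No"], size=n),
    "Lottery": rng.choice(["Lottery A", "Lottery B"], size=n),
})

# 1. Data preparation: drop incomplete rows, one-hot encode the categories
clean = survey.dropna()
X = pd.get_dummies(clean).astype(float)

# 2. Clustering: group respondents into 3 profiles (3 is an arbitrary choice)
labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# 3. t-SNE: project the encoded answers to 2-D for a scatter plot
tsne = TSNE(n_components=2, perplexity=10, init="random", random_state=0)
coords = tsne.fit_transform(X.to_numpy())

# 4. New respondent: encode with the same columns, pick the nearest cluster centroid
centroids = X.groupby(labels).mean()
new_answer = pd.DataFrame([{"Smokes": "Yes", "Gamble": "No", "Lottery": "Lottery A"}])
new_X = pd.get_dummies(new_answer).reindex(columns=X.columns, fill_value=0).astype(float)
distances = np.linalg.norm(centroids.to_numpy() - new_X.to_numpy(), axis=1)
assigned = int(centroids.index[distances.argmin()])

# Share of the population in each profile, as shown on a profile card
share = pd.Series(labels).value_counts(normalize=True).sort_index()
print(assigned, share.round(2).to_dict())
```

The `coords` array can be passed to `px.scatter` with `labels` as the color to get the "risk landscape" view, and the new respondent can be drawn at the mean position of the assigned cluster's points.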
Project Link

(PyCafe - Dash - steak_risk_survey)

Any questions, comments, or suggestions are more than welcome.


9 Likes

Hi @Avacsiglo21, this dataset was asking for an approach like this :-). Looking at the results, I was wondering: does this dataset reflect the general distribution of gender and age, or were there simply many women around 60 who were available to answer the questions? I think it would be good to add some remarks to your app about the general characteristics or distribution of the input data.

3 Likes

Hello everyone. Here is my simple dashboard to explore the steak risk survey dataset for Figure Friday 2025 - week 23.

Find the code on GitHub

6 Likes

Hello Marianne,

Excellent observation and good question! As far as I can see, the demographic distribution of the data is actually quite balanced (Age, Region).
Still, it’s good practice to show these demographic characteristics for transparency. Thanks for the observation!
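
One simple way to show the sample's make-up in an app is to tabulate the share of respondents per demographic category. A small sketch on invented rows: `Gender` and `Location (Census Region)` are column names used in code in this thread, while the values and the helper name `distribution` are made up for illustration.

```python
import pandas as pd

# Invented rows for illustration; the real survey has far more respondents
sample = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male", "Female", None],
    "Location (Census Region)": ["Pacific", "Pacific", "Mountain",
                                 "New England", "Pacific", "Mountain"],
})

def distribution(frame: pd.DataFrame, column: str) -> pd.Series:
    """Share of respondents per category, keeping missing answers visible."""
    return frame[column].fillna("No answer").value_counts(normalize=True)

for col in sample.columns:
    print(distribution(sample, col).round(2), end="\n\n")
```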

2 Likes

Steak Preferences Analysis

This dashboard is built with Plotly, Dash and Dash Bootstrap, providing a responsive, modern layout with a custom dark theme.
Users can interactively filter the data by age group, region, and education level, with all charts and KPIs updating in real time.
Key visualizations include a donut chart for steak preferences, a heatmap for smoking vs. drinking habits, and a grouped bar chart by region and age group.
Data is cleaned and preprocessed before visualization, and reusable components ensure consistency across the dashboard.

8 Likes

Hi,

ok another ‘simple’ analysis but it seems to completely work this time.

4 Likes

again, another beautiful dashboard! :clap:

1 Like

hi @Xavi.LL
nice color choice for the survey results choropleth.
Regarding the multi-row histogram titled “How do you like your steak prepared”, where is the data on the steak preparation type? I see location and count but I can’t see the steak preparation type.

83% non smokers, wow! I thought that number would be higher.

Good idea using the controls to filter the data in the graphs by education, region, and age. I would like to check this app out on py.cafe. Are you planning to upload it there?

This visualization is pretty interesting:

It suggests that people living in the 50k-100k household income bracket tend to have more risky habits compared to other income brackets. I wonder if this is proportional to the amount of 50k-100k responders.
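
One way to check is to compare shares instead of counts, so that a bracket with many respondents does not dominate. A sketch with invented rows and hypothetical column names:

```python
import pandas as pd

# Invented data: income bracket and a yes/no risk habit per respondent
answers = pd.DataFrame({
    "Income": ["$50,000 - $99,999"] * 6 + ["$150,000+"] * 2,
    "Gambles": ["Yes", "Yes", "Yes", "No", "No", "No", "Yes", "No"],
})

# Raw counts: the larger bracket has more "Yes" answers simply because it is larger
counts = pd.crosstab(answers["Income"], answers["Gambles"])

# Row-normalised shares: each bracket sums to 1, so brackets are comparable
shares = pd.crosstab(answers["Income"], answers["Gambles"], normalize="index")

print(counts, shares, sep="\n\n")
```

In this toy data the 50k-100k bracket has three times as many "Yes" answers, yet the share of "Yes" is the same in both brackets.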

Hey Everyone,
Unfortunately, this week’s Figure Friday session is cancelled. I’m travelling back to NYC from the Databricks summit, so I won’t be able to be online on Friday at noon.

Hi @adamschroeder, there are two dropdown components to select the fields to plot. I uploaded the project to PyCafe so anyone who wants to can play with it.

1 Like

plotly_ff_y25w23

Here’s my submission. I wanted to explore how each demographic group responded to a particular question relative to the entire surveyed population. To visualize this, I used a percent bar chart with reference lines for context. I also added a “Transpose” checkbox to quickly switch axes. This made it easy to slice the data by either question topic or respondent characteristics.

On a side note, I built this using the latest version of the dash-mantine-components library (v2.0). Huge thanks to the contributors! It’s been an absolute delight to use and I love the new features :slight_smile:

pycafe
github

5 Likes

Have a good flight @adamschroeder!

2 Likes

Hi, @spd . We haven’t seen you in a while, welcome back :love_you_gesture:

Your app is so slick and clean: additional kudos from my end to the author of DMC, @snehilvj , and the hard-working maintainer of DMC, @AnnMarieW . You should join the DMC Discord server. They are very helpful there. And while you’re at it, if you haven’t already, join the Plotly server :grinning_face:

In your graph, what do the reference lines show us? For example in the steak preparedness bar chart below:

@mike_finko , nice app. Sharing an image here for the community to see in case they didn’t go into your app on py.cafe.

I like how you summarize the data below. One small recommendation would be to put those summaries inside cards and on the top of the page, so people don’t miss them. @Avacsiglo21 does a really good job creating summary cards. @Avacsiglo21 do you have a code template for the cards you use?

Hi Adam, thank you for your kind words. I don’t have a template code in the strict sense of the word; it depends on the context and what I want the target audience/user to notice. But as a guide, I always try to include only relevant information that adds value, context, and reference, and that isn’t repetitive. It helps describe/understand what is being seen.

1 Like

It’s remarkable how you’ve uncovered meaningful insights from a dataset that initially seems quite limited.

1 Like

Been a while since I posted! Here is a Dash app that lets you explore how smoking, gambling, and cheating relate to people’s choice of a high-risk versus low-risk lottery, plus demographics:

KPIs at the top surface overall sample size, % who smoke, % choosing the riskier bet, and average expected payout.

A faceted bar chart shows how smokers, gamblers, and cheaters each split between Lottery A and B.

The treemap breaks down respondents’ incomes by education level; the heatmap maps incomes across U.S. regions.

A correlation matrix and Sankey diagram reveal that these “risk” behaviors overlap only slightly (smoking→gambling) and otherwise act independently.

A quick summary translates the chi-square test and correlations into plain English, so you immediately know which links are significant, and which aren’t.

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from dash import Dash, dcc, html, Input, Output, callback_context
import dash_bootstrap_components as dbc
from scipy.stats import chi2_contingency

# Load and preprocess data
df = pd.read_csv(
    "https://raw.githubusercontent.com/plotly/Figure-Friday/main/2025/week-23/steak-risk-survey.csv"
)
# Rename columns for clarity
df = df.rename(columns={
    df.columns[1]: "Lottery Choice",
    "Do you ever smoke cigarettes?": "Smokes",
    "Do you ever gamble?": "Gamble",
    "Have you ever cheated on your significant other?": "Cheated",
    "Household Income": "Income",
    "Education": "Education",
    "Location (Census Region)": "Region"
})

# Clean income categories
df["Income"] = df["Income"].fillna("Unknown").apply(
    lambda x: x if x in ["$0 - $24,999","$25,000 - $49,999","$50,000 - $99,999","$100,000 - $149,999","$150,000+"] else "Unknown"
)
# Filter out missing data
df = df.dropna(subset=["Lottery Choice","Smokes","Income","Education","Region","Cheated","Gamble"])

# Prepare correlation and chi-square helpers
# Encode Yes/No answers as 1/0 so the columns are numeric for .corr()
corr_df = df[["Smokes","Gamble","Cheated"]].eq("Yes").astype(int)
def make_contingency(data):
    ct = pd.crosstab(data["Cheated"], data["Gamble"])
    chi2, p, _, _ = chi2_contingency(ct)
    return ct, chi2, p

# Unified colors
DISCRETE_COLORS = {"Yes": "#e74c3c", "No": "#2c3e50"}
CONTINUOUS_SCALE = "Tealgrn"
CORR_SCALE = "RdBu"
SANKY_LINK_COLOR = "#bbbbbb"

# Sankey data
labels = ["Smokes: Yes","Smokes: No","Gamble: Yes","Gamble: No","Cheated: Yes","Cheated: No"]
flow1 = pd.crosstab(df["Smokes"], df["Gamble"]).reindex(index=["Yes","No"], columns=["Yes","No"]).fillna(0)
flow2 = pd.crosstab(df["Gamble"], df["Cheated"]).reindex(index=["Yes","No"], columns=["Yes","No"]).fillna(0)
source = [0,0,1,1,2,2,3,3]
target = [2,3,2,3,4,5,4,5]
value  = [flow1.loc["Yes","Yes"], flow1.loc["Yes","No"], flow1.loc["No","Yes"], flow1.loc["No","No"],
          flow2.loc["Yes","Yes"], flow2.loc["Yes","No"], flow2.loc["No","Yes"], flow2.loc["No","No"]]
fig_sankey = go.Figure(go.Sankey(
    node=dict(label=labels, pad=15, thickness=20,
              color=[DISCRETE_COLORS["Yes"], DISCRETE_COLORS["No"], DISCRETE_COLORS["Yes"], DISCRETE_COLORS["No"], DISCRETE_COLORS["Yes"], DISCRETE_COLORS["No"]]),
    link=dict(source=source, target=target, value=value, color=SANKY_LINK_COLOR)
))
fig_sankey.update_layout(title_text="Behavior Flow: Smoking → Gambling → Cheating", font_size=12)

# Dash App
theme = dbc.themes.LUX
app = Dash(__name__, external_stylesheets=[theme], suppress_callback_exceptions=True)
app.title = "Survey Insights"
CARD_STYLE = {"backgroundColor": "#ffffff", "padding": "15px", "marginBottom": "20px", "boxShadow": "0 2px 6px rgba(0,0,0,0.1)"}

# Layout
def serve_layout():
    return dbc.Container(fluid=True, children=[
        html.H1("Survey Insights", className="text-center mt-3 mb-4"),
        # KPI cards
        dbc.Row([
            dbc.Col(dbc.Card([html.H5("Total Responses"), html.H2(id='kpi-total')], style=CARD_STYLE), width=3),
            dbc.Col(dbc.Card([html.H5("% Smokers"), html.H2(id='kpi-smoke')], style=CARD_STYLE), width=3),
            dbc.Col(dbc.Card([html.H5("% Choose Lottery A"), html.H2(id='kpi-bet')], style=CARD_STYLE), width=3),
            dbc.Col(dbc.Card([html.H5("Avg Expected Value"), html.H2(id='kpi-ev')], style=CARD_STYLE), width=3)
        ], className='mb-4'),
        # Filters
        dbc.Row([
            dbc.Col(dbc.Card([
                html.H5("Filters"),
                dcc.Dropdown(id="region-filter", options=[{"label":r,"value":r} for r in sorted(df.Region.unique())], placeholder="Select region...", clearable=True),
                dcc.Dropdown(id="income-filter", options=[{"label":i,"value":i} for i in sorted(df.Income.unique())], placeholder="Select income...", clearable=True, style={"marginTop":"10px"}),
                html.Button("Reset Filters", id='reset-filters', n_clicks=0, className="mt-2 btn btn-secondary w-100")
            ], style=CARD_STYLE), width=3)
        ], className='mb-4'),
        # Graphs & Summary placeholder
        html.Div(
            children=[
                dcc.Graph(id='lottery-bar', figure={}),
                dcc.Graph(id='income-education-treemap', figure={})
            ],
            id='graphs-container'
        ),
        # Quick Summary
        dbc.Card([
            html.H5("Quick Statistical Summary"),
            html.Div(id='summary-stats', style={"padding":"10px"})
        ], style=CARD_STYLE)
    ])

app.layout = serve_layout

# Callback
@app.callback(
    [Output('graphs-container','children'),
     Output('kpi-total','children'), Output('kpi-smoke','children'),
     Output('kpi-bet','children'), Output('kpi-ev','children'),
     Output('summary-stats','children')],
    [Input('region-filter','value'), Input('income-filter','value'), Input('reset-filters','n_clicks'),
     Input('income-education-treemap','clickData'), Input('lottery-bar','clickData')]
)
def update_dashboard(region, income, reset, treemap_click, bar_click):
    ctx = callback_context.triggered
    # Reset filters
    if ctx and ctx[0]['prop_id'] == 'reset-filters.n_clicks':
        region = income = None
    # Filtered DF
    df_f = df.copy()
    if region: df_f = df_f[df_f.Region==region]
    if income: df_f = df_f[df_f.Income==income]
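    # Drill-down: narrow the already-filtered frame so the dropdown filters still apply
    if treemap_click:
        lbl = treemap_click['points'][0]['label']
        col = 'Education' if lbl in df.Education.values else 'Income'
        drilled = df_f[df_f[col]==lbl]
        # Ignore a stale click that matches no rows in the current selection
        if not drilled.empty: df_f = drilled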
    if bar_click:
        choice = bar_click['points'][0]['x']
        df_f = df_f[df_f['Lottery Choice']==choice]
    # KPIs
    tot = len(df_f)
    p_sm = f"{df_f.Smokes.eq('Yes').mean()*100:.1f}%"
    p_bet = f"{df_f['Lottery Choice'].eq('Lottery A').mean()*100:.1f}%"
    a_ev = f"${df_f['Lottery Choice'].map({'Lottery A':50,'Lottery B':18}).mean():.2f}"
    # Charts
    behaviors = ['Smokes','Gamble','Cheated']
    df_long = df_f.melt(id_vars=['Lottery Choice'], value_vars=behaviors, var_name='Behavior', value_name='Response')
    grouped = df_long.groupby(['Behavior','Response','Lottery Choice']).size().reset_index(name='Count')
    fig1 = px.bar(grouped, x='Lottery Choice', y='Count', color='Response', barmode='group', facet_col='Behavior',
                  category_orders={'Behavior':behaviors,'Response':['Yes','No']}, color_discrete_map=DISCRETE_COLORS,
                  title='Lottery Choice by Risk Behaviors')
    fig1.update_traces(texttemplate='%{y}', textposition='outside')
    fig1.update_layout(margin=dict(t=40,l=20,r=20,b=20), yaxis_title='Number of Responses')
    fig2 = px.treemap(df_f.groupby(['Education','Income']).size().reset_index(name='Count'), path=['Education','Income'], values='Count',
                      color='Count', color_continuous_scale=CONTINUOUS_SCALE, title='Income Distribution by Education')
    fig2.update_layout(height=500, margin=dict(t=40,l=20,r=20,b=20))
    fig3 = px.density_heatmap(df_f, x='Region', y='Income', color_continuous_scale=CONTINUOUS_SCALE,
                              title='Heatmap of Income by Region')
    fig4 = px.imshow(corr_df.corr(), text_auto=True, title='Correlation Matrix', color_continuous_scale=CORR_SCALE)
    fig4.update_traces(zmid=0)
    fig5 = fig_sankey
    rows = [
        dbc.Row(dbc.Col(dcc.Graph(id='lottery-bar', figure=fig1), width=12), className='mb-4'),
        dbc.Row([dbc.Col(dcc.Graph(id='income-education-treemap', figure=fig2), width=6), dbc.Col(dcc.Graph(figure=fig3), width=6)], className='mb-4'),
        dbc.Row([dbc.Col(dcc.Graph(figure=fig4), width=6), dbc.Col(dcc.Graph(figure=fig5), width=6)], className='mb-4')
    ]
    # Statistical summary
    ct, chi2, p = make_contingency(df_f)
    corr_vals = corr_df.corr().to_dict()
    summary = html.Ul([
        html.Li(f"Chi-square test for association between Cheating and Gambling: p = {p:.3f}. {'Significant' if p < 0.05 else 'Not significant'}"),
        html.Li(f"Small positive correlation between Smoking and Gambling (r = {corr_vals['Smokes']['Gamble']:.2f}), indicating smokers are slightly more likely to gamble."),
        html.Li(f"Near-zero correlation between Smoking and Cheating (r = {corr_vals['Smokes']['Cheated']:.2f}), suggesting no linear relationship."),
        html.Li(f"Near-zero correlation between Gambling and Cheating (r = {corr_vals['Gamble']['Cheated']:.2f}), suggesting these behaviors are independent.")
    ], style={"marginLeft": "20px"})
    return rows, tot, p_sm, p_bet, a_ev, summary
if __name__ == '__main__':
    app.run(debug=True)


4 Likes