Figure Friday 2025 - week 35

adamschroeder · August 29, 2025, 1:30pm

Join the Figure Friday session on September 5, at noon Eastern Time, to showcase your creation and receive feedback from the community.

This is the largest Figure Friday dataset we have had so far - in terms of number of columns. One way to tackle this dataset is to review the columns and focus on a handful of columns that interest you.

What AI tools do Computer Science learners use the most?

Answer this question and a few others by using Plotly on the survey conducted by Cornell University.

More information on the dataset and research paper.

Things to consider:

what can you improve in the app or sample figure below (subplots)?
would you like to tell a different data story using a different graph?
how can you explore the data with Plotly Studio?

Sample figure:
Thank you to @Avacsiglo21 for the sample code and image.

Code for sample figure:

import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Load the CSV file with low_memory=False so pandas infers each column's dtype from the whole file.
# https://drive.google.com/file/d/1gBG4jG-h9A123-GFQA7GgIgExBzoCd8w/view?usp=sharing
df = pd.read_csv("dataset.csv", low_memory=False)

# --- 1. First, analyze the general AI usage ---
ai_usage_col = '[61] Do you use AI in your everyday life?'
ai_usage_counts = df[ai_usage_col].value_counts()

# --- 2. Then, filter the DataFrame to only include participants who use AI ---
ai_users_df = df[df[ai_usage_col] == 'Yes']
total_ai_users = len(ai_users_df)

# --- 3. Analyze the usage of AI tools specifically among AI users ---
ai_tool_cols = [col for col in df.columns if '[62] Which AI-based assistant do you use?' in col]
tool_usage = {}

# Iterate over each AI tool column.
for col in ai_tool_cols:
    # Extract the tool name from the column header.
    tool_name = col.split(': ')[-1]
    # Count how many AI users selected this tool.
    count = ai_users_df[col].notna().sum()
    if count > 0:
        tool_usage[tool_name] = count

# Sort the tools by popularity (descending) and get the top 10.
sorted_tools = sorted(tool_usage.items(), key=lambda item: item[1], reverse=True)
top_10_tools = dict(sorted_tools[:10])

# --- 4. Calculate the percentage of users for each tool ---
tool_percentages = {tool: (count / total_ai_users) * 100 for tool, count in top_10_tools.items()}

# --- 5. Create the plots ---
# Use make_subplots to combine the pie chart and bar chart.
fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=('General AI Usage', 'Top 10 AI Assistants (% of AI Users-Multiple Selection Possible)'),
    specs=[[{"type": "pie"}, {"type": "bar"}]],
    vertical_spacing=0.5,

)

# Add the Pie Chart for general AI usage.
fig.add_trace(
    go.Pie(
        labels=ai_usage_counts.index,
        values=ai_usage_counts.values,
        hole=0.65,
        name="AI Usage",
        marker_colors=['#4E79A7', '#F28E2B', '#E15759']
    ),
    row=1, col=1
)

# Add the Horizontal Bar Chart for the top 10 tools, showing percentages.
fig.add_trace(
    go.Bar(
        x=list(tool_percentages.values()),
        y=list(tool_percentages.keys()),
        orientation='h',
        name="Tools",
        marker_color='#4E79A7',
        text=[f'{p:.1f}%' for p in tool_percentages.values()],
        textposition='outside',
        textfont=dict(color='black', size=10)
    ),
    row=1, col=2
)

# Customize the overall layout.
fig.update_layout(
    title_text="AI Usage Analysis in Computer Science Learners",
    title_x=0.5,
    title_font_size=24,
    bargap=0.1,
    height=500,
    template='plotly_white',
    showlegend=False,
    font=dict(size=12)
)


fig.update_xaxes(title_text="", row=1, col=2, showgrid=False, visible=False)
fig.update_yaxes(title_text="", row=1, col=2, categoryorder='total ascending')


fig.update_annotations(font_size=16, y=1.02)
fig.show()

For community members that would like to build the data app with Plotly Studio, but don’t have the application yet, simply go to Plotly.com/studio. Please keep in mind that Plotly Studio is still in early access.

Below is a screenshot of a treemap built by Plotly Studio on top of this dataset:

Participation Instructions:

Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.

If you prefer to collaborate with others on Discord, join the Plotly Discord channel.

Data Source:

Thank you to Cornell University for the data.

adamschroeder · August 29, 2025, 2:42pm

3 posts were split to a new topic: Do you have any sample AI projects

marieanne · August 30, 2025, 5:50am

Studio does exploratory analysis: https://a2d7edc8-5896-405d-94e5-6178671a839e.plotly.app/

I asked “Give me the 10 main key insights for this dataset and the 10 accompanying charts”. Personally, the only chart I understand immediately is “Most Challenging Aspects of Studying Computer Science” (besides the geo one).

Maybe someone gets some inspiration from it.

mike_finko · August 30, 2025, 10:46am

@marieanne so you’re saying I CAN be lazy and skip the EDA?

marieanne · August 30, 2025, 1:47pm

That’s the million-dollar question.

adamschroeder · August 31, 2025, 1:42pm

@marieanne it was also interesting for me to see this breakdown of career motivation by gender. I didn’t expect to see the % of male respondents – that were motivated to pursue a CS career by chance – to be so low.

marieanne · August 31, 2025, 2:14pm

I must admit my English is not good enough to get the nuances of some of the numbers. But after running this dataset through PS again (highly recommended for fast exploring) to get some numbers on my fellow peeps (60 and over), and reading the top paragraph of this page, [2508.05286] Everything You Need to Know About CS Education: Open Results from a Survey of More Than 18,000 Participants, I tend to think that this survey was returned by learners who are highly intrinsically motivated.

But maybe I’m completely wrong.

Edit: This is what I was actually looking for, chapter 3.2 about data collection and targeting:

https://arxiv.org/pdf/2508.05286

Mike_Purtell · September 4, 2025, 4:45am

This dashboard groups all data by gender. Top row left has a world choropleth map with selected country highlighted. Top row right is a histogram of user counts, binned by AGE_RANGE, EDUCATION, or YEARS_EXPERIENCE. I experimented with Plotly Studio for ideas, but have developed this mostly from scratch.

Bottom row left is a Pareto chart showing the top 10 of CS_LANG, AI_ASST, or AI_FEATURE. I made a design decision to place the highest numbered category on the bottom instead of the top to steer clear of the legend. Bottom row right is a barbell chart made with px.line, for direct comparison of Male vs Female responses.

The dataset has 608 columns, but many belong to multi-column ‘dummies’ groups, which use a separate column for each unique answer to the same question. This dummies format makes the dataframe sparse, which is why the dataset is so wide.

Polars concat_list was used to gather the CS_LANG, AI_ASST, and AI_FEATURE values into 3 columns with datatype of list. Polars support for list and array types is far more extensive and efficient than the pandas equivalent. I use the number of answers from each user by category to weight the values. For example, when users list 4 programming languages, each one is given a weight of ¼. An index is added to give each user a unique ID, before exploding the dataframe to separate the list values into separate rows, similar to a melt or unpivot.
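To make the weighting step concrete, here is a minimal pandas sketch of the same idea. The dashboard itself uses Polars, so this is an analogue rather than the author's code, and the column names and values below are made up for illustration. Each respondent's answers share a total weight of 1.

```python
import pandas as pd

# Toy stand-in for the one-column-per-answer ("dummies") layout.
# Column names and values are hypothetical.
wide = pd.DataFrame({
    "lang_python": ["Python", "Python", None],
    "lang_java": ["Java", None, None],
    "lang_rust": ["Rust", None, "Rust"],
    "lang_go": ["Go", None, None],
})

# Give each respondent a unique ID before reshaping.
wide.insert(0, "user_id", range(len(wide)))

# Unpivot to one row per (respondent, answer) and drop unanswered cells.
long = (
    wide.melt(id_vars="user_id", value_name="cs_lang")
    .dropna(subset=["cs_lang"])
    .drop(columns="variable")
)

# A respondent who lists 4 languages contributes 1/4 to each of them.
long["weight"] = 1 / long.groupby("user_id")["cs_lang"].transform("count")

# Weighted popularity per language; the weights sum to the respondent count.
weighted = long.groupby("cs_lang")["weight"].sum().sort_values(ascending=False)
print(weighted)
```

With this weighting, someone who ticks every language does not count more than someone who ticks only one.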

The cleaned dataset is saved as a parquet file, with all datatypes optimized for storage size. String categories are cast as Categorical, float columns are cast as Float32, Integer columns cast as UInt8. The app checks for the parquet file, and it is used as the global dataframe. If not found the app reads the csv file, cleans it, and saves the data frame as parquet for future runs. This allows me to upload a much smaller parquet file to Plotly Cloud and avoids having to read the csv file and clean it every time.
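As a rough illustration of the dtype narrowing described above, the following sketch casts a toy frame to smaller types and compares in-memory size. It uses pandas rather than Polars, the column names are invented, and the on-disk parquet size will also depend on compression, so treat it only as a sketch of the idea.

```python
import pandas as pd

n = 1_000
# Hypothetical columns standing in for the cleaned survey data.
df = pd.DataFrame({
    "gender": ["Male", "Female"] * (n // 2),
    "weight": [0.25] * n,
    "age": [30] * n,
})

before = df.memory_usage(deep=True).sum()

# Strings become categories, floats shrink to 32 bits, small ints to 8 bits.
small = df.astype({"gender": "category", "weight": "float32", "age": "uint8"})

after = small.memory_usage(deep=True).sum()
print(f"{before:,} bytes -> {after:,} bytes")
```

UInt8 only works when every value fits in 0 to 255, so it is worth checking the column range before casting.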

I hope you enjoy this dashboard and appreciate any feedback or suggestions.

Here is a link to Plotly Cloud hosted dashboard:

Here are a few screenshots:

Code is on GitHub; here is the link:
Plotly_FF_2025/Week_35_AI_Assistant_Cornell_Survey at main · Mike-Purtell/Plotly_FF_2025

marieanne · September 4, 2025, 8:08am

Great clean app layout. Although I’m not going to submit something, I have one remark.

The title of the app contains the word adoption. When you glance at the barbell chart (that’s what most people do) it’s very easy to think “those poor women”. What one might easily miss: most survey submissions were by men.

If you showed percentages (% of female respondents on the left, % of male respondents on the right), it might look the same, but maybe not.
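A small sketch of that suggestion, assuming a long table of counts per assistant and gender. The numbers are invented purely to show the effect: raw counts differ by a factor of four, yet the share within each gender is identical.

```python
import pandas as pd

# Made-up counts; not taken from the survey.
counts = pd.DataFrame({
    "ai_asst": ["ChatGPT", "ChatGPT", "Copilot", "Copilot"],
    "gender": ["Male", "Female", "Male", "Female"],
    "users": [800, 200, 400, 100],
})

# Normalise within each gender so both sides of the barbell are comparable.
gender_totals = counts.groupby("gender")["users"].transform("sum")
counts["pct_of_gender"] = 100 * counts["users"] / gender_totals

# One row per assistant, one column per gender.
shares = counts.pivot(index="ai_asst", columns="gender", values="pct_of_gender")
print(shares.round(1))
```

Plotting `pct_of_gender` instead of `users` would remove the effect of the unequal group sizes.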

adamschroeder · September 4, 2025, 11:57am

@Mike_Purtell I like how you save your data as parquet files. How much smaller is a parquet file than a CSV sheet (assuming one uses the same dataset with the same number of rows and columns)?

Good idea including the map under the dropdown. That way, people can see geographically what country they chose. Maybe you can have a slightly different map style with clear country borders, and then allow users to click on the map to choose a country as well, not only via the dropdown.

Avacsiglo21 · September 4, 2025, 1:53pm

Hi everyone, here’s my approach for this week. This time it’s an analytics dashboard without any charts. Instead, I used a card system with filters to explore associations and patterns among survey respondents. The analysis uses Chi-square tests and Cramér’s V to identify statistically significant relationships between student characteristics. Note that this is a subset of 9,538 respondents: the students who answered ‘Yes’ to using AI.
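For readers unfamiliar with Cramér’s V, here is a minimal standalone sketch of the calculation on a toy 2x2 table. The helper name `cramers_v` and the counts are assumptions for illustration. Unlike the app code further down, it applies neither add-one smoothing nor the Yates continuity correction.

```python
import numpy as np
from scipy.stats import chi2_contingency


def cramers_v(table):
    """Return (Cramér's V, p-value) for a contingency table of counts."""
    table = np.asarray(table, dtype=float)
    # correction=False turns off the Yates correction scipy applies to 2x2 tables.
    chi2, p_value, dof, expected = chi2_contingency(table, correction=False)
    n = table.sum()
    rows, cols = table.shape
    v = np.sqrt(chi2 / (n * min(rows - 1, cols - 1)))
    return float(v), float(p_value)


# Hypothetical counts: rows are two student groups, columns are trait yes/no.
v, p = cramers_v([[10, 20], [20, 10]])
print(f"V = {v:.3f}, p = {p:.4f}")
```

For this table every expected count is 15, so chi-square is 4 x 25/15 and V works out to 1/3. V ranges from 0 (no association) to 1 (perfect association).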

CS Student Survey Explorer -

What it is:
A data analysis tool that explores survey responses from 9,538 Computer Science students worldwide, collected by Cornell University.

What it does:

Helps you discover patterns in student behavior and preferences
Shows which types of students are most likely to have certain characteristics
Identifies the strongest connections between different student traits

How it works:

Select a characteristic you want to explore (like “Uses AI tools”, “From Brazil”, or “Earns $50K-$99K”)
Choose a specific value for that characteristic
View the results showing which student groups are most likely to have that trait

Key insights you’ll get:

Concentration rates: What percentage of students in each group show your selected trait
Comparison to average: How much more (or less) likely each group is compared to the overall average
Statistical significance: Which patterns are strong enough to be meaningful (not just random)

Perfect for:

Educators: Understanding student populations and designing curricula
Researchers: Identifying significant patterns in CS education
Administrators: Making data-driven decisions about programs and resources
Anyone curious: About what makes different CS student groups unique

Example insight: “Students who use AI coding assistants are 2.3x more likely to be from certain countries and 1.8x more likely to have specific salary expectations.”

The app translates complex statistical analysis into clear, actionable insights that anyone can understand and use.

By the way, as usual this is an ongoing project. For instance, the Export Report button is there but not fully working yet; the Refresh button is working.

The code

import pandas as pd
import dash
from dash import dcc, html
from dash.dependencies import Input, Output, State
from scipy.stats import chi2_contingency
import numpy as np
import dash_bootstrap_components as dbc
from datetime import datetime

# --- Data Loading ---

try:
    df = pd.read_csv('ai_user_survey.csv')
    print("Success: 'ai_user_survey.csv' loaded successfully.")
except FileNotFoundError:
    print("Error: 'ai_user_survey.csv' not found. Please ensure the file is in the same directory.")
    df = pd.DataFrame()  # Empty fallback so the app can still start

# --- Data Preprocessing ---

categorical_cols = [
    'Status', 'Country', 'Gender', 'Age_Range', 'Marital_Status', 'Children',
    'Born_in_Current_Country', 'Employment_Status', 'Worked_in_Other_Field',
    'Previous_Field', 'Paid_CS_Experience', 'CS_is_Primary_Income',
    'Primary_Income_Source', 'Years_of_Coding_Experience', 'Annual_Salary_USD',
    'Monthly_Online_Education_Spending', 'Willingness_to_Pay_for_Education',
    'Willingness_to_Pay_Features', 'IDE_Learning_Experience',
    'Dev_Environment_Setup_Experience', 'Learning_Pace',
    'Studying_Device_Provider', 'AI_Assistants', 'AI_Features', 'Job_Roles'
]
# Keep only the columns that are actually present in the loaded data.
categorical_cols_to_use = [col for col in categorical_cols if col in df.columns]

# --- Helper Functions ---

def translate_metric_to_business(value, metric_type):
    """Translate statistical metrics to business language"""
    if metric_type == "concentration":
        if value >= 80:
            return f"{int(value//10)} out of 10", "Extremely High"
        elif value >= 60:
            return f"{int(value//10)} out of 10", "High"
        elif value >= 40:
            return f"{int(value//10)} out of 10", "Moderate"
        else:
            return f"{int(value//10)} out of 10", "Low"

    elif metric_type == "lift":
        if value >= 3:
            return f"{value:.0f}x more likely", "Very Strong"
        elif value >= 2:
            return f"{value:.1f}x more likely", "Strong"
        elif value >= 1.5:
            return f"{value:.1f}x more likely", "Moderate"
        else:
            return f"{value:.1f}x more likely", "Weak"

    elif metric_type == "cramers":
        if value >= 0.3:
            return "Very Strong Association", "🔥"
        elif value >= 0.2:
            return "Strong Association", "💪"
        else:
            return "Moderate Association", "👍"

    # Unknown metric type: return the raw value with no label
    return str(value), ""

def generate_insight(factor, category, concentration, lift, market_share):
    """Generate survey insights from statistical patterns"""
    insights = []

    # Primary insight
    conc_text, _ = translate_metric_to_business(concentration, "concentration")
    lift_text, _ = translate_metric_to_business(lift, "lift")

    primary = f"💡 **Key Finding**: {conc_text} of '{category}' students fall into this selected group"
    insights.append(primary)

    # Market opportunity
    if market_share >= 20:
        insights.append(f"🎯 **High Representation**: This segment represents {market_share:.0f}% of the study participants")
    elif market_share >= 10:
        insights.append(f"📊 **Notable Representation**: This segment represents {market_share:.0f}% of the study participants")

    # Business recommendation
    factor_clean = factor.replace('_', ' ').title()
    if lift >= 2.5:
        insights.append(f"🚀 **Strong Pattern**: Priority area for study {factor_clean} - {lift_text} significantly higher than baseline")
    elif lift >= 1.5:
        insights.append(f"📈 **Notable Pattern**: Significant finding {factor_clean} - {lift_text} higher than baseline")

    return insights

# --- App Setup ---

app = dash.Dash(__name__, external_stylesheets=[dbc.themes.FLATLY])
app.title = "CS Student Survey Explorer"

# --- Layout ---

app.layout = dbc.Container(fluid=True, className="px-4 py-3", children=[
    # Survey Context Header
    dbc.Card([
        dbc.CardBody([
            dbc.Row([
                dbc.Col([
                    html.H1("CS Student Survey Explorer", className="display-6 fw-bold text-primary mb-1"),
                    html.P("Understanding Computer Science Student Patterns & Preferences from a Survey Conducted by Cornell University", className="text-muted mb-2"),
                    html.Div([
                        dbc.Badge("9,538 Students", color="info", className="me-2"),
                        dbc.Badge("Global Survey", color="success", className="me-2"),
                        dbc.Badge("27 Variables", color="warning", className="me-2"),
                        dbc.Badge(f"Updated {datetime.now().strftime('%Y-%m-%d')}", color="secondary")
                    ])
                ], width=8),
                dbc.Col([
                    html.Div([
                        dbc.Button("Export Report", id="export-btn", color="primary", size="sm", className="me-2"),
                        dbc.Button("Refresh", id="refresh-btn", color="outline-secondary", size="sm")
                    ], className="d-flex justify-content-end align-items-center")
                ], width=4)
            ])
        ])
    ], className="mb-4"),

    # What This Analysis Shows
    dbc.Card([
        dbc.CardHeader([
            html.H5("📖 What This Analysis Reveals", className="mb-0 fw-bold")
        ]),
        dbc.CardBody([
            html.P([
                "This dashboard identifies the strongest associations in student behavior and preferences. ",
                "Select any characteristic (e.g., 'AI usage', 'Salary range', 'Country') and discover which student profiles are most likely to exhibit that trait. ",
                html.Strong("Ideal for curriculum development, student recruitment, and program planning.")
            ], className="mb-2"),
            dbc.Alert([
                html.I(className="bi bi-lightbulb me-2"),
                "Pro Tip: Look for segments with high 'concentration' (many students in that group show the trait) and high 'association strength' (much more likely than average)."
            ], color="info", className="mb-0")
        ])
    ], className="mb-4"),

    # Control Panel
    dbc.Card([
        dbc.CardHeader([
            html.H5("🎯 Analysis Configuration", className="mb-0 fw-bold")
        ]),
        dbc.CardBody([
            dbc.Row([
                dbc.Col([
                    html.Label("What student characteristic are you exploring?", className="fw-semibold"),
                    dcc.Dropdown(
                        id='result-variable-dropdown',
                        options=[{'label': i.replace('_', ' ').title(), 'value': i} for i in categorical_cols_to_use],
                        placeholder="Select variable to analyze (e.g., AI_Assistants, Country, Annual_Salary_USD)..."
                    ),
                ], md=6),
                dbc.Col([
                    html.Label("Which specific value interests you?", className="fw-semibold"),
                    dcc.Dropdown(
                        id='result-value-dropdown',
                        placeholder="Select specific value (e.g., 'Yes', 'Brazil', '$50,000-$99,999')..."
                    ),
                ], md=6)
            ])
        ])
    ], className="mb-4"),

    # KPIs Dashboard
    html.Div(id='kpi-dashboard'),

    # Key Factors Section
    html.Div([
        dbc.Row([
            dbc.Col([
                html.H4("🔑 Strongest Associations Found", className="mb-0 text-primary")
            ], width=8),
            dbc.Col([
                html.Small("Showing strongest associations only (statistical significance required)",
                           className="text-muted text-end")
            ], width=4)
        ], className="mb-3")
    ]),

    dcc.Loading(
        id="loading-1",
        type="default",
        children=html.Div(id='summary-cards')
    ),

    # Export Modal
    dbc.Modal([
        dbc.ModalHeader("📄 Executive Report Generated"),
        dbc.ModalBody([
            html.P("Your survey analysis has been processed. In a full implementation, this would generate:"),
            html.Ul([
                html.Li("Comprehensive survey analysis report with key correlations"),
                html.Li("Statistical methodology and significance testing details"),
                html.Li("Research findings and notable patterns identified"),
                html.Li("Exportable data tables and correlation matrices for further study")
            ])
        ]),
        dbc.ModalFooter(dbc.Button("Close", id="close-modal", color="secondary"))
    ], id="export-modal", centered=True)

])

# --- Callbacks ---

@app.callback(
    Output('result-value-dropdown', 'options'),
    [Input('result-variable-dropdown', 'value')]
)
def set_value_options(selected_variable):
    # Return an empty option list until a valid variable is selected.
    if not selected_variable or df.empty:
        return []
    if selected_variable in df.columns:
        unique_values = df[selected_variable].dropna().unique()
        return [{'label': i, 'value': i} for i in unique_values]
    return []

@app.callback(
    Output('kpi-dashboard', 'children'),
    [Input('result-variable-dropdown', 'value'),
     Input('result-value-dropdown', 'value')]
)
def update_kpis(selected_variable, selected_value):
    if not selected_variable or not selected_value or df.empty:
        return html.Div()

    # Calculate basic KPIs
    target_column_name = f'is_{selected_value}'
    df.loc[:, target_column_name] = df[selected_variable].apply(lambda x: 1 if x == selected_value else 0)

    total_sample = len(df)
    target_count = df[target_column_name].sum()
    baseline_rate = (target_count / total_sample * 100) if total_sample > 0 else 0

    # Quick count of significant factors
    significant_factors = 0
    analysis_vars = [col for col in categorical_cols_to_use if col != selected_variable]

    for factor_var in analysis_vars:
        try:
            contingency_table = pd.crosstab(df[factor_var].dropna(), df[target_column_name].dropna())
            if not contingency_table.empty and contingency_table.shape[0] > 1:
                # Add-one smoothing keeps every cell non-zero for the chi-square test
                contingency_table_smoothed = contingency_table + 1
                chi2, p_value, dof, expected = chi2_contingency(contingency_table_smoothed)
                n = contingency_table_smoothed.sum().sum()
                phi2 = chi2 / n
                k, r = contingency_table_smoothed.shape
                cramers_v = np.sqrt(phi2 / min(k-1, r-1))
                if cramers_v > 0.15 and p_value < 0.05:
                    significant_factors += 1
        except Exception:
            # Skip any factor the test cannot handle
            continue

    return dbc.Row([
        dbc.Col([
            dbc.Card([
                dbc.CardBody([
                    html.H2(f"{target_count:,}", className="text-primary mb-1"),
                    html.P("Respondents in Selected Group", className="text-muted mb-0"),
                    html.Small(f"out of {total_sample:,} total respondents", className="text-secondary")
                ])
            ], className="text-center border-primary")
        ], md=3),
        dbc.Col([
            dbc.Card([
                dbc.CardBody([
                    html.H2(f"{baseline_rate:.1f}%", className="text-info mb-1"),
                    html.P("Survey Prevalence", className="text-muted mb-0"),
                    html.Small("baseline rate in population", className="text-secondary")
                ])
            ], className="text-center border-info")
        ], md=3),
        dbc.Col([
            dbc.Card([
                dbc.CardBody([
                    html.H2(f"{significant_factors}", className="text-success mb-1"),
                    html.P("Associated Factors", className="text-muted mb-0"),
                    html.Small("statistically significant correlations", className="text-secondary")
                ])
            ], className="text-center border-success")
        ], md=3),
        dbc.Col([
            dbc.Card([
                dbc.CardBody([
                    html.H2("📊", className="text-warning mb-1"),
                    html.P("Analysis Status", className="text-muted mb-0"),
                    html.Small("ready for insights", className="text-secondary")
                ])
            ], className="text-center border-warning")
        ], md=3)
    ], className="mb-4")

@app.callback(
    Output('summary-cards', 'children'),
    [Input('result-variable-dropdown', 'value'),
     Input('result-value-dropdown', 'value')]
)
def update_summary_cards(selected_variable, selected_value):
    if not selected_variable or not selected_value or df.empty:
        return [
            dbc.Alert([
                html.H5("Get Started", className="alert-heading"),
                html.P("Select a variable and specific value above to discover the strongest associated factors in the CS student survey.", className="mb-0")
            ], color="light", className="text-center")
        ]

    target_column_name = f'is_{selected_value}'
    df.loc[:, target_column_name] = df[selected_variable].apply(lambda x: 1 if x == selected_value else 0)

    factor_scores = []
    analysis_vars = [col for col in categorical_cols_to_use if col != selected_variable]

    for factor_var in analysis_vars:
        try:
            contingency_table = pd.crosstab(df[factor_var].dropna(), df[target_column_name].dropna())
            contingency_table_smoothed = contingency_table + 1

            if not contingency_table_smoothed.empty and contingency_table_smoothed.shape[0] > 1 and contingency_table_smoothed.shape[1] > 1:
                chi2, p_value, dof, expected = chi2_contingency(contingency_table_smoothed)
                n = contingency_table_smoothed.sum().sum()
                phi2 = chi2 / n
                k, r = contingency_table_smoothed.shape
                cramers_v = np.sqrt(phi2 / min(k-1, r-1))

                # Only include strong associations
                if cramers_v > 0.15 and p_value < 0.05:
                    factor_scores.append({'factor': factor_var, 'cramers_v': cramers_v, 'p_value': p_value})
        except ValueError:
            continue

    if not factor_scores:
        return [
            dbc.Alert([
                html.H5("🔍 No Strong Patterns Found", className="alert-heading"),
                html.P([
                    "No statistically significant factors were identified for this combination. ",
                    "This could mean the trait is evenly distributed across all student profiles, or you may want to try a different variable/value combination."
                ], className="mb-0")
            ], color="info", className="text-center")
        ]

    factor_df = pd.DataFrame(factor_scores).sort_values(by='cramers_v', ascending=False)

    cards = []
    for _, row in factor_df.head(5).iterrows():
        top_factor = row['factor']

        contingency_top_factor = pd.crosstab(df[top_factor].dropna(), df[selected_variable].dropna())
        contingency_top_factor_smoothed = contingency_top_factor + 1
        chi2, p_value, dof, expected = chi2_contingency(contingency_top_factor_smoothed)
        # Compare the smoothed counts with the expected counts derived from that same smoothed table
        standardized_residuals = (contingency_top_factor_smoothed - expected) / np.sqrt(expected)

        if selected_value in standardized_residuals.columns:
            top_pos_residual = standardized_residuals[selected_value].idxmax()

            # Business metrics
            total_students = len(df.index)
            category_total = df[df[top_factor] == top_pos_residual].shape[0]
            num_students_in_category = df[(df[top_factor] == top_pos_residual) & (df[selected_variable] == selected_value)].shape[0]

            percentage = (num_students_in_category / total_students) * 100 if total_students > 0 else 0
            concentration_rate = (num_students_in_category / category_total) * 100 if category_total > 0 else 0
            baseline_rate = (df[target_column_name].sum() / total_students) * 100 if total_students > 0 else 0
            lift = concentration_rate / baseline_rate if baseline_rate > 0 else 0
            market_share = (num_students_in_category / df[target_column_name].sum()) * 100 if df[target_column_name].sum() > 0 else 0

            # Business translations
            conc_readable, conc_strength = translate_metric_to_business(concentration_rate, "concentration")
            lift_readable, lift_strength = translate_metric_to_business(lift, "lift")
            assoc_readable, assoc_icon = translate_metric_to_business(row['cramers_v'], "cramers")

            # Generate insights
            insights = generate_insight(top_factor, top_pos_residual, concentration_rate, lift, market_share)

            # Determine card styling
            if row['cramers_v'] >= 0.3:
                border_color = "border-danger"
                header_color = "bg-danger text-white"
            elif row['cramers_v'] >= 0.2:
                border_color = "border-warning"
                header_color = "bg-warning text-dark"
            else:
                border_color = "border-info"
                header_color = "bg-info text-white"

            # Enhanced executive card
            card_content = [
                dbc.CardHeader([
                    html.Div([
                        html.H5(f"{top_factor.replace('_', ' ').title()}", className="mb-0"),
                        html.Span(assoc_icon, style={"fontSize": "1.2rem"})
                    ], className="d-flex justify-content-between align-items-center")
                ], className=header_color),

                dbc.CardBody([
                    # Key segment
                    html.Div([
                        html.H6("🎯 Strongest in:", className="text-muted mb-1"),
                        html.H5(f'"{top_pos_residual}"', className="text-primary mb-3")
                    ]),

                    # Business metrics
                    dbc.Row([
                        dbc.Col([
                            html.H4(conc_readable, className="text-success mb-1"),
                            html.P("in this segment", className="small text-muted mb-0")
                        ], width=6, className="text-center"),
                        dbc.Col([
                            html.H4(lift_readable, className="text-info mb-1"),
                            html.P("vs average", className="small text-muted mb-0")
                        ], width=6, className="text-center")
                    ], className="mb-3"),

                    html.Hr(className="my-3"),

                    # Business insights
                    html.Div([
                        html.H6("💡 Survey Insights:", className="text-primary mb-2"),
                        html.Div([
                            html.P(insight, className="small mb-1") for insight in insights
                        ])
                    ], className="mb-3"),

                    # Technical details (collapsible)
                    html.Details([
                        html.Summary("📊 Statistical Details", className="text-secondary small mb-2"),
                        html.Div([
                            html.Small(f"Group Representation: {market_share:.1f}% of selected group", className="text-secondary d-block"),
                            html.Small(f"Sample Size: {category_total:,} students in this group", className="text-secondary d-block"),
                            html.Small(f"Statistical Strength: {assoc_readable}", className="text-secondary d-block"),
                            html.Small(f"P-value: {row['p_value']:.4f} (highly significant)", className="text-secondary d-block")
                        ])
                    ])
                ])
            ]

            cards.append(
                dbc.Col([
                    dbc.Card(card_content, className=f"{border_color} shadow-sm h-100")
                ], lg=4, md=6, className="mb-3")
            )

    return dbc.Row(cards)

@app.callback(
    Output("export-modal", "is_open"),
    [Input("export-btn", "n_clicks"), Input("close-modal", "n_clicks")],
    [State("export-modal", "is_open")]
)
def toggle_modal(export_clicks, close_clicks, is_open):
    # Flip the modal state whenever either button has been clicked
    if export_clicks or close_clicks:
        return not is_open
    return is_open

@app.callback(
    [Output('result-variable-dropdown', 'value'),
     Output('result-value-dropdown', 'value')],
    [Input('refresh-btn', 'n_clicks')]
)
def refresh_dashboard(n_clicks):
    if n_clicks:
        return None, None  # Reset both dropdowns
    return dash.no_update, dash.no_update

server = app.server

I’m trying to make the app public, but so far I have not been allowed to. I will keep trying and let you know.

nathandrezner · September 4, 2025, 2:14pm

Feel free to send me an email about making your app public (nathan@plot.ly) Let me know what issues you’re running into making the app public, happy to help out!

Avacsiglo21 · September 4, 2025, 2:26pm

Hi Nathan, thanks a lot. Next time I will do that. The issue is solved; I am not sure, but it may have been because I had two other apps running (three with this one), so I deleted one.

Once again, thanks a lot.

Avacsiglo21 · September 4, 2025, 2:29pm

links Updated

Mike_Purtell · September 4, 2025, 6:40pm

Hello @adamschroeder,

The given dataset csv file from Cornell is 75.3M.

When I read it with polars and save as parquet, without any changes or filtering at all, the size is reduced to 1.9M. Pretty nice reduction, almost 40x. After the filtering, cleaning and column selection the file size is reduced to 845K which is what I uploaded to Plotly Cloud. I would expect the same file sizes if I had used pandas.

Mike_Purtell · September 4, 2025, 6:45pm

Thank you @marieanne. I can see how the title may be a bit confusing or misleading. I always struggle with naming things well as much as I struggle with getting code to work. On the barbell chart, using percentages instead of values should give the same shape. I may explore that a bit more. I missed out last week, and perhaps as a result I really enjoyed making this week’s dashboard.

Mike_Purtell · September 5, 2025, 5:02am

Hi @marieanne , I misunderstood your suggestion but as I thought about it over the past 1/2 day I realize just how good your idea is. I wish I had more time to do it, maybe over the weekend, but thank you so much for your wonderful guidance.

mike_finko · September 5, 2025, 9:39am

Hi,

unfortunately, I have to skip this week, working on a ‘business case’ analysis for a possible job. I was really looking forward to this one as I’ve never done an analysis with so many columns (common in marketing?).

marieanne · September 5, 2025, 10:34am

Good luck!

Ester · September 5, 2025, 1:36pm

I created this dashboard in Plotly Studio. I shortened the dataset to about 5-6 columns and created it based on that. I only modified the colors mainly.
