Figure Friday 2025 - week 33

adamschroeder · August 15, 2025, 6:23pm

join the Figure Friday session on August 22, at noon Eastern Time, to showcase your creation and receive feedback from the community.

What are the class sizes in NYC schools?

Answer this question and a few others by using Plotly on the Class Size Report dataset.
(metadata info can be found here.)

Things to consider:

what can you improve in the app or sample figure below (sunburst)?
would you like to tell a different data story using a different graph or Dash app?
how can you explore the data with Plotly Studio?

Sample figure:

Code for sample figure:

from dash import Dash, dcc
import dash_ag_grid as dag
import plotly.express as px
import pandas as pd

# download CSV sheet from Google Drive - https://drive.google.com/file/d/18nGMU0VL-a9g1CLbfXnB78v2D0zsAkTF/view?usp=sharing
df = pd.read_csv('Class_Size_Report_Distribution_Summary__2010-2015__20250814.csv')
df['CLASS SIZE'] = pd.to_numeric(df['CLASS SIZE'], errors='coerce')


df_filtered = df.groupby(['BOROUGH','GRADE LEVEL'])[['CLASS SIZE']].mean().reset_index()
df_filtered['BOROUGH'] = df_filtered['BOROUGH'].replace({
    'X': 'Bronx',
    'R': 'Staten Island',
    'K': 'Brooklyn',
    'M': 'Manhattan',
    'Q': 'Queens'
})

fig = px.sunburst(df_filtered, path=['BOROUGH','GRADE LEVEL'], color='CLASS SIZE',
                  title="Average Class Size by Borough and Grade Level")
fig.layout.update(margin=dict(l=20, r=20, t=30, b=30))


grid = dag.AgGrid(
    rowData=df.to_dict("records"),
    columnDefs=[{"field": i, 'filter': True, 'sortable': True} for i in df.columns],
    dashGridOptions={"pagination": True},
    columnSize="sizeToFit"
)

app = Dash()
app.layout = [
    grid,
    dcc.Graph(figure=fig)
]


if __name__ == "__main__":
    app.run(debug=True)

For community members that would like to build the data app with Plotly Studio, but don’t have the application yet, simply click the Apply For Early Access button on Plotly.com/studio and fill out the form. You should get the invite and download emails shortly after. Please keep in mind that Plotly Studio is still in early access.

Below is a screenshot of the Plotly Studio app built on top of this dataset:

Participation Instructions:

Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.

If you prefer to collaborate with others on Discord, join the Plotly Discord channel.

Data Source:

Thank you to NYC Open Data portal for the data.

assad_zafar · August 17, 2025, 1:12pm

@adamschroeder total Students in data that you provided 38190266 i don’t know how your 37801345 come ??

Mike_Purtell · August 19, 2025, 3:30am

Hello @assad_zafar , Without filtering rows I get with the same NUMBER OF STUDENTS as you, 38,190,266. My guess is that @adamschroeder 's total number of students, about 0.5% lower, is due to filtering. Here are the filtering steps I used, which cut the data set almost by 1/2, which is not uncommon. Hope this helps:

import polars as pl

# print NUMBER OF STUDENTS from pipe, and return the DataFrame unchanged
def log_size(stage: str):
    def inner(df: pl.DataFrame) -> pl.DataFrame:
        print(f'{stage}: {df['NUMBER OF STUDENTS'].sum():,}')
        return df
    return inner


csv_file = 'Class_Size_Report_Distribution_Summary__2010-2015__20250814.csv'
df = (
    pl.read_csv(
    csv_file,
    ignore_errors=True
    )
    .pipe(log_size('No Filter, NUMBER OF STUDENTS'))
    .drop_nulls(subset='BOROUGH')
    .pipe(log_size('Remove Nulls from BOROUGH, NUMBER OF STUDENTS'))
    .drop_nulls(subset='CLASS SIZE')
    .pipe(log_size('Remove Nulls from CLASS SIZE, NUMBER OF STUDENTS'))
    .drop_nulls(subset='CORE SUBJECT')
    .pipe(log_size('Remove Nulls from CORE SUBJECT, NUMBER OF STUDENTS'))

)

OUTPUT
No Filter, NUMBER OF STUDENTS: 38,190,266
Remove Nulls from BOROUGH, NUMBER OF STUDENTS: 24,537,170
Remove Nulls from CLASS SIZE, NUMBER OF STUDENTS: 23,437,634
Remove Nulls from CORE SUBJECT, NUMBER OF STUDENTS: 22,176,594

Mike_Purtell · August 19, 2025, 4:54am

I published a Dashboard to Plotly Cloud for analysis of New York City class sized, here is a link to view it:
Dash

Here are the instructions I provided to Plotly Studio “make a dashboard for analysis of class size and number of students by BOROUGH, SCHOOL YEAR, GRAD LEVEL, and CORE SUBJECT”

This is how I cleaned the data and saved it as a Parquet file:

import polars as pl

# print NUMBER OF STUDENTS from pipe, and return the DataFrame unchanged
def log_size(stage: str):
    def inner(df: pl.DataFrame) -> pl.DataFrame:
        print(f'{stage}: {df['NUMBER OF STUDENTS'].sum():,}')
        return df
    return inner

csv_file = 'Class_Size_Report_Distribution_Summary__2010-2015__20250814.csv'
df = (
    pl.read_csv(
    csv_file,
    ignore_errors=True
    )
    .pipe(log_size('No Filter, NUMBER OF STUDENTS'))
    .drop_nulls(subset='BOROUGH')
    .pipe(log_size('Remove Nulls from BOROUGH, NUMBER OF STUDENTS'))
    .drop_nulls(subset='CLASS SIZE')
    .pipe(log_size('Remove Nulls from CLASS SIZE, NUMBER OF STUDENTS'))
    .drop_nulls(subset='CORE SUBJECT')
    .pipe(log_size('Remove Nulls from CORE SUBJECT, NUMBER OF STUDENTS'))
    .filter(pl.col('REPORT TYPE') == 'Borough')
    .pipe(log_size('filter REPORT TYPE == BOROUGH, NUMBER OF STUDENTS'))
    .select('SCHOOL YEAR','BOROUGH', 'GRADE LEVEL',
            'CORE SUBJECT','CLASS SIZE','NUMBER OF CLASSES' )
    .with_columns(
        BOROUGH = pl.col('BOROUGH')
            .str.replace('Q', 'Queens')
            .str.replace('M', 'Manhattan')
            .str.replace('R', 'Staten Island')  # Richmond county
            .str.replace('K', 'Brooklyn')       # Kings County
            .str.replace('X', 'The Bronx')
    )

)
df.write_parquet('NYC_Class_Sizes.parquet')

Here is a screenshot:

adamschroeder · August 19, 2025, 11:41am

Thank you @Mike_Purtell .

adamschroeder · August 19, 2025, 11:47am

Awesome app, @Mike_Purtell . Is there anything you would change in the app? Any chart you would do differently?

I think it might be a good idea to re-prompt the line chart so that each borough is a separate line. That way, if people choose Queens and Bronx, for example, they see two separate lines in two separate colors instead of the total average.

Avacsiglo21 · August 19, 2025, 1:36pm

Hi Mike, yes, that’s right. This is the view by Borough and School Year

Mike_Purtell · August 19, 2025, 11:48pm

Hi @Avacsiglo21, I like your heatmap showing the number of students by Borough and school year. Has me wondering why the numbers are so much lower in 2012-2013. I double checked that your numbers are accurate. There was a hurricane in October 2012 (Hurricane Sandy) that impacted the NYC area, but I doubt that is the issue. I couldn’t find any historic event that would have caused this.

Mike_Purtell · August 19, 2025, 11:52pm

Thank you @adamschroeder, great suggestion. I agree that there should be a separate trace for each Borough. I will try to coax the AI to put that in, and separately I want to try grabbing the code produced by Plotly Studio for further customization. There are other things I would change, but overall very happy and impressed by Plotly Studio dashboard.

Avacsiglo21 · August 20, 2025, 1:18am

Hi Mike, I was to ask you or Adams if you know what major event happened that year thanks you

adamschroeder · August 20, 2025, 12:17pm

That’s steep decline is really weird. I also did research about it but couldn’t find any substantial event. I wonder if it’s a problem with the data.

Avacsiglo21 · August 20, 2025, 1:21pm

Hi Everyone,

For this week 32 I built the NYC Educational Resource Distribution Analytics application as an interactive dashboard to help identify and visualize unusual patterns in how educational resources are distributed across New York City school districts. I use the Isolation Forest algorithm—to automatically detect districts with unusual patterns (outliers)

The objective is to help discover areas that might have an unequal distribution of resources or other unique circumstances. This lets investigate the data more deeply to understand what’s happening.

Key Features:

Anomaly Analysis by Year: Users can select a specific school year (2010-2015) to analyze metrics for that period, which allows for observing the evolution of patterns over time.
Adjustable Detection: The sensitivity of the anomaly detection algorithm can be modified using a slider control. A higher value identifies more “outliers,” while a lower value focuses on the most extreme and obvious anomalies.
Overall Visualization: A bar chart displays the distribution of anomalies across the five NYC boroughs, providing a high-level view of where atypical patterns are concentrated.
Critical Cases Table: A table is generated that lists the districts identified as the highest-severity “outliers,” highlighting those that require immediate attention.
Detailed Analysis (Drill-down): Users can select any district to explore in-depth its class size distribution by program type and by subject. This functionality provides a granular look that helps to understand the causes behind a district being classified as atypical.

The code

import dash
from dash import dcc, html, Input, Output, callback, dash_table
import dash_bootstrap_components as dbc
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

Cargar los datos

try:
df = pd.read_csv(‘cleaned_class_size_data.csv’)
except FileNotFoundError:
print(“Error: ‘cleaned_class_size_data.csv’ no encontrado. Por favor, asegúrate de que el archivo esté en el directorio correcto.”)
df = pd.DataFrame() # Crear un DataFrame vacío para evitar errores

Mapeo de códigos de borough a nombres reales

BOROUGH_MAPPING = {
# Códigos de letras comunes
‘M’: “Manhattan”,
‘X’: “Bronx”,
‘K’: “Brooklyn”,
‘Q’: “Queens”,
‘R’: “Staten Island”
}

def map_borough_names(df):
“”"
Convierte códigos de borough a nombres reales de NYC
“”"
df = df.copy()

# Aplicar mapeo si la columna contiene códigos
if 'BOROUGH' in df.columns:
    # Convertir a string para manejo consistente
    df['BOROUGH'] = df['BOROUGH'].astype(str).str.strip().str.upper()
    
    # Mapear códigos a nombres
    df['BOROUGH'] = df['BOROUGH'].map(BOROUGH_MAPPING).fillna(df['BOROUGH'])
    
    # Si hay códigos no mapeados, mantener el valor original pero capitalizado
    df['BOROUGH'] = df['BOROUGH'].str.title()

return df

Aplicar mapeo de nombres de borough

if not df.empty:
df = map_borough_names(df)

Preparar los datos para análisis de anomalías

def prepare_anomaly_data(dataframe):
# Crear métricas agregadas por distrito (CSD), año y borough
district_metrics = dataframe.groupby([‘SCHOOL_YEAR’, ‘BOROUGH’, ‘CSD’]).agg({
‘CLASS_SIZE’: [‘mean’, ‘std’, ‘min’, ‘max’],
‘NUMBER_OF_CLASSES’: [‘sum’, ‘mean’],
‘NUMBER_OF_STUDENTS’: [‘sum’, ‘mean’],
‘PERCENT_OF_STUDENTS_IN_BOROUGH_/GRADE/PROGRAM/_SUBJECT’: [‘mean’, ‘std’]
}).round(2)

# Aplanar columnas
district_metrics.columns = ['_'.join(col).strip() for col in district_metrics.columns]
district_metrics = district_metrics.reset_index()

# Calcular métricas adicionales
district_metrics['students_per_class'] = (district_metrics['NUMBER_OF_STUDENTS_sum'] / 
                                          district_metrics['NUMBER_OF_CLASSES_sum']).fillna(0)
district_metrics['class_size_variation'] = (district_metrics['CLASS_SIZE_std'] / 
                                              district_metrics['CLASS_SIZE_mean']).fillna(0)

return district_metrics

Preparar datos base (sin anomalías aún)

district_data = prepare_anomaly_data(df)

Inicializar la app con tema Bootstrap

app = dash.Dash(name, external_stylesheets=[dbc.themes.SOLAR, dbc.icons.FONT_AWESOME])

app.title=‘NYC Education Distribution Analyzer’

Layout moderno

app.layout = dbc.Container([
# Header
dbc.Row([
dbc.Col([
html.H1([
html.I(className=“fas fa-graduation-cap”, style={‘marginRight’: ‘15px’}),
“NYC Educational Resource Distribution Analytics”
], className=“text-center mb-2”, style={‘color’: ‘#2c3e50’, ‘fontWeight’: ‘300’}),
html.P([
"This dashboard identifies NYC districts with unusual patterns using data from 2010-2015 in student-to-class ratios, ",
"class sizes, and resource allocation using machine learning techniques. ",
"Discover outliers that may indicate inequitable distribution or exceptional circumstances ",
“requiring attention from education administrators.”
], className=“text-center text-muted mb-4”, style={‘fontSize’: ‘16px’, ‘maxWidth’: ‘800px’, ‘margin’: ‘0 auto’})
])
]),

# Controles en Cards modernas
dbc.Row([
    dbc.Col([
        dbc.Card([
            dbc.CardHeader([
                html.H5([
                    html.I(className="fas fa-calendar-alt", style={'marginRight': '10px'}),
                    "School Year Selection"
                ], className="mb-0")
            ]),
            dbc.CardBody([
                html.P("Choose the academic year to analyze resource distribution patterns:", 
                       className="text-muted mb-2", style={'fontSize': '14px'}),
                dbc.RadioItems(
                    id='year-radio',
                    options=[{'label': year, 'value': year} for year in sorted(df['SCHOOL_YEAR'].unique())] if not df.empty else [],
                    value=sorted(df['SCHOOL_YEAR'].unique())[-1] if not df.empty else None,
                    inline=True,
                    className="mb-0"
                )
            ])
        ], className="h-100")
    ], md=7),
    
    dbc.Col([
        dbc.Card([
            dbc.CardHeader([
                html.H5([
                    html.I(className="fas fa-sliders-h", style={'marginRight': '10px'}),
                    "Detection Sensitivity"
                ], className="mb-0")
            ]),
            dbc.CardBody([
                html.P("Adjust how strictly outliers are identified:", 
                       className="text-muted mb-2", style={'fontSize': '14px'}),
                dcc.Slider(
                    id='contamination-slider',
                    min=5,
                    max=25,
                    step=1,
                    value=10,
                    marks={i: f'{i}%' for i in range(5, 26, 5)},
                    tooltip={"placement": "bottom", "always_visible": True}
                )
            ])
        ], className="h-100")
    ], md=5)
], className="mb-4"),

# Métricas generales en Cards
dbc.Row(id='metrics-cards', className="mb-4"),

# Gráfico de análisis por borough y tabla de anomalías críticas en la misma fila
dbc.Row([
    dbc.Col([
        dbc.Card([
            dbc.CardHeader([
                html.H5([
                    html.I(className="fas fa-city", style={'marginRight': '10px'}),
                    "Resource Distribution by NYC Borough"
                ], className="mb-0")
            ]),
            dbc.CardBody([
                dcc.Graph(id='borough-analysis', style={'height': '450px'})
            ])
        ], className="h-100")
    ], md=6),
    
    dbc.Col([
        dbc.Card([
            dbc.CardHeader([
                html.H5([
                    html.I(className="fas fa-flag", style={'marginRight': '10px'}),
                    "Priority Alerts: Critical Outliers"
                ], className="mb-0", style={'color': '#e74c3c'})
            ]),
            dbc.CardBody(id='critical-anomalies-card', style={'height': '450px', 'overflowY': 'auto'})
        ], className="h-100")
    ], md=6)
], className="mb-4"),

# Investigación drill-down
dbc.Row([
    dbc.Col([
        dbc.Card([
            dbc.CardHeader([
                html.H5([
                    html.I(className="fas fa-microscope", style={'marginRight': '10px'}),
                    "Deep Dive Analysis"
                ], className="mb-0")
            ]),
            dbc.CardBody([
                html.P([
                    "Select a specific district to explore detailed breakdowns of class distribution, ",
                    "program types, and subject allocations. This granular view helps identify the root ",
                    "causes behind unusual resource patterns."
                ], className="text-muted mb-3"),
                dbc.Row([
                    dbc.Col([
                        dbc.Label("District Selection:", className="fw-bold"),
                        dcc.Dropdown(
                            id='district-dropdown',
                            placeholder="Choose a district for detailed investigation...",
                            className="mb-3"
                        )
                    ])
                ]),
                html.Div(id='drilldown-analysis')
            ])
        ])
    ])
], className="mb-4"),

# Footer DEFINITIVAMENTE al final
html.Hr(className="mt-5"),
dbc.Row([
    dbc.Col([
        html.Footer([
            dbc.Row([
                dbc.Col([
                    html.P([
                        html.I(className="fas fa-chart-line", style={'marginRight': '8px', 'color': '#3498db'}),
                        "NYC Educational Resource Analytics Dashboard"
                    ], className="mb-1 text-center", style={'fontWeight': 'bold', 'color': '#2c3e50'}),
                    html.P([
                        "Powered by Machine Learning • Isolation Forest Algorithm • ",
                        "Built with Plotly Dash & Bootstrap"
                    ], className="mb-0 text-muted text-center", style={'fontSize': '12px'}),
                    html.P([
                        html.I(className="fas fa-calendar", style={'marginRight': '5px', 'color': '#95a5a6'}),
                        "Data Analysis Dashboard © 2025"
                    ], className="mb-0 text-muted text-center mt-2", style={'fontSize': '12px'})
                ], md=12)
            ])
        ], style={
            'backgroundColor': '#f1f3f4',
            'padding': '20px',
            'marginTop': '30px',
            'borderTop': '2px solid #e9ecef',
            'borderRadius': '8px'
        })
    ])
])

], fluid=True, style={‘backgroundColor’: ‘#f8f9fa’, ‘minHeight’: ‘100vh’, ‘padding’: ‘20px’})

Función para detectar anomalías (CORREGIDA)

def detect_anomalies(data, contamination=0.1):
# Seleccionar features para análisis
feature_cols = [
‘CLASS_SIZE_mean’, ‘CLASS_SIZE_std’, ‘NUMBER_OF_CLASSES_sum’,
‘NUMBER_OF_STUDENTS_sum’, ‘students_per_class’, ‘class_size_variation’,
‘PERCENT_OF_STUDENTS_IN_BOROUGH_/GRADE/PROGRAM/_SUBJECT_mean’
]

# Preparar datos para ML
X = data[feature_cols].fillna(0)

# Escalar datos
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Aplicar Isolation Forest
iso_forest = IsolationForest(contamination=contamination, random_state=42, n_estimators=100)
anomaly_labels = iso_forest.fit_predict(X_scaled)
anomaly_scores = iso_forest.score_samples(X_scaled)

# Agregar resultados al dataframe
data_with_anomalies = data.copy()
data_with_anomalies['is_anomaly'] = anomaly_labels == -1
data_with_anomalies['anomaly_score'] = anomaly_scores

# CORRIGIDO: Calcular severidad basada en los scores ACTUALES, no fijos
# Usar percentiles de los scores actuales para determinar severidad
score_25th = np.percentile(anomaly_scores, 25)
score_10th = np.percentile(anomaly_scores, 10)

conditions = [
    anomaly_scores <= score_10th,  # 10% más bajos = High severity
    anomaly_scores <= score_25th,  # Entre 10-25% = Medium severity
    anomaly_scores > score_25th    # Resto = Normal
]
choices = ['High', 'Medium', 'Normal']

data_with_anomalies['anomaly_severity'] = np.select(conditions, choices, default='Normal')

return data_with_anomalies, feature_cols

@callback(
[Output(‘borough-analysis’, ‘figure’),
Output(‘metrics-cards’, ‘children’),
Output(‘critical-anomalies-card’, ‘children’),
Output(‘district-dropdown’, ‘options’)],
[Input(‘year-radio’, ‘value’),
Input(‘contamination-slider’, ‘value’)]
)
def update_anomaly_analysis(selected_year, contamination):
# Validar entrada de contaminación
if contamination is None or contamination < 5 or contamination > 25:
contamination = 10

contamination = contamination / 100  # Convertir porcentaje a decimal

# Filtrar datos por año
year_data = district_data[district_data['SCHOOL_YEAR'] == selected_year].copy()

if year_data.empty:
    empty_fig = go.Figure()
    empty_fig.add_annotation(text="No hay datos para el año seleccionado", 
                             xref="paper", yref="paper", x=0.5, y=0.5, showarrow=False)
    empty_metrics = html.Div("Sin datos disponibles", className="text-center")
    return empty_fig, empty_metrics, "Sin datos", []

# Detectar anomalías con nueva configuración
year_anomalies, _ = detect_anomalies(year_data, contamination)

# 1. Análisis por borough CORREGIDO
borough_stats = year_anomalies.groupby('BOROUGH').agg({
    'is_anomaly': 'sum',  # Cuenta de outliers por borough
    'anomaly_score': 'mean',
    'CSD': 'count',  # Total de distritos por borough
    'CLASS_SIZE_mean': 'mean',
    'NUMBER_OF_STUDENTS_sum': 'sum'
}).reset_index()

# CORREGIDO: Calcular rate basado en outliers reales detectados
borough_stats['anomaly_rate'] = (borough_stats['is_anomaly'] / borough_stats['CSD'] * 100).round(1)

# Definir orden y colores específicos para los boroughs de NYC
borough_order = ['Manhattan', 'Brooklyn', 'Queens', 'Bronx', 'Staten Island']
borough_colors = {
    'Manhattan': '#FF6B6B',    # Rojo coral
    'Brooklyn': '#4ECDC4',     # Verde azulado  
    'Queens': '#45B7D1',       # Azul claro
    'Bronx': '#96CEB4',        # Verde menta
    'Staten Island': '#FFEAA7' # Amarillo claro
}

# Ordenar datos según el orden deseado
borough_stats['sort_key'] = borough_stats['BOROUGH'].map({b: i for i, b in enumerate(borough_order)})
borough_stats = borough_stats.sort_values('sort_key').drop('sort_key', axis=1)

# Crear gráfico de barras MEJORADO con colores específicos
borough_fig = go.Figure()

# Asignar colores basados en el borough
colors = [borough_colors.get(borough, '#95A5A6') for borough in borough_stats['BOROUGH']]

borough_fig.add_trace(go.Bar(
    x=borough_stats['BOROUGH'],
    y=borough_stats['anomaly_rate'],
    marker=dict(
        color=colors,
        line=dict(color='white', width=1),
        opacity=0.8
    ),
    text=[f'{rate}%<br>{int(outliers)} outliers<br>{int(total)} districts' 
          for rate, outliers, total in 
          zip(borough_stats['anomaly_rate'], borough_stats['is_anomaly'], borough_stats['CSD'])],
    textposition='outside',
    textfont=dict(size=11, color='#2c3e50'),
    hovertemplate='<b>%{x}</b><br>' +
                  'Outlier Rate: %{y}%<br>' +
                  'Avg Anomaly Score: %{customdata[2]:.3f}<br>' +
                  'Outliers: %{customdata[0]}<br>' +
                  'Total Districts: %{customdata[1]}<br>' +
                  'Total Students: %{customdata[3]:,}<br>' +
                  '<extra></extra>',
    customdata=list(zip(borough_stats['is_anomaly'], borough_stats['CSD'], 
                        borough_stats['anomaly_score'], borough_stats['NUMBER_OF_STUDENTS_sum']))
))

borough_fig.update_layout(
    title=dict(
        text=f"Outlier Distribution Across NYC Boroughs ({selected_year})<br><sub>Sensitivity: {int(contamination*100)}%</sub>",
        x=0.5,
        font=dict(size=16, color='#2c3e50')
    ),
    # xaxis_title="NYC Borough",
    yaxis_title="Outlier Rate (%)",
    height=450,
    template='simple_white',
    showlegend=False,
    plot_bgcolor='rgba(0,0,0,0)',
    paper_bgcolor='rgba(0,0,0,0)',
    xaxis=dict(
        tickangle=45,
        tickfont=dict(size=12)
    ),
    yaxis=dict(
        tickfont=dict(size=12),
        gridcolor='rgba(0,0,0,0.1)'
    )
)

# 2. Métricas en cards CORREGIDAS
total_districts = len(year_anomalies)
anomaly_count = year_anomalies['is_anomaly'].sum()
anomaly_rate = (anomaly_count / total_districts * 100).round(1) if total_districts > 0 else 0
avg_anomaly_score = year_anomalies['anomaly_score'].mean().round(3) if len(year_anomalies) > 0 else 0
# CORREGIDO: Critical cases ahora responde al slider
critical_count = len(year_anomalies[year_anomalies['anomaly_severity'] == 'High'])

metrics_cards = dbc.Row([
    dbc.Col([
        dbc.Card([
            dbc.CardBody([
                html.H3(f"{total_districts}", className="text-primary mb-0"),
                html.P("Districts Analyzed", className="text-muted mb-0", style={'fontSize': '14px'}),
                html.Small(f"Year: {selected_year}", className="text-muted")
            ])
        ], className="text-center border-primary")
    ], md=3),
    
    dbc.Col([
        dbc.Card([
            dbc.CardBody([
                html.H3(f"{anomaly_count}", className="text-danger mb-0"),
                html.P("Outliers Detected", className="text-muted mb-0", style={'fontSize': '14px'}),
                html.Small(f"Sensitivity: {int(contamination*100)}%", className="text-muted")
            ])
        ], className="text-center border-danger")
    ], md=3),
    
    dbc.Col([
        dbc.Card([
            dbc.CardBody([
                html.H3(f"{anomaly_rate}%", className="text-warning mb-0"),
                html.P("Outlier Rate", className="text-muted mb-0", style={'fontSize': '14px'}),
                html.Small(f"Score avg: {avg_anomaly_score}", className="text-muted")
            ])
        ], className="text-center border-warning")
    ], md=3),
    
    dbc.Col([
        dbc.Card([
            dbc.CardBody([
                html.H3(f"{critical_count}", className="text-dark mb-0"),
                html.P("Critical Cases", className="text-muted mb-0", style={'fontSize': '14px'}),
                html.Small("High severity outliers", className="text-muted")
            ])
        ], className="text-center border-dark")
    ], md=3)
])

# 3. Tabla de anomalías críticas CORREGIDA
critical_anomalies = year_anomalies[year_anomalies['anomaly_severity'] == 'High'].sort_values('anomaly_score')

if len(critical_anomalies) > 0:
    critical_table = html.Div([
        html.P(f"Found {len(critical_anomalies)} districts with high-severity patterns:", 
               className="text-muted mb-2", style={'fontSize': '14px'}),
        dash_table.DataTable(
            data=critical_anomalies[['BOROUGH', 'CSD', 'CLASS_SIZE_mean', 'NUMBER_OF_STUDENTS_sum', 
                                     'NUMBER_OF_CLASSES_sum', 'students_per_class', 'anomaly_score']].to_dict('records'),
            columns=[
                {'name': 'Borough', 'id': 'BOROUGH'},
                {'name': 'District', 'id': 'CSD'},
                {'name': 'Avg Class Size', 'id': 'CLASS_SIZE_mean', 'type': 'numeric', 'format': {'specifier': '.1f'}},
                {'name': 'Total Students', 'id': 'NUMBER_OF_STUDENTS_sum', 'type': 'numeric'},
                {'name': 'Total Classes', 'id': 'NUMBER_OF_CLASSES_sum', 'type': 'numeric'},
                {'name': 'Students/Class', 'id': 'students_per_class', 'type': 'numeric', 'format': {'specifier': '.1f'}},
                {'name': 'Outlier Score', 'id': 'anomaly_score', 'type': 'numeric', 'format': {'specifier': '.3f'}}
            ],
            style_cell={
                'textAlign': 'center', 
                'fontSize': '12px',
                'fontFamily': 'Arial, sans-serif'
            },
            style_data_conditional=[
                {
                    'if': {'row_index': 'odd'},
                    'backgroundColor': '#f8f9fa'
                }
            ],
            style_header={
                'backgroundColor': '#dc3545', 
                'color': 'white', 
                'fontWeight': 'bold',
                'border': '1px solid #dee2e6'
            },
            style_table={'overflowX': 'auto', 'height': '350px', 'overflowY': 'auto'},
            page_size=8
        )
    ])
else:
    critical_table = dbc.Alert([
        html.I(className="fas fa-check-circle", style={'marginRight': '10px'}),
        f"No critical outliers found with current sensitivity ({int(contamination*100)}%). ",
        "Try increasing sensitivity to detect more subtle patterns."
    ], color="success", className="text-center")

# 4. Opciones para dropdown de drill-down con emojis para cada borough
borough_emojis = {
    'Manhattan': '🏙️',
    'Brooklyn': '🌉', 
    'Queens': '✈️',
    'Bronx': '🏟️',
    'Staten Island': '🚢'
}

district_options = [
    {'label': f"{borough_emojis.get(row['BOROUGH'], '🏫')} {row['BOROUGH']} - District {row['CSD']} {'🚨' if row['is_anomaly'] else '✅'}", 
     'value': f"{row['BOROUGH']}_{row['CSD']}"}
    for _, row in year_anomalies.iterrows()
]

return borough_fig, metrics_cards, critical_table, district_options

@callback(
Output(‘drilldown-analysis’, ‘children’),
[Input(‘district-dropdown’, ‘value’),
Input(‘year-radio’, ‘value’),
Input(‘contamination-slider’, ‘value’)]
)
def update_drilldown(selected_district, selected_year, contamination):
if not selected_district:
return dbc.Alert([
html.I(className=“fas fa-info-circle”, style={‘marginRight’: ‘10px’}),
“Please select a district above to begin detailed investigation.”
], color=“info”, className=“text-center”)

# Validar contaminación
if contamination is None:
    contamination = 0.1
else:
    contamination = contamination / 100

# Parsear selección
borough, csd = selected_district.split('_')
csd = float(csd)

# Filtrar datos específicos del distrito
district_detail = df[
    (df['SCHOOL_YEAR'] == selected_year) & 
    (df['BOROUGH'] == borough) & 
    (df['CSD'] == csd)
].copy()

if district_detail.empty:
    return dbc.Alert("No data available for the selected district.", color="warning", className="text-center")

# Obtener información de anomalía
year_data = district_data[district_data['SCHOOL_YEAR'] == selected_year].copy()
year_anomalies, _ = detect_anomalies(year_data, contamination)

district_anomaly = year_anomalies[
    (year_anomalies['BOROUGH'] == borough) & 
    (year_anomalies['CSD'] == csd)
]

if district_anomaly.empty:
    return dbc.Alert("No outlier information found for this district.", color="warning")

district_anomaly = district_anomaly.iloc[0]

# Análisis detallado por programa y grado
program_analysis = district_detail.groupby(['PROGRAM_TYPE', 'GRADE_LEVEL']).agg({
    'CLASS_SIZE': 'mean',
    'NUMBER_OF_STUDENTS': 'sum',
    'NUMBER_OF_CLASSES': 'sum'
}).reset_index()

# Gráfico de distribución por programa
program_fig = px.bar(
    program_analysis,
    x='PROGRAM_TYPE',
    y='CLASS_SIZE',
    color='GRADE_LEVEL',
    labels={'PROGRAM_TYPE':'Program Type', 'CLASS_SIZE':'Clase Size', 'GRADE_LEVEL':'Grade Level'},
    title="Class Size Distribution by Program & Grade",
    text='CLASS_SIZE', 
    color_discrete_sequence=px.colors.qualitative.Set3
)
program_fig.update_traces(texttemplate='%{text:.1f}', textposition='outside')
program_fig.update_layout(template='simple_white', height=400)

# Análisis por materia
subject_analysis = district_detail.groupby('CORE_SUBJECT').agg({
    'CLASS_SIZE': 'mean',
    'NUMBER_OF_STUDENTS': 'sum'
}).reset_index()

subject_fig = px.pie(
    subject_analysis,
    values='NUMBER_OF_STUDENTS',
    names='CORE_SUBJECT',
    hole=0.65,
    title="Student Distribution by Core Subject",
    color_discrete_sequence=px.colors.qualitative.Pastel
)
subject_fig.update_layout(template='simple_white', height=400)

# Determinar color de severity
severity_colors = {'High': 'danger', 'Medium': 'warning', 'Normal': 'success'}
severity_color = severity_colors.get(district_anomaly['anomaly_severity'], 'secondary')

# Emoji para el borough
borough_emojis = {
    'Manhattan': '🏙️',
    'Brooklyn': '🌉', 
    'Queens': '✈️',
    'Bronx': '🏟️',
    'Staten Island': '🚢'
}

# Card de información del distrito MEJORADA
district_info_card = dbc.Card([
    dbc.CardHeader([
        html.H5([
            html.I(className="fas fa-map-marker-alt", style={'marginRight': '10px'}),
            f"{borough_emojis.get(borough, '🏫')} {borough} - District {csd}",
            html.Span("🚨" if district_anomaly['is_anomaly'] else "✅", 
                      style={'marginLeft': '10px', 'fontSize': '20px'})
        ], className="mb-0")
    ]),
    dbc.CardBody([
        dbc.Row([
            dbc.Col([
                dbc.Badge(f"Pattern: {district_anomaly['anomaly_severity']}", 
                          color=severity_color, className="mb-2"),
                html.P(f"🎯 Outlier Score: {district_anomaly['anomaly_score']:.3f}", 
                       className="mb-1"),
                html.P(f"📚 Average Class Size: {district_anomaly['CLASS_SIZE_mean']:.1f}", 
                       className="mb-1"),
                html.P(f"📊 Class Size Variation: {district_anomaly['class_size_variation']:.2f}", 
                       className="mb-0")
            ], md=6),
            dbc.Col([
                html.P(f"🧑‍🎓 Total Students: {int(district_anomaly['NUMBER_OF_STUDENTS_sum']):,}",
                       className="mb-1"),
                html.P(f"🏫 Total Classes: {int(district_anomaly['NUMBER_OF_CLASSES_sum']):,}",
                       className="mb-1"),
                html.P(f"👥 Students per Class: {district_anomaly['students_per_class']:.1f}", 
                       className="mb-1"),
                html.P(f"📋 Avg % of Students: {district_anomaly['PERCENT_OF_STUDENTS_IN_BOROUGH_/_GRADE_/_PROGRAM_/_SUBJECT_mean']:.1f}%", 
                       className="mb-0")
            ], md=6)
        ])
    ])
], className="mb-4")

return html.Div([
    district_info_card,
    dbc.Row([
        dbc.Col(dcc.Graph(figure=program_fig), md=6),
        dbc.Col(dcc.Graph(figure=subject_fig), md=6)
    ])
])

server = app.server

https://nyc-educational-distribution.plotly.app
Your feedback, including any comments, suggestions, or questions, is always appreciated as this application is an ongoing process.

Avacsiglo21 · August 20, 2025, 1:35pm

Looking closer at the data, it seems to be exactly half compared to the other years. Could it be that only six months are reported here?

adamschroeder · August 20, 2025, 7:00pm

@Avacsiglo21
In this outlier card it says Pattern High. Does that mean all the numbers are high compared to the average rates?

Avacsiglo21 · August 20, 2025, 9:26pm

That’s a great question. The short answer Yes, it means that the overall trend of the district’s numbers is high compared to the average rates of other districts. It’s not just one number, but a consistent pattern across the data that is elevated.In the context of an outlier card, “Pattern High” doesn’t necessarily mean all numbers are high compared to the average. It’s a bit more nuanced than that. “Pattern High” likely indicates that the overall distribution or pattern of the data within that specific district is unusually high or elevated when compared to the typical patterns observed in other districts.

Avacsiglo21 · August 21, 2025, 1:16pm

Plotly Studio version

web link:
Dash

adamschroeder · August 21, 2025, 2:22pm

it’s interesting to see how the English vs Math student count changes once they go into high school.

Ester · August 22, 2025, 8:17am

The dashboard visually presents NYC school class size data, allowing users to filter by year, program type, borough, and grade level.Each filter is accompanied by an info button that provides helpful tooltips for user guidance.Key metrics like Total Students, Average Class Size, and Total Classes are clearly displayed at the top.Multiple interactive charts—including a stacked bar, heatmap, boxplot, and scatter plot—make data exploration intuitive.

Code

import pandas as pd
import plotly.express as px
import dash
from dash import dcc, html, Input, Output
import dash_bootstrap_components as dbc
import dash_ag_grid as dag

# Color palette for charts
chart_colors = [
    "#33658a",  # blue
    "#86bbd8",  # light blue
    "#758e4f",  # olive green
    "#f6ae2d",  # yellow-orange
    "#f26419",  # orange-red
]

colors = {
    "background": "#f8f9fac5",
    "text": "#2d3436",
    "black": "#111111",
    "accent": "#33658a",
    "sidebar": "#e9ecefe2"  # light grey
}

def load_data():
    try:
        df = pd.read_csv('class.csv')
        df.columns = df.columns.str.strip()
        df['CLASS SIZE'] = pd.to_numeric(df['CLASS SIZE'], errors='coerce')
        df['NUMBER OF STUDENTS'] = pd.to_numeric(df['NUMBER OF STUDENTS'], errors='coerce')
        df['NUMBER OF CLASSES'] = pd.to_numeric(df['NUMBER OF CLASSES'], errors='coerce')
        df = df.dropna(subset=['CLASS SIZE', 'NUMBER OF STUDENTS', 'NUMBER OF CLASSES'])
        df = df[df['CLASS SIZE'] > 0]
        return df
    except Exception as e:
        print(f"Error loading data: {e}")
        return pd.DataFrame()

df = load_data()

def get_options(col):
    if col in df.columns:
        return ['All'] + sorted(df[col].dropna().astype(str).str.strip().unique().tolist())
    return ['All']

boroughs = get_options('BOROUGH')
grades = get_options('GRADE LEVEL')
programs = get_options('PROGRAM TYPE')
years = get_options('SCHOOL YEAR')

def info_button(id):
    return dbc.Button(
        "i",
        id=id,
        color="info",
        size="sm",
        outline=True,
        style={
            "marginLeft": "8px",
            "borderRadius": "50%",
            "fontWeight": "bold",
            "width": "28px",
            "height": "28px",
            "padding": "0",
            "fontSize": "1em"
        }
    )

sidebar = dbc.Card(
    [
        html.Br(),
        html.Br(),
        dbc.Label(
            [
                "School Year",
                info_button("info-year")
            ],
            style={
                'fontSize': '1.2em',
                'color': '#222',
                'fontWeight': 'bold',
                'letterSpacing': '1px',
                'display': 'flex',
                'alignItems': 'center'
            }
        ),
        dcc.Dropdown(
            id='year-dropdown',
            options=[{'label': y, 'value': y} for y in years],
            value='All',
            style={'marginBottom': '32px', 'fontSize': '1.1em'}
        ),
        dbc.Label(
            [
                "Program Type",
                info_button("info-program")
            ],
            style={
                'fontSize': '1.2em',
                'color': '#222',
                'fontWeight': 'bold',
                'letterSpacing': '1px',
                'display': 'flex',
                'alignItems': 'center'
            }
        ),
        dcc.Dropdown(
            id='program-dropdown',
            options=[{'label': p, 'value': p} for p in programs],
            value='All',
            style={'marginBottom': '32px', 'fontSize': '1.1em'}
        ),
        dbc.Label(
            [
                "Borough",
                info_button("info-borough")
            ],
            style={
                'fontSize': '1.2em',
                'color': '#222',
                'fontWeight': 'bold',
                'letterSpacing': '1px',
                'display': 'flex',
                'alignItems': 'center'
            }
        ),
        dcc.Dropdown(
            id='borough-dropdown',
            options=[{'label': b, 'value': b} for b in boroughs],
            value='All',
            style={'marginBottom': '32px', 'fontSize': '1.1em'}
        ),
        dbc.Label(
            [
                "Grade Level",
                info_button("info-grade")
            ],
            style={
                'fontSize': '1.2em',
                'color': '#222',
                'fontWeight': 'bold',
                'letterSpacing': '1px',
                'display': 'flex',
                'alignItems': 'center'
            }
        ),
        dcc.Dropdown(
            id='grade-dropdown',
            options=[{'label': g, 'value': g} for g in grades],
            value='All',
            style={'marginBottom': '32px', 'fontSize': '1.1em'}
        ),
        html.Div(
            dbc.Button(
                "Clear All",
                id="clear-filters-btn",
                color="secondary",
                style={
                    "width": "100%",
                    "fontWeight": "bold",
                    "fontSize": "1.1em",
                    "marginTop": "40px",
                    "marginBottom": "10px",
                    "backgroundColor": "#33658a",
                    "border": "none"
                }
            ),
            style={"textAlign": "center"}
        ),
    ],
    body=True,
    style={
        "height": "100vh",
        "backgroundColor": colors["sidebar"],
        "boxShadow": "0 2px 8px rgba(0,0,0,0.07)",
        "borderRadius": "0",
        "padding": "10px"
    }
)

content = dbc.Container([
    html.Br(),
    html.H1(
        'NYC School Class Size Dashboard',
        style={
            'textAlign': 'center',
            'color': chart_colors[0],
            'fontSize': '2.8em',
            'fontWeight': 'bold',
            'letterSpacing': '2px',
            'fontFamily': 'Montserrat, Arial, sans-serif',
            'marginBottom': '10px'
        }
    ),
    html.Br(),
    dbc.Row(id='kpi-cards', className="mb-4", justify="center"),
    dbc.Row([
        dbc.Col(dcc.Graph(id='stacked-bar'), md=6),
        dbc.Col(dcc.Graph(id='heatmap'), md=6),
    ], className="mb-4"),
    dbc.Row([
        dbc.Col(dcc.Graph(id='program-boxplot'), md=6),
        dbc.Col(dcc.Graph(id='students-classes-scatter'), md=6),
    ]),
    dbc.Row([
        dbc.Col([
            html.Div(id="ag-table-container")
        ], width=12)
    ])
], fluid=True, style={'backgroundColor': colors['background'], 'paddingBottom': '30px'})

tooltips = [
    dbc.Tooltip("Select the school year to filter data.", target="info-year", placement="right"),
    dbc.Tooltip("Select the program type to filter data.", target="info-program", placement="right"),
    dbc.Tooltip("Select the borough to filter data.", target="info-borough", placement="right"),
    dbc.Tooltip("Select the grade level to filter data.", target="info-grade", placement="right"),
]

app = dash.Dash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP])
server = app.server

app.layout = dbc.Container([
    dbc.Row([
        dbc.Col(sidebar, width=2),
        dbc.Col(content, width=10)
    ], style={'height': '100vh'}),
    *tooltips
], fluid=True, style={'padding': '0', 'backgroundColor': colors['background']})


@app.callback(
    [Output('kpi-cards', 'children'),
     Output('stacked-bar', 'figure'),
     Output('heatmap', 'figure'),
     Output('program-boxplot', 'figure'),
     Output('students-classes-scatter', 'figure')],
    [Input('borough-dropdown', 'value'),
     Input('grade-dropdown', 'value'),
     Input('year-dropdown', 'value'),
     Input('program-dropdown', 'value')]
)
def update_dashboard(borough, grade, selected_year, program):
    filtered_df = df.copy()
    if selected_year != 'All':
        filtered_df = filtered_df[filtered_df['SCHOOL YEAR'].astype(str).str.strip() == str(selected_year).strip()]
    if borough != 'All':
        filtered_df = filtered_df[filtered_df['BOROUGH'].astype(str).str.strip() == str(borough).strip()]
    if grade != 'All':
        filtered_df = filtered_df[filtered_df['GRADE LEVEL'].astype(str).str.strip() == str(grade).strip()]
    if program != 'All':
        filtered_df = filtered_df[filtered_df['PROGRAM TYPE'].astype(str).str.strip() == str(program).strip()]

    # KPIs
    total_students = int(filtered_df['NUMBER OF STUDENTS'].sum()) if not filtered_df.empty else 0
    avg_class_size = filtered_df['CLASS SIZE'].mean() if not filtered_df.empty else 0
    total_classes = int(filtered_df['NUMBER OF CLASSES'].sum()) if not filtered_df.empty else 0
    num_records = len(filtered_df)
    avg_students_per_class = (total_students / total_classes) if total_classes > 0 else 0

    def kpi_card(kpi_name, value):
        return dbc.Card(
            [
                dbc.CardHeader(
                    kpi_name,
                    style={
                        'fontWeight': 'bold',
                        'fontSize': '1.2em',
                        'color': chart_colors[0],
                        'backgroundColor': '#f8f9fa',
                        'textAlign': 'center',
                        'letterSpacing': '1px',
                        'borderBottom': '1px solid #eee'
                    }
                ),
                dbc.CardBody([html.Div(value, style={
                    'fontWeight': 'bold',
                    'fontSize': '2.1em',
                    'color': colors['black'],
                    'textAlign': 'center'
                })])
            ],
            color="light",
            style={'backgroundColor': 'white', 'border': 'none', 'marginBottom': '10px'}
        )

    kpi_cards = dbc.Row([
        dbc.Col(kpi_card("Total Students", f"{total_students:,}"), md=2),
        dbc.Col(kpi_card("Avg Class Size", f"{avg_class_size:.1f}"), md=2),
        dbc.Col(kpi_card("Total Classes", f"{total_classes:,}"), md=2),
        dbc.Col(kpi_card("Records", f"{num_records:,}"), md=2),
        dbc.Col(kpi_card("Avg Students/Class", f"{avg_students_per_class:.1f}"), md=2),
    ], justify="center")

    # 1. Stacked bar chart: number of students by program type and year
    if not filtered_df.empty and 'SCHOOL YEAR' in filtered_df.columns and 'PROGRAM TYPE' in filtered_df.columns:
        bar_df = filtered_df.groupby(['SCHOOL YEAR', 'PROGRAM TYPE']).agg({'NUMBER OF STUDENTS': 'sum'}).reset_index()
        fig1 = px.bar(
            bar_df,
            x='SCHOOL YEAR',
            y='NUMBER OF STUDENTS',
            color='PROGRAM TYPE',
            barmode='stack',
            color_discrete_sequence=chart_colors,
            title='Total Students by Program Type and Year',
            template='plotly_white'
        )
    else:
        fig1 = px.bar(title='No data available', template='plotly_white')
        fig1.add_annotation(
            text="No data for selected filters",
            x=0.5, y=0.5, showarrow=False
        )

    # 2. Heatmap: average class size by Borough × School Year
    if not filtered_df.empty and 'BOROUGH' in filtered_df.columns and 'SCHOOL YEAR' in filtered_df.columns:
        heatmap_df = filtered_df.groupby(['BOROUGH', 'SCHOOL YEAR']).agg({'CLASS SIZE': 'mean'}).reset_index()
        if not heatmap_df.empty:
            heatmap_pivot = heatmap_df.pivot(index='BOROUGH', columns='SCHOOL YEAR', values='CLASS SIZE')
            if not heatmap_pivot.empty:
                fig2 = px.imshow(
                    heatmap_pivot,
                    labels=dict(x="School Year", y="Borough", color="Avg Class Size"),
                    color_continuous_scale=["#86bbd8", "#f6ae2d", "#f26419"],
                    title='Average Class Size Heatmap (Borough vs School Year)',
                    template='plotly_white'
                )
            else:
                fig2 = px.imshow([[0]], title='No data available', template='plotly_white')
                fig2.add_annotation(
                    text="No data for selected filters",
                    x=0.5, y=0.5, showarrow=False
                )
        else:
            fig2 = px.imshow([[0]], title='No data available', template='plotly_white')
            fig2.add_annotation(
                text="No data for selected filters",
                x=0.5, y=0.5, showarrow=False
            )
    else:
        fig2 = px.imshow([[0]], title='No data available', template='plotly_white')
        fig2.add_annotation(
            text="No data for selected filters",
            x=0.5, y=0.5, showarrow=False
        )

    # 3. Program type boxplot: class size distribution by program type
    if 'PROGRAM TYPE' in filtered_df.columns and not filtered_df.empty:
        fig3 = px.box(
            filtered_df,
            x='PROGRAM TYPE',
            y='CLASS SIZE',
            title='Class Size by Program Type',
            color='PROGRAM TYPE',
            color_discrete_sequence=chart_colors,
            template='plotly_white'
        )
    else:
        fig3 = px.box(title='No data available', template='plotly_white')
        fig3.add_annotation(
            text="No data for selected filters",
            x=0.5, y=0.5, showarrow=False
        )

    # 4. Students vs Classes Scatter
    if not filtered_df.empty:
        fig4 = px.scatter(
            filtered_df,
            x='NUMBER OF CLASSES',
            y='NUMBER OF STUDENTS',
            color='BOROUGH' if 'BOROUGH' in filtered_df.columns else None,
            title='Students vs Classes',
            size='CLASS SIZE',
            color_discrete_sequence=chart_colors,
            template='plotly_white'
        )
    else:
        fig4 = px.scatter(title='No data available', template='plotly_white')
        fig4.add_annotation(
            text="No data for selected filters",
            x=0.5, y=0.5, showarrow=False
        )

    return kpi_cards, fig1, fig2, fig3, fig4

# Clear All button callback
@app.callback(
    [
        Output('year-dropdown', 'value'),
        Output('program-dropdown', 'value'),
        Output('borough-dropdown', 'value'),
        Output('grade-dropdown', 'value'),
    ],
    [Input('clear-filters-btn', 'n_clicks')],
    prevent_initial_call=True
)
def clear_all_filters(n_clicks):
    return 'All', 'All', 'All', 'All'

# AG Grid table always visible, updates with filters
@app.callback(
    Output("ag-table-container", "children"),
    [
        Input('borough-dropdown', 'value'),
        Input('grade-dropdown', 'value'),
        Input('year-dropdown', 'value'),
        Input('program-dropdown', 'value')
    ]
)
def update_table(borough, grade, selected_year, program):
    filtered_df = df.copy()
    if selected_year != 'All':
        filtered_df = filtered_df[filtered_df['SCHOOL YEAR'].astype(str).str.strip() == str(selected_year).strip()]
    if borough != 'All':
        filtered_df = filtered_df[filtered_df['BOROUGH'].astype(str).str.strip() == str(borough).strip()]
    if grade != 'All':
        filtered_df = filtered_df[filtered_df['GRADE LEVEL'].astype(str).str.strip() == str(grade).strip()]
    if program != 'All':
        filtered_df = filtered_df[filtered_df['PROGRAM TYPE'].astype(str).str.strip() == str(program).strip()]

    if filtered_df.empty:
        return html.Div("No data for selected filters.", style={"textAlign": "center", "color": "#f26419", "fontWeight": "bold", "marginTop": "20px"})

    columns = []
    for col in filtered_df.columns:
        col_def = {
            "field": col,
            "headerName": col,
            "sortable": True,
            "filter": True,
            "resizable": True,
            "flex": 1,
        }
        if col == "SCHOOL NAME":
            col_def["minWidth"] = 250
        columns.append(col_def)

    return dag.AgGrid(
        id="ag-grid-table",
        rowData=filtered_df.to_dict("records"),
        columnDefs=columns,
        defaultColDef={
            "flex": 1,
            "minWidth": 120,
            "resizable": True,
        },
        dashGridOptions={
            "pagination": True,
            "paginationPageSize": 15,
            "domLayout": "autoHeight",
        },
        style={
            "height": "500px",
            "marginTop": "10px",
            "backgroundColor": "#f8f9fa",
            "borderRadius": "8px",
            "boxShadow": "0 2px 8px rgba(0,0,0,0.07)"
        }
    )

if __name__ == '__main__':
    app.run(debug=True, port=8050)

Ester · August 22, 2025, 12:55pm

@Avacsiglo21, your work inspired me to make a heatmap too, thank you.

Topic		Replies	Views
Figure Friday 2025 - week 40 Dash Python figure-friday	15	200	October 10, 2025
Figure Friday 2025 - week 35 Dash Python announcements , figure-friday	20	322	September 5, 2025
Figure Friday 2025 - week 43 Dash Python announcements , figure-friday	16	300	November 3, 2025
Figure Friday 2025 - week 20 Dash Python announcements , figure-friday	23	387	May 23, 2025
Figure Friday 2025 - week 18 Dash Python announcements , figure-friday	20	370	May 9, 2025