Figure Friday 2025 - week 40

Join the Figure Friday session on October 10, at noon Eastern Time, to showcase your creation and receive feedback from the community.

What NYC schools do better on their math tests, and why?

Answer this question and a few others by using Plotly on the Math Test Results dataset.

Things to consider:

  • what can you improve in the app or sample figure below (heatmap)?
  • would you like to tell a different data story using a Dash app?
  • how can you explore the data with Plotly Studio?

Sample figure:
Thank you @Ester for the heatmap and code.

Code for sample figure:
import dash
from dash import dcc, html, Input, Output
import plotly.express as px
import pandas as pd

# Load data
# Instructions to download CSV sheet: https://github.com/plotly/Figure-Friday/blob/main/2025/week-40/data.md
df = pd.read_csv("Math_Test.csv")

# Filter only "All Students"
df = df[df["Student Category"] == "All Students"].copy()

# Convert percentage to numeric
df["Pct Level 3 and 4"] = pd.to_numeric(df["Pct Level 3 and 4"], errors="coerce")

# Initialize app
app = dash.Dash(__name__)

app.layout = html.Div([
    html.H1("Math Test Results – Heatmap of Proficient Students (Level 3 & 4)"),

    html.Div([
        html.Label("Select Geographic Subdivision:"),
        dcc.Dropdown(
            id="geo-dropdown",
            options=[{"label": g, "value": g} for g in df["Geographic Subdivision"].unique()],
            value="Citywide",  # default
            clearable=False,
            style={"width": "250px"}  
        ),
    ], style={"marginBottom": "20px"}),  # Dash style keys are camelCase

    dcc.Graph(id="heatmap")
])

@app.callback(
    Output("heatmap", "figure"),
    Input("geo-dropdown", "value")
)
def update_heatmap(selected_geo):
    # Filter by selected geography
    dff = df[df["Geographic Subdivision"] == selected_geo]

    # Pivot table for heatmap
    pivot = dff.pivot_table(
        index="Grade",
        columns="Year",
        values="Pct Level 3 and 4",
        aggfunc="mean"
    )

    fig = px.imshow(
        pivot,
        aspect="auto",
        color_continuous_scale="Viridis",
        labels={"color": "% Level 3 & 4"},
        title=f"Proficiency Rates (Level 3 & 4) – {selected_geo}"
    )

    return fig

if __name__ == "__main__":
    app.run(debug=True)

For community members that would like to build the data app with Plotly Studio, simply go to the downloads page to download Plotly Studio.

Below is a screenshot of the app that Plotly Studio created on top of this dataset:

Bar Chart prompt:

Proficiency level distribution as a stacked bar chart with controls to group by report category (School, District, Charter School, Borough, Citywide) and filter by year (2013-2023)

Participation Instructions:

  • Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Dash or Plotly Studio.
  • Submit - post your creation to LinkedIn or X with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
  • Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.

:point_right: If you prefer to collaborate with others on Discord, join the Plotly Discord channel.

Data Source:

Thank you to NYC Open Data portal for the data.

Hello FF Community,

For this week’s FF 40, the question is: What NYC schools do better on their math tests, and why?

To tackle this, I started thinking about how we could get a broader view of school quality, not just focusing on a single number. I built this dashboard to help us find schools that are not just good, but consistently and equitably excellent over time.

The Core Idea: My Unique Quality Score (UQS)

The dashboard’s goal is to be a “quality compass” for schools. Instead of just focusing on raw Performance (the percentage of students who pass), I created the Unique Quality Score (UQS), which brings three critical factors together:

  1. Performance (50% of UQS): This is the core metric—the school’s overall passing percentage.
  2. Consistency (30% of UQS): Measures the school’s reliability. Are the results stable year after year, or do they fluctuate dramatically? Schools with high consistency are those you can predict will succeed.
  3. Equity (20% of UQS): Measures the school’s fairness. Is success evenly distributed among all student groups? Equitable schools have small internal achievement gaps.
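
A minimal sketch of how a weighted score like this could be put together with pandas. Only the 50/30/20 weights come from the list above. Everything else is an assumption made for illustration: the column names (`school`, `pct_pass`, `pct_gap`), the way consistency and equity are turned into 0-100 values, and the placeholder numbers. It is not the dashboard's actual formula.

```python
import pandas as pd

# Weights stated in the post: Performance 50%, Consistency 30%, Equity 20%.
WEIGHTS = {"performance": 0.5, "consistency": 0.3, "equity": 0.2}


def unique_quality_score(yearly: pd.DataFrame) -> pd.DataFrame:
    """Combine three 0-100 components into one weighted score per school.

    Expects one row per school and year with hypothetical columns:
    school, pct_pass (passing % for all students) and
    pct_gap (largest gap between student groups, in percentage points).
    """
    grouped = yearly.groupby("school")
    scores = pd.DataFrame({
        # Average passing percentage over the years.
        "performance": grouped["pct_pass"].mean(),
        # Assumption: less year-to-year spread means more consistency.
        "consistency": (100 - grouped["pct_pass"].std(ddof=0)).clip(lower=0),
        # Assumption: a smaller average gap means more equity.
        "equity": (100 - grouped["pct_gap"].mean()).clip(lower=0),
    })
    scores["uqs"] = sum(scores[name] * weight for name, weight in WEIGHTS.items())
    return scores.sort_values("uqs", ascending=False)


# Made-up placeholder numbers, only to show the mechanics.
toy = pd.DataFrame({
    "school": ["A", "A", "B", "B"],
    "pct_pass": [80.0, 80.0, 90.0, 70.0],
    "pct_gap": [10.0, 10.0, 30.0, 30.0],
})
result = unique_quality_score(toy)
print(result)
```

In this toy example both schools have the same average passing percentage, but school A ranks higher because its results are stable and its gap is smaller, which is the idea behind weighting more than raw performance.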

How You Can Explore the Data:

The main chart shows you the “map” of the schools based on their Performance and their UQS. The schools further to the right are the most comprehensive ones. You can click on any school to see an immediate and detailed analysis:

  • What is its final UQS score?
  • How does it directly compare to the leading school (the one with the highest UQS)?

Thus, we can say that the best school is not just the one with the highest passing percentage, but the one that maintains that level of success in a stable and fair way for all its students.

I want to be upfront: the Unique Quality Score (UQS) is definitely not perfect. No single score can truly capture the complexity and dedication of a school.

Any questions or doubts are more than welcome.


No idea if I’m going to finish this FF, but I concentrated on the parameters ELL, SWD, gender and economic background, and roughly all ended up as I expected except ELL. Warning: I was only doing data exploration to see if I could find something interesting.

What is ELL

In the context of math tests, ELL stands for English Language Learner (also sometimes called English Learner or EL).

What it means:

  • Students whose first language is not English and who are still developing English proficiency

  • These students may face additional challenges on math tests due to language barriers, even though math concepts themselves are often universal

Why it’s relevant for math testing:

  • Word problems: Math tests often contain word problems that require reading comprehension

  • Test instructions: Understanding directions and test format requires English proficiency

  • Mathematical vocabulary: Terms like “quotient,” “perimeter,” “equivalent” may be unfamiliar

  • Cultural context: Some problems may reference concepts unfamiliar to students from different cultural backgrounds

Common accommodations for ELL students on math tests:

  • Extended time

  • Bilingual dictionaries (English/native language)

  • Translated test versions

  • Simplified English instructions

  • Oral interpretation of directions

  • Separate testing environment

In data analysis:
You’ll often see ELL as a demographic category in educational datasets, allowing researchers and educators to:

  • Track achievement gaps

  • Evaluate the effectiveness of ELL programs

  • Ensure equitable assessment practices

  • Monitor progress of this student population

This is likely what you’re seeing if you’re working with educational or standardized test data that includes demographic breakdowns.

I’m fascinated by this image regarding ELL. It raises questions: why does Ever ELL always score better than Never ELL, who did they test, and so on:

And this one makes me sad, the one on SWD:

What is SWD

In the context of educational testing, SWD stands for Students with Disabilities.

What it means:

  • Students who have been identified as having one or more disabilities under federal laws like IDEA (Individuals with Disabilities Education Act)

  • These students typically have an IEP (Individualized Education Program) or 504 plan

Types of disabilities that might be included:

  • Learning disabilities (dyslexia, dyscalculia, etc.)

  • Intellectual disabilities

  • Autism spectrum disorders

  • Physical disabilities

  • Sensory impairments (visual, hearing)

  • Emotional/behavioral disorders

  • ADHD

  • Speech/language impairments

Why it’s relevant for math testing:

  • Processing differences: May affect how students understand and work through problems

  • Memory challenges: Difficulty retaining multi-step procedures

  • Attention issues: Trouble focusing on lengthy tests

  • Motor skills: May affect writing/drawing geometric figures

  • Reading disabilities: Can impact word problems even in math

Common accommodations for SWD on math tests:

  • Extended time

  • Frequent breaks

  • Small group or individual testing

  • Use of calculator (when appropriate)

  • Large print materials

  • Read-aloud accommodations

  • Scribe services

  • Alternative response formats

In educational data:
Like ELL, SWD appears as a demographic subgroup in testing data to:

  • Monitor achievement gaps

  • Ensure compliance with federal laws

  • Evaluate special education programs

  • Track progress toward graduation requirements

  • Inform policy decisions about inclusive practices

This demographic category helps educators ensure equitable access to curriculum and fair assessment practices.

Hi Marianne,

I like the approach you take.

Over 33% of SWD are red - below basic levels :frowning: That’s really unfortunate.

@Avacsiglo21 your usage of emojis and icons is spectacular.
One thing that would be interesting is the correlation between school performance and student background, such as socioeconomic status and race. NYC is known to have a pretty segregated school system, which would be interesting to see as a visualization. But we would need to search for that data through an LLM or an API.

Update: here is some demographic data from the NYC portal.

@adamschroeder & @Avacsiglo21

It wouldn’t be me if I had not a) downloaded the results from 2006 onwards and b) also asked AI what changed between 2009 and 2010 and between 2012 and 2013.

Changes 2009 in math test

In 2009, New York State (including NYC) implemented significant changes to their math testing system that could explain shifts in your data:

Major Changes in 2009:

1. Revised Cut Scores (Proficiency Thresholds)

  • NY State recalibrated the score ranges that determine Level 1-4 classifications

  • Made the tests more rigorous by raising the bar for proficiency

  • This typically resulted in lower proficiency rates across the board

2. Test Format Changes

  • Introduced more complex, multi-step problems

  • Increased emphasis on mathematical reasoning and explanation

  • Added more open-ended response questions requiring written explanations

3. Alignment with Revised Standards

  • Tests were realigned with updated New York State Learning Standards

  • Greater emphasis on problem-solving strategies and mathematical communication

4. Scoring Changes

  • Modified rubrics for open-ended questions

  • Stricter scoring criteria for partial credit

Impact on Your Data:

This could explain why you might see:

  • Sudden drops in proficiency rates after 2009

  • Changes in demographic group performance patterns

  • Discontinuities in year-over-year trends

Changes 2013 in math test

In 2013, New York City’s math test changed because it was aligned with the new Common Core State Standards, shifting the focus from memorization to deep analysis and problem-solving. This made the test more rigorous and resulted in a drastic drop in student scores, although proficiency rates began to increase again in the following years.

Key Changes in the Math Test:

  • Common Core Alignment: The tests were redesigned to assess student understanding of the more rigorous New York P-12 Common Core Learning Standards (CCLS).

  • Emphasis on Deeper Understanding: The new standards and tests emphasized deep analytical thinking and creative problem-solving, rather than short answers and memorization.

  • Rigor: The tests became more challenging to ensure students were prepared for college and careers.

Impact of the Change:

  • Score Decline: A sharp decline in student performance was observed in 2013, with only 30% of students in grades 3-8 passing the math exam.

  • Baseline for Future Progress: This initial drop in scores was expected as the state transitioned to higher standards and served as a new baseline for future performance.

  • Long-Term Improvement: Following this initial dip, math proficiency rates in New York City began to rise again from 2013 to 2018, indicating the standards were starting to have a positive impact over time.
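
Given these breaks, one way to avoid comparing numbers across different test versions is to label each year with a test era and only compute trends inside an era. The snippet below is a rough sketch under assumptions: the era boundaries (new eras starting in 2010 and 2013) follow the breaks discussed above and may need adjusting, the `Year` and `Pct Level 3 and 4` column names are taken from the sample code, and the values are made-up placeholders, not real results.

```python
import pandas as pd

# Assumed first year of each test era, based on the breaks discussed above
# (between 2009 and 2010, and between 2012 and 2013). Adjust as needed.
ERA_STARTS = [2006, 2010, 2013]
ERA_LABELS = ["2006-2009", "2010-2012", "2013 onwards"]


def add_test_era(df: pd.DataFrame, year_col: str = "Year") -> pd.DataFrame:
    """Return a copy of df with a 'Test Era' column derived from the year."""
    out = df.copy()
    bins = ERA_STARTS + [float("inf")]
    # right=False makes each bin include its first year: [2006, 2010), ...
    out["Test Era"] = pd.cut(out[year_col], bins=bins, labels=ERA_LABELS, right=False)
    return out


def change_within_era(df: pd.DataFrame,
                      value_col: str = "Pct Level 3 and 4",
                      year_col: str = "Year") -> pd.DataFrame:
    """Year-over-year change, computed separately inside each era."""
    out = add_test_era(df, year_col).sort_values(year_col)
    out["Change"] = out.groupby("Test Era", observed=True)[value_col].diff()
    return out


# Made-up placeholder values, only to show the mechanics.
placeholder = pd.DataFrame({
    "Year": [2009, 2010, 2012, 2013, 2014],
    "Pct Level 3 and 4": [50.0, 20.0, 25.0, 10.0, 15.0],
})
demo = change_within_era(placeholder)
print(demo)
```

The first year of every era gets no change value, so a drop caused by a new test version does not show up as a decline in performance.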

How did the SWD score look in this period:

Unfortunately, I was not able to map ELL in a way that makes the viz reflect the numbers the same way as from 2013 onwards.

But based on these results I wonder what exactly is measured, how, and why. What I mean is that math tests in the Netherlands these days are a bunch of stories: you have to be language proficient (enough) and fast enough to extract the question, answer it, and process enough questions to score a nice level. So if you’re not language proficient enough, or somehow read slower than others, you have a problem, either answering the question or answering enough questions in time to score a nice level.

Ok, the numbers made me think about exactly that and wonder if it is a problem.

Totally agree, @adamschroeder, and it would also be useful for making clusters. Thanks for sharing the link.

The information about the changes in the Math Test is excellent. You made a good point regarding what’s exactly measured (how and why); that’s one of the most important skills in data science: understanding the domain, doing research, and making assumptions.

This morning there was a “now what” moment based on the previous images I shared, and I decided to try to find out why Plotly Studio has so many difficulties running on my laptop.

It was the virus scanner. I use Avast Premium, and to be more specific it was the setting Core Shields - File Shield. For one hour and 10 minutes I had the full PS experience, and the moment the file shield was automatically enabled again the problems returned. The rest of my setup: Windows 11 Home edition.

Anyway this hour where I fed PS my already filtered smaller .csv resulted in this app:

Why did the app creation take me so long? The final app differs a lot from what was originally created. And now there is another “now what” question looming on the horizon :sweat_smile:

What a cool chart, @marieanne.
It’s clear how mean scale score will differ according to student category. And by playing around with the dropdown it seems like the higher the grade the larger the gap between SWD and non-SWD.
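
The gap by grade could also be checked numerically, roughly like this. It is only a sketch under assumptions: the `Mean Scale Score` column name and the `SWD` / `Not SWD` category labels are guesses that should be checked against the CSV, `Grade` and `Student Category` are taken from the sample code, and the numbers are made-up placeholders.

```python
import pandas as pd


def category_gap_by_grade(df: pd.DataFrame,
                          value_col: str = "Mean Scale Score",
                          group_a: str = "Not SWD",
                          group_b: str = "SWD") -> pd.Series:
    """Difference in the average value between two student categories, per grade."""
    wide = df.pivot_table(
        index="Grade",
        columns="Student Category",
        values=value_col,
        aggfunc="mean",
    )
    return (wide[group_a] - wide[group_b]).rename("Gap")


# Made-up placeholder values, only to show the mechanics.
toy = pd.DataFrame({
    "Grade": ["3", "3", "8", "8"],
    "Student Category": ["Not SWD", "SWD", "Not SWD", "SWD"],
    "Mean Scale Score": [300.0, 290.0, 300.0, 280.0],
})
gap = category_gap_by_grade(toy)
print(gap)
```

If the gap really grows with the grade, the resulting series should increase from the lowest to the highest grade.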

Hi @adamschroeder

I wanted to try Plotly Studio but am going in circles. I followed these instructions in the docs:

I created a Plotly Cloud account and logged in. All seems good, but I can’t find where to use Plotly Studio.

I went here: Plotly Studio | Agentic Analytics and clicked ‘Login’, I enter my email, get the code, enter it and it brings me back to the main Plotly Cloud page but I can’t find anything about Plotly Studio.

I understand I cannot ‘Download’ and use it locally as I use Linux. However, I got the impression there was a Plotly Studio framework running (e.g. SaaS application). Maybe I missed it somewhere? I tried to upload one of my ‘.py’ apps but it did not allow me (visible, light gray but not clickable).

Am I doing something incorrectly?

Thanks,

Mike

Hi all, I don’t have time to do the task right now, but let’s assume that the first example (heatmap) is my solution. :slightly_smiling_face:

Hi @mike_finko: Plotly Studio runs well as a PC app under Win11. Sounds like there is a Linux issue, and no surprise to me if these early versions of PS do not support all platforms. Hope this gets resolved.

I am swamped this week, so I am unable to contribute a dashboard or join the meeting 10 minutes from now. I did spend time looking at the submissions from @Avacsiglo21 and @marieanne and just want to say both of them are brilliant and a joy to use. Have a great weekend everyone.

Thank you for letting us know, @Mike_Purtell. Have a good weekend as well.