Figure Friday 2025 - week 1

Join the Figure Friday session on January 10, at noon Eastern Time, to showcase your creation and receive feedback from the community.

Welcome to the first week of Figure Friday 2025 :tada:
This week we’ll look at the results of the NYC Marathon that took place in November 2024. Data includes runner’s name, age, gender, pace, final time, and much more.

Download data:

  • Go to Joe Hovde’s Google Sheet and download it as a CSV file: click File → Download → Comma Separated Values.
  • Save the CSV file in the same directory as the Python code provided (under the sample figure), and run the code.

Things to consider:

  • Can you improve the sample figure below (violin plot)?
  • Would you like to tell a different data story using a different graph?
  • Can you create a Dash app instead?

Sample figure:

Code for sample figure:
import plotly.express as px
import pandas as pd

df = pd.read_csv('NYC Marathon Results, 2024 - Marathon Runner Results.csv')

# Convert `pace` column from string format (minutes:seconds) to numeric (float) in minutes
def convert_pace_to_minutes(pace_str):
    try:
        minutes, seconds = map(int, pace_str.split(':'))
        return minutes + seconds / 60
    except (ValueError, AttributeError):  # AttributeError covers missing (NaN) pace values
        return None

# Apply conversion to the `pace` column
df['pace_minutes'] = df['pace'].apply(convert_pace_to_minutes)

# Drop rows where `pace_minutes` could not be calculated
# (.copy() so that adding a column below does not write to a slice of `df`)
cleaned_data = df.dropna(subset=['pace_minutes']).copy()

# Define age groups
bins = [10, 20, 30, 40, 50, 60, 70, 80, 90]
labels = ['10-20', '20-30', '30-40', '40-50', '50-60', '60-70', '70-80', '80-90']

# Create a new column for age groups
cleaned_data['age_group'] = pd.cut(cleaned_data['age'], bins=bins, labels=labels, right=False)

fig = px.violin(
    cleaned_data,
    x='age_group',
    y='pace_minutes',
    title='Distribution of Minutes per Mile, by Age Group',
    labels={'pace_minutes': 'Pace (minutes per mile)', 'age_group': 'Age group'},
    box=True
)
fig.update_xaxes(categoryorder='array', categoryarray=labels)

fig.show()
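
A quick check of how the age bins above behave at the edges, as a minimal sketch on a few made-up ages (not the marathon data). With `right=False`, each bin includes its left edge and excludes its right edge, so a 20-year-old lands in '20-30' and a 90-year-old falls outside every bin.

```python
import pandas as pd

bins = [10, 20, 30, 40, 50, 60, 70, 80, 90]
labels = ['10-20', '20-30', '30-40', '40-50', '50-60', '60-70', '70-80', '80-90']

# Made-up ages chosen to sit on or near bin edges (illustration only)
ages = pd.Series([19, 20, 29, 30, 89, 90])

# right=False makes every bin half-open: [10, 20), [20, 30), ...
age_group = pd.cut(ages, bins=bins, labels=labels, right=False)

# Side-by-side view of each age and the group it was assigned to
edge_check = pd.DataFrame({'age': ages, 'age_group': age_group})
print(edge_check)
```

Ages outside the bins come back as NaN in `age_group`, so it may be worth checking how many rows that affects before plotting. Labels such as '20-29' might also describe these half-open bins more precisely.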

Participation Instructions:

  • Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
  • Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
  • Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.

:point_right: If you prefer to collaborate with others on Discord, join the Plotly Discord channel.

Data Source:

Data on the November 2024 marathon results were scraped by researcher & data analyst Joe Hovde from the NYC Marathon Official results page.

6 Likes

The dataset seems interesting. Wait for my Dash app 😃

2 Likes

I have never used a violin plot. It is very nice :star_struck: and interesting, so I am learning it.

1 Like

I searched a bit and didn’t find the answer. Is racesCount the number of marathons in general, or NYC marathons specifically? If anybody has an idea, it would be appreciated.

Edit: judging by the max value, it’s marathons in general. I’m not sure whether the count is from before the NYCM or includes the NYCM.

1 Like

I will check it today.

1 Like

@marieanne I found that after 4 attempts, Abdi Nageeye won the NYC marathon, so that’s why I think it’s his 4th race overall.

3 Likes

Happy New Year, Plotly Community! :tada:

For the first Figure Friday of 2025, I’m excited to share my interactive Dash app analyzing NYC Marathon Results. Kaggle link and GitHub link.

8 Likes

I think racesCount is the number of races (not only marathons?) a runner has participated in, as registered by NYRR.org or in the USA or something like that. If you look at the highest numbers, it’s almost all USA citizens. If you reverse it a bit and look at people aged 60, 54% have a racesCount of 1, and almost the same percentage is not from the USA. I’m exactly 60 and live in the Netherlands. I would not travel to the USA to run my first marathon; I would exercise a bit (a lot) and be almost sure I would finish, before I … I think it’s an official USA or NYRR count.
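
The kind of check described above can be sketched roughly like this. It runs on made-up rows, not the real results, and `countryCode` is a hypothetical column name that the actual sheet may spell differently; `age` and `racesCount` are the columns mentioned in this thread.

```python
import pandas as pd

# Made-up rows for illustration; `countryCode` is an assumed column name
runners = pd.DataFrame({
    'age':         [60, 60, 60, 60, 35, 35],
    'racesCount':  [1, 1, 12, 1, 3, 1],
    'countryCode': ['NLD', 'ITA', 'USA', 'USA', 'USA', 'FRA'],
})

# Keep only the 60-year-olds
age_60 = runners[runners['age'] == 60]

# Share of 60-year-olds with exactly one recorded race
share_first_race = (age_60['racesCount'] == 1).mean()

# Share of 60-year-olds who are not from the USA
share_non_usa = (age_60['countryCode'] != 'USA').mean()

print(f"racesCount == 1: {share_first_race:.0%}, non-USA: {share_non_usa:.0%}")
```

Taking the mean of a boolean column gives the fraction of True values, which avoids counting rows by hand. Two similar shares do not prove the same runners are behind both, so a crosstab of the two conditions would be the natural next step.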

3 Likes

I created this for Figure Friday. The first thought I had with this dataset was “bucket list”, as the plot suggests… :sweat_smile:
If you enter your age, you get a customized age profile. Images are AI-generated with freepik. It’s not so fancy that you get an age-appropriate image in the custom profile section; there are just 4 images.

Demo on py.cafe:

The code is on GitHub. It looks like I maxed out on py.cafe; I will try again later.

The discussion about racesCount is somewhere above. I changed a few descriptions a bit; they are not watertight. I made a few assumptions about racesCount: it’s about official (whatever that may mean: organized by, registered somewhere) races in the USA, and a race does not have to be a marathon. I was looking at some numbers and thought “if these were all marathons, you’re more than superhuman”. I could improve the tooltips and some other things, but time is up.

8 Likes

Hi Everyone,

I was trying to convert the violin plot into a Dash app. I was mainly interested in the years and how many people participated. :slightly_smiling_face:

4 Likes

Nice figures, @feanor_92. For me, the bar chart was the clearest and most revealing graph.

With the scatter plot, it was hard to see if there were blue markers under the red ones.

2 Likes

These are awesome stats, @marieanne . Thanks for sharing.

It was surprising for me to see a jump in the number of runners around the 40-year-old mark and the 50-year-old mark.

1 Like

That’s a beautiful app, @Ester . One single violin chart can be so informative.

Are you able to share the code with us? Is this on py.cafe?

1 Like

I am glad. Well, I’ll think about it a little more until tomorrow, because I wanted to put something else on it, but I haven’t managed to do it yet.

1 Like

Thank you, and yes, 60 too, more or less. It was an accident. The bucket list thought came up because in NL lots of people have a marathon in New York or Boston on their list. The idea usually comes up when they are somewhat older, in a form like “when I’m 40, 50, 60 I want…”. A one-time wish. And I was actually playing around a bit and had a “look at that” moment.

3 Likes

:clap: Beautiful apps so far!! It is obvious that women are faster than men! :star_struck:
In the meantime, I’m struggling to bring an annotation to the front. I couldn’t figure it out yet, so any help would be appreciated. So, here is the question: have you ever thought about what the curve of race times looks like?

These are the two lines I used to parse the race time column (you already have the rest of the code):

race_time=pd.to_datetime(data['overallTime'], format='%H:%M:%S')
data['race_time'] = race_time.dt.hour + race_time.dt.minute / 60 + race_time.dt.second / 3600
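
As an alternative sketch (not what was used above), the same conversion can go through `pd.to_timedelta`, which treats the value as a duration rather than a clock time. The rows below are made up, and the zero-padded `HH:MM:SS` format is an assumption about how `overallTime` is stored.

```python
import pandas as pd

# Made-up finish times, plus one missing value (illustration only)
data = pd.DataFrame({'overallTime': ['02:07:39', '03:48:00', '04:30:30', None]})

# errors='coerce' turns missing or unparseable entries into NaT instead of raising
elapsed = pd.to_timedelta(data['overallTime'], errors='coerce')

# Total elapsed time expressed in hours as a float
data['race_time'] = elapsed.dt.total_seconds() / 3600

print(data)
```

One design difference: a missing time simply becomes NaN here, so those rows can be dropped or counted afterwards.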

And this is how I tried this line-chart:

Code for the line chart:
import plotly.graph_objects as go  # needed for go.Scatter below

rt_data = pd.DataFrame(data['race_time'].describe()).T
rt_data.drop(labels=['count', 'std','min','max'], axis=1, inplace=True)
# rt_data
pct25_idx = int(pd.Series(data[data['race_time'] == 3.8].index).median()) # 13544
ticks_vals = sorted(rt_data.iloc[0].round(2).to_list())
# ticks_vals
fig1 = px.scatter(
    data,
    x='overallPlace',
    y='race_time',
    height=700,
)
fig1.update_yaxes(showticklabels=True)
fig1.update_yaxes(
    tickmode='array',
    tickvals=[3.81, 4.39, 4.53, 5.09],
    ticktext=['25%', '50%', 'mean', '75%']
    )
fig1.add_annotation(x=pct25_idx, y=3.8,
            text="Whatever ",
            # xshift=150,
            yshift=-10,
            showarrow=False,
            )
fig1.add_trace(go.Scatter(
    x=[pct25_idx],
    y=[3.8],
    mode='markers+text',
    marker_color='red',
    marker_symbol='diamond',
    text='25%<br>3.81hrs',
    textposition='top left')
)

fig1
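
A side note on `pct25_idx`: matching `race_time == 3.8` exactly depends on a runner having precisely that float value. A minimal sketch of one alternative, on made-up data, is to take the quantile and pick the runner whose time is closest to it. The names `q25`, `nearest_idx`, `marker_x` and `marker_y` are only illustrative.

```python
import pandas as pd

# Made-up stand-in for the marathon data: finish place and time in hours
data = pd.DataFrame({
    'overallPlace': [1, 2, 3, 4, 5, 6, 7, 8],
    'race_time': [2.1, 3.0, 3.7, 3.9, 4.4, 4.6, 5.1, 6.0],
})

# 25th percentile of the finish times
q25 = data['race_time'].quantile(0.25)

# Row whose time is closest to the quantile; no exact float match is needed
nearest_idx = (data['race_time'] - q25).abs().idxmin()

# Coordinates on the same axes as the scatter (x = overallPlace, y = race_time)
marker_x = data.loc[nearest_idx, 'overallPlace']
marker_y = data.loc[nearest_idx, 'race_time']

print(q25, marker_x, marker_y)
```

`marker_x` and `marker_y` could then be passed to `add_annotation` or the extra `go.Scatter` trace, which keeps the marker on the `overallPlace` axis instead of using a row index.
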

6 Likes

It took me many tries to get it to work. Things work differently now :slight_smile:

4 Likes

Hi @JuanG,
What do you mean by bringing the annotation to the front? I see the “Whatever” annotation. What would it look like if you brought it to the front?

1 Like

Nice job, @Ester . When hovering over the violin charts, is the hover data referring to the orange (x gender) only or to all genders?

1 Like