Figure Friday 2024 - week 32

adamschroeder · August 9, 2024, 3:16pm

Update: Figure Friday 2024 - week 33 is the newer dataset.

Week 32 of the Figure Friday initiative will focus on the Gender Pay Gap in Ireland.

Jennifer Keane collected data from 2022-2023 to report on pay differences for men versus women among companies located in Ireland. More on the data and project can be found at the Irish Gender Pay Gap Portal and at the project GitHub.

Sample Figure:

Code for sample figure:

import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/plotly/Figure-Friday/main/2024/week-32/irish-pay-gap.csv')


# Filter the data for certain companies
df_filtered = df[df['Company Name'].isin(['3M', 'AA', 'Abbott Ireland (all legal entities)', 'AbbVie Sligo', 'AbbVie Westport (Allergan)',
                                          'ABP Food Group: C&D Foods', 'ABP Food Group: Irish Country Meats', 'Abtran',
                                          'Google Ireland Limited', 'Meta Ireland (all legal entities)'])]

# Get the list of unique companies
companies = df_filtered['Company Name'].unique()

# Prepare data for the dumbbell plot
data = {"line_x": [], "line_y": [], "2022": [], "2023": [], "companies": []}

for company in companies:
    mean_gap_2022 = df_filtered.loc[
        (df_filtered['Report Year'] == 2022) & (df_filtered['Company Name'] == company), 'Mean Hourly Gap'].values
    mean_gap_2023 = df_filtered.loc[
        (df_filtered['Report Year'] == 2023) & (df_filtered['Company Name'] == company), 'Mean Hourly Gap'].values

    # Only keep companies that reported in both years, so all lists stay aligned
    if len(mean_gap_2022) > 0 and len(mean_gap_2023) > 0:
        data["companies"].append(company)
        data["2022"].append(mean_gap_2022[0])
        data["2023"].append(mean_gap_2023[0])
        data["line_x"].extend([mean_gap_2022[0], mean_gap_2023[0], None])
        data["line_y"].extend([company, company, None])

# Create the dumbbell plot
fig = go.Figure(
    data=[
        go.Scatter(
            x=data["line_x"],
            y=data["line_y"],
            mode="lines",
            showlegend=False,
            marker=dict(color="grey")
        ),
        go.Scatter(
            x=data["2022"],
            y=data["companies"],
            mode="markers",
            name="2022",
            marker=dict(color="green", size=10)
        ),
        go.Scatter(
            x=data["2023"],
            y=data["companies"],
            mode="markers",
            name="2023",
            marker=dict(color="blue", size=10)
        ),
    ]
)

fig.update_layout(
    title="Mean Hourly Gap by Company: 2022 vs 2023",
    height=1000,
    legend_itemclick=False,
    xaxis_title="Mean Hourly Gap",
    yaxis_title="Company"
)

fig.show()

Dumbbells in Plotly docs.

Participation Instructions:

Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.

If you prefer to collaborate with others on Discord, join the Plotly Discord channel.

Thank you to the Irish Gender Pay Gap Portal for the data.

Mike_Purtell · August 10, 2024, 5:06am

Update Aug 13: Used gender-based colors from Telegraph 2018. Big thanks to @li.nguyen for wonderful guidance on this topic.

Here is the percentage by gender in each income quartile. Sadly, this pattern is what I expected. I will dig further to see if there is any discernible shift by year.

import polars as pl
import polars.selectors as cs
import plotly.express as px
import numpy as np

# gender-based colors from Telegraph, 2018
color_men = '#1FC3AA'
color_women = '#8624F5'
gender_color_dict = {'male' : color_men, 'female': color_women}


df = pl.read_csv('gpg.csv', ignore_errors = True)

def add_annotation(fig, annotation, x, y, align, xanchor, yanchor, xref='paper', yref='paper', xshift=0):
    ''' Generic function to place text on plotly figures '''
    fig.add_annotation(
        text=annotation,
        xref = xref, x=x, yref = yref, y=y,
        align= align, xanchor=xanchor, yanchor=yanchor,
        font =  {'size': 12, 'color': 'darkslategray'},
        showarrow=False,
        xshift = xshift
    )
    return fig

def tweak_quantiles(df):
    ''' Extract gender percentages of 4 salary quantiles  '''
    return(
        df
        .select(pl.col('pb1Female', 'pb1Male','pb2Female', 'pb2Male','pb3Female', 'pb3Male','pb4Female', 'pb4Male'))
        .unpivot(
            variable_name='Cat',
            value_name='Percent'
        )
        .filter(pl.col('Percent') != 'NULL')
        .with_columns(pl.col('Percent').cast(pl.Float32))
        .with_columns(Gender = pl.col('Cat').str.slice(3))
        .with_columns(
            Enum_Quartile = (
                pl.col('Cat')
                .str.replace('pb1Male', 'Q1')
                .str.replace('pb1Female', 'Q1')
                .str.replace('pb2Male', 'Q2')
                .str.replace('pb2Female', 'Q2')
                .str.replace('pb3Male', 'Q3')
                .str.replace('pb3Female', 'Q3')
                .str.replace('pb4Male', 'Q4')
                .str.replace('pb4Female', 'Q4')
            )
        )
        .group_by('Gender', 'Enum_Quartile').agg(pl.col('Percent').mean())
        .pivot(
            on = 'Gender',
            index='Enum_Quartile'
        )
        .with_columns(Quartile = pl.col('Enum_Quartile').str.slice(1).cast(pl.UInt8))
        .sort('Enum_Quartile', descending=False)
    )

#------------------------------------------------------------------------------#
#     Plot Gender proportion of 4 salary quantiles                             #
#------------------------------------------------------------------------------#
df_quantiles = tweak_quantiles(df)
fig=px.scatter(df_quantiles, x='Enum_Quartile', y=['Male', 'Female'], 
               color_discrete_sequence=[color_men, color_women])
print (df_quantiles)
fig.update_layout(
    title = f'Irish Gender Gap',
    height=600, width=800,
    xaxis_title='Income Quartile: Q1 is lowest, Q4 is highest'.upper(),
    yaxis_title='Avg % of employees per income quantile'.upper(),
    yaxis_title_font=dict(size=14),
    xaxis_title_font=dict(size=14),
    margin={"r":50, "t":50, "l":50, "b":50},
    autosize=False,
    showlegend=False,
    template='plotly_white',
)
#  Show markers connected by lines
fig.update_traces(
    mode='markers+lines',
    marker=dict(size=12, line=dict(width=0)),
    )

fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)

annotation = f'<b><span style="color:{color_men}">MEN</span></b>'
fig = add_annotation(
    fig, 
    annotation, 
    df_quantiles['Enum_Quartile'][-1],  # x
    df_quantiles['Male'][-1]-1,                   # y
    'left', 
    'left', 
    'middle',
    xref = 'x', 
    yref = 'y', 
    # xshift=10
    )

annotation = f'<b><span style="color:{color_women}">WOMEN</span></b>'
fig = add_annotation(
    fig, 
    annotation, 
    df_quantiles['Enum_Quartile'][-1],  # x
    df_quantiles['Female'][-1]+1,                   # y
    'left', 
    'left', 
    'middle',
    xref = 'x', 
    yref = 'y', 
    )

annotation = '<b>Data Source:</b> Irish Gender Pay Gap Portal (http://paygap.ie)<br><br>'
annotation += "Average percentages of participating companies<br>"
annotation += "are <b>not weighted</b> by number of responses"
fig = add_annotation(fig, annotation, 0.05, 1.0, 'left', 'left', 'top')

w = f'<b><span style="color:{color_women}">women</span></b>'
m = f'<b><span style="color:{color_men}">men</span></b>'
annotation = f'More {w} than {m} in the lowest income quartile,<br>'
annotation += f'50% more {m} than {w} in the highest income quartile<br>'
fig = add_annotation(fig, annotation, 0.4, 0.5, 'left', 'left', 'middle')

fig.show()

adamschroeder · August 12, 2024, 1:19pm

Very telling graph, @Mike_Purtell. Thanks for sharing here on the forum, in addition to Discord.

@li.nguyen wrote a great post on color and gender bias, as part of a conversation that came out of this visualization.

Mike_Purtell · August 12, 2024, 2:35pm

Thanks @adamschroeder. I used @li.nguyen's recommendation on gender-based colors and am blown away by her generosity & thoughtfulness.

li.nguyen · August 12, 2024, 2:54pm

This looks amazing @Mike_Purtell! Super tidy and insightful, and I agree with you - no need for a legend given your color encoding in the text. Always happy to help!

Alfredo49 · August 12, 2024, 3:46pm

Hi people,

Following the idea of comparing metrics by year, I created a Dash app with box plot and histogram charts to visualize the distribution by year for each pay gap metric, with a select filter.

Live App
Source Code

adamschroeder · August 12, 2024, 4:40pm

Nice app, @Alfredo49.
I like how you used the histogram. Combined with the box plot, it’s easier to see the difference between the two years. Generally, it looks like the pay gaps had small declines in 2023 compared to 2022.

li.nguyen · August 14, 2024, 3:16pm

I’ve recently started re-reading a book that I also recommended to @adamschroeder, and it remains one of my favorite data visualization books for anyone interested: https://www.storytellingwithdata.com/

Here are a few principles from the book that I applied:

I used color to highlight 2023, while keeping 2022 in grey, to focus on the recent year and the development from 2022 to 2023.
I tried to reduce cognitive load, e.g. by:
- replacing the terms Q1-Q4 with more intuitive names (low, lower-middle, upper-middle and high), as they could otherwise be mistaken for calendar quarters
- adding a reference line for equal gender representation (50%) to help users spot unequal gender representation
I removed visual clutter, e.g. by removing unnecessary grid lines and legend titles and making the remaining grid lines more subtle (I can’t take all the credit here, as many of these principles are already built into the Vizro Plotly chart template).

In summary, I found that there has been no significant progress in female representation from 2022 to 2023 if you take the average across all companies. On average, female representation remains lower across all pay quartiles, with the gender gap widening in the upper-middle and highest quartiles, as expected.

Additionally, I created an interactive app that allows you to filter by specific companies. The drill-down into a single company can provide a different picture. A little sneak peek below:

figure-friday-week-32

App: PyCafe - Vizro - Irish Gender Pay Gap
App/Code side-by-side: PyCafe - Vizro - Irish Gender Pay Gap

AnnMarieW · August 14, 2024, 3:53pm

My app is designed for job seekers, employees, investors and researchers interested in detailed gender pay gap information for each company. The data is consistently presented for all companies, with links to company-prepared reports that typically explain the factors driving pay gap differences.

I use Ryanair as the default example because, while it’s immediately apparent that the upper pay quartile is dominated by men, the company’s report shows why. The top pay category is primarily occupied by pilots—a well-paid role with equal pay for men and women. The real mystery to me, as a pilot myself, is why are there so few women pilots?

I also focused on individual companies rather than the aggregate. I was concerned that using the average or mean of all the companies might be misleading since it gives equal weighting to all companies regardless of the number of employees.
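To illustrate the weighting concern with numbers, here is a tiny sketch using two invented companies. The headcounts and gap values are hypothetical, and it assumes a headcount figure is available per company, which may not be the case in this week's CSV.

```python
# Hypothetical companies: (name, number of employees, mean hourly gap in %).
# All numbers are invented purely to illustrate the weighting effect.
companies = [
    ("Small Co", 50, 30.0),
    ("Large Co", 5000, 5.0),
]

# Unweighted: every company counts equally, regardless of size.
unweighted = sum(gap for _, _, gap in companies) / len(companies)

# Weighted: each company's gap counts in proportion to its headcount.
total_employees = sum(n for _, n, _ in companies)
weighted = sum(n * gap for _, n, gap in companies) / total_employees

print(f"Unweighted average gap: {unweighted:.2f}%")
print(f"Employee-weighted average gap: {weighted:.2f}%")
```

With these made-up inputs the small company pulls the unweighted average far above the weighted one, which is the kind of distortion described above.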

GitHub

Live app on Pythonanywhere

Editable code and app on PyCafe

li.nguyen · August 14, 2024, 4:12pm

Wow, @AnnMarieW - this is amazing! Love the layout of the report, which allows for all kinds of drill-down analysis. I think you should actually share this app with Jennifer Keane, who maintains that database. I am sure she’ll love it as well!

AnnMarieW · August 14, 2024, 4:20pm

Hi @li.nguyen

Thanks for your kind words.
I like your app too - especially how you can see a trend over time for a selected company. That will be even more valuable going forward as more data becomes available.

Maybe we need to create a multi-page app with all the visualizations from this week’s challenge for Jennifer Keane.

adamschroeder · August 14, 2024, 5:50pm

Thank you for sharing, @li.nguyen. I like how we always learn good data viz techniques with you.
I found the 50% line especially useful because it helps remind the viewer of the point all companies should strive to be at.

P.S. I see you’ve started using the assets folder at py.cafe.

li.nguyen · August 14, 2024, 9:46pm

Yes! Previously it was not possible to upload a folder to py.cafe, but they have just enabled that feature, so we can add the assets folder now! Kudos to @maartenbreddels and his team.

maartenbreddels · August 15, 2024, 12:49pm

Great indeed to see the use of folders at PyCafe!

mo.elauzei · August 15, 2024, 9:50pm

Hi,

I was curious whether a higher ratio of female to male employees would lead to more equitable pay practices. I plotted the percentage of female employees against the mean and median hourly pay gap, expecting to see a strong inverse relationship. However, the results showed only a weak correlation.

Notably, the median gap is generally less severe than the mean gap, which confirms the observation many have already made that male employees are overrepresented in the top pay grades.

FF32

Source Code:

import dash
from dash import html, dcc, Output, Input
import plotly.express as px
import pandas as pd
import dash_mantine_components as dmc
import dash_bootstrap_components as dbc
from dash_bootstrap_templates import load_figure_template


df = pd.read_csv('irish-pay-gap.csv')
df = df.loc[df['Report Year'] == 2023]

load_figure_template("bootstrap")
app = dash.Dash(
    __name__, suppress_callback_exceptions=True,
    external_stylesheets=[dbc.themes.BOOTSTRAP, dbc.icons.BOOTSTRAP]
)

app.layout = dbc.Container(
    [
        html.H5('Exploring the Link Between Female Workforce Representation and Gender Pay Disparity in Ireland', className='my-2'),
        dmc.Select(
            label="Select measure",
            id="measure",
            value="Mean Hourly Gap",
            data=['Mean Hourly Gap', 'Median Hourly Gap'],
            style={'width' : '300px'},
            className='mb-0'
        ),
        dcc.Graph(id='scatter'),
        html.Br(),
        html.H6("Findings: Higher female workforce participation minimally impacts pay disparity. "
                "The median gap is more tightly clustered around zero than the mean, "
                "indicating greater disparity at higher pay levels within companies.")
        ]
)

@app.callback(Output('scatter', 'figure'), Input('measure', 'value'))
def update_graph(value):
    dff = df.copy()
    fig = px.scatter(dff, x="Percentage Employees Female", y=value, hover_name='Company Name', trendline="ols")
    fig.update_xaxes(title_text="% Female Employees in The Company")
    return fig


if __name__ == "__main__":
    app.run(debug=True)

li.nguyen · August 16, 2024, 8:09am

Hey @mo.elauzei, beautiful and tidy chart!

Some minor suggestions:

Add the unit (%) to the y-axis title, similar to what you did for the x-axis title, or add it as a suffix to the numbers. This will make the data clearer for people who are not familiar with the data set.
Include the correlation coefficient as an annotation next to the OLS reference line.

Since your goal is to evaluate the correlation, putting this information directly on the chart makes it more user-friendly and informative, as users don’t have to hover over the trendline to find it!
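One possible way to build such an annotation is sketched below. The helper `pearson_r` and the sample values are hypothetical and stand in for the real `Percentage Employees Female` and gap columns. The annotation is pinned to the top-right corner of the plot area rather than to the trendline itself, which is a simplification.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented sample values, NOT taken from the dataset.
pct_female = [10, 30, 50, 70, 90]
hourly_gap = [12, 9, 11, 7, 8]

r = pearson_r(pct_female, hourly_gap)

# Keyword arguments for a Plotly annotation in paper coordinates.
annotation = dict(
    text=f"r = {r:.2f}",
    xref="paper", yref="paper",
    x=0.98, y=0.98,
    xanchor="right", yanchor="top",
    showarrow=False,
)
```

Inside the callback, the dictionary could then be applied with `fig.add_annotation(**annotation)` before returning the figure.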

adamschroeder · August 16, 2024, 1:07pm

Hi @mo.elauzei
I like how you went about this. You had a hypothesis and you built an app to confirm or disprove it.

Can you please explain why a mean gap that is higher than a median gap confirms the observation that male employees are over-represented in the top pay grades? It feels like the right observation, but mathematically I’m having a hard time connecting the dots.

mo.elauzei · August 16, 2024, 4:38pm

Super helpful feedback as usual, @li.nguyen!
I agree with both points you made.

mo.elauzei · August 16, 2024, 4:44pm

You’re right, @adamschroeder. I should’ve said ‘support’ rather than ‘confirm’ since I didn’t prove that mathematically in this case.
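For intuition only, here is a toy example of how a single highly paid man can open up the mean gap while leaving the median gap untouched. The wages are invented, and the gap formula (male minus female pay, as a percentage of male pay) is an assumption about how the reported figures are defined. It shows that the pattern is consistent with the observation, not that it proves it for this dataset.

```python
from statistics import mean, median

# Invented hourly wages; the only difference is one highly paid man.
men = [15, 16, 17, 18, 60]
women = [15, 16, 17, 18, 19]

def pay_gap(male_pay, female_pay):
    """Gap as a percentage of male pay (assumed definition)."""
    return (male_pay - female_pay) / male_pay * 100

mean_gap = pay_gap(mean(men), mean(women))
median_gap = pay_gap(median(men), median(women))

print(f"Mean gap:   {mean_gap:.1f}%")
print(f"Median gap: {median_gap:.1f}%")
```

The mean is pulled up by the outlier at the top, while the median only looks at the middle earner, so the two measures diverge when high pay is concentrated in one group.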
