Figure Friday 2024 - week 31

Update: Figure Friday 2024 - week 32 is the newer dataset.

It’s week 31 of the Figure Friday initiative, and it’s time to explore Stack Overflow’s 2023 Annual Developer Survey.

This will prepare us for the 2024 survey that we plan to explore in a future Figure Friday session, once Stack Overflow releases the data in a CSV format.

To download the 2023 survey results, click Download Full Data Set (CSV).

Sample Figure:

Code for sample figure:
import plotly.express as px
import pandas as pd
df = pd.read_csv("survey_results_public.csv")
df_filtered = df[df['ConvertedCompYearly'] < 200000]
df_filtered = df_filtered[df_filtered['WorkExp'] < 21]

fig = px.density_heatmap(df_filtered, x="WorkExp", y="ConvertedCompYearly", text_auto=True,
                         labels={"ConvertedCompYearly":"Annual Compensation (USD)"})
fig.show()

Participation Instructions:

  • Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
  • Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
  • Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.

:point_right: If you prefer to collaborate with others on Discord, join the Plotly Discord channel.

Thank you to Stack Overflow for the data.

3 Likes

Update: Added a second plot.

AI Favorability, Age Bias (px.bar) uses rounded bars, a nice feature released this year with Plotly 5.19.

Years Professional Coding Experience (px.line) includes countries with 1000 or more survey responses. A px.scatter trace is superimposed on this plot to emphasize selected points.

I used a library that is new to me, pycountry, to map country names to their ISO 3166-1 alpha-3 codes. This relieves crowding of the x-axis tick labels.

import polars as pl
import plotly.express as px
import pycountry

#  Functions
def add_annotation(fig, annotation, x, y, align, xanchor, yanchor):
    fig.add_annotation(
        text=annotation,
        xref = 'paper', x=x, yref = 'paper', y=y,
        align= align, xanchor=xanchor, yanchor=yanchor,
        font =  {'size': 12, 'color': 'darkslategray'},
        showarrow=False
    )
    return fig

#  load dataset to dataframe df_csv
df_csv = pl.read_csv(
    'Dataset/survey_results_public.csv',
    ignore_errors=True
)

#  create dictionary with country names as keys, alpha-3 abbreviations as values
dict_countries = {}
for item in pycountry.countries:
    dict_countries[item.name] = item.alpha_3

#------------------------------------------------------------------------------#
#     AI bias by age group                                                     #
#------------------------------------------------------------------------------#
df_ai_age_pct =  (  # create and process dataframe
    df_csv.lazy()   # polars lazy frames 
    .select('Age', 'AISent')
    .filter(~pl.col('Age').is_in(['Prefer not to say','NA']))
    .with_columns(
        Age = pl.col('Age')
            .str.replace('Under 18 years old', '-17')
            .str.replace(' years old', '')
            .str.replace('65 years or older', '65-'),
    )
    #  merge "Very favorable" into "Favorable"
    .with_columns(pl.col('AISent').str.replace('Very favorable','Favorable'))
    .with_row_index()
    .group_by(pl.col('Age', 'AISent'))
    .agg(pl.col("index").count())
    .rename({'index': 'Count'})
    .with_columns(
        PCT = (100 * (pl.col('Count')/pl.col('Count').sum().cast(pl.Float64())).over('Age')),
        TOTAL = (pl.col('Count').sum().cast(pl.Float64())).over('Age')
    )
    .collect()  # run the optimized lazy query and return a DataFrame
)

#  Make vertical bar chart
fig = px.bar(
    df_ai_age_pct.sort('Age').filter(pl.col('AISent') == 'Favorable'),
    x='Age', 
    y="PCT", 
    color="Age",
    barmode = 'stack',
    custom_data = ['Age', 'PCT', 'Count', 'TOTAL']
    )

#  Setup hover elements
fig.update_traces(
    hovertemplate = '<br>'.join([
    'Ages %{customdata[0]}',
    '%{customdata[1]:.1f}% Favorable',
    '%{customdata[2]:,d} of %{customdata[3]:,d}',
    '<extra></extra>'
    ])
)

#  Clean up the plot, and display it
fig.update_layout(
    title = 'AI Favorability, Age Bias',
    height=400, width=800,
    xaxis_title='Age Group',
    yaxis_title='Favorable type response rate (%)',
    yaxis_title_font=dict(size=14),
    xaxis_title_font=dict(size=14),
    margin={"r":50, "t":50, "l":50, "b":50},
    autosize=False,
    showlegend=False,
    template='plotly_white',
)
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False, range=[25, 60])

fig.update(layout=dict(barcornerradius=10))

#  Add annotations
annotation = "<b>Data Source:</b> 2023 Stack Overflow Annual Developer<br>"
annotation += 'Survey. Favorable type means "Very favorable" or<br>'
annotation += '"Favorable". Age excludes "Prefer not to say" & "NA"<br>'
fig = add_annotation(fig, annotation, 0.5, 1.0, 'left', 'left', 'top')
fig.show()
print('\n\n\n')

#------------------------------------------------------------------------------#
#     Average years of professional coding experience by country
#------------------------------------------------------------------------------#
df_coding_years = (   #  prepare dataframe 
    df_csv
    .select('Country', 'YearsCodePro')
    .filter(~pl.col('YearsCodePro').is_in(['NA']))
    #     Change country names for USA and UK to match values used by pycountry 
    .with_columns(pl.col('Country').str.replace('United States of America', 'United States'))
    .with_columns(pl.col('Country').str.replace('United Kingdom of Great Britain', 'United Kingdom'))
    .with_columns(pl.col('Country').str.replace('United Kingdom and Northern Ireland', 'United Kingdom'))
    .with_columns(pl.col('YearsCodePro').str.replace('Less than 1 year', '0'))
    .with_columns(pl.col('YearsCodePro').str.replace('More than 50 years', '50'))
    .with_columns(pl.col('YearsCodePro').cast(pl.UInt16))
)
#  list countries with 1000 or more survey participants
countries_1k = (
    df_coding_years
    .with_columns(
        Country_Count = pl.col('Country').count().over('Country'),
        )
    .filter(pl.col('Country_Count') > 999)
    .filter(pl.col('Country') != 'Other')
    .unique('Country')
    .sort('Country_Count', descending=True)
    ['Country'].to_list()
)

#  include countries with 1000 or more participants, data between 25th and 75th percentile
df_coding_years = (
    df_coding_years
    .filter(pl.col('Country').is_in(countries_1k))
    .with_columns(
        median_years = pl.col('YearsCodePro').median().over('Country'),
        Q75 = pl.col('YearsCodePro').quantile(0.75).over('Country'),
        Q25 = pl.col('YearsCodePro').quantile(0.25).over('Country')
    )
    .with_columns(country_abbr = (pl.col('Country')).replace(dict_countries))
    .filter(pl.col('YearsCodePro').is_between(pl.col('Q25'),pl.col('Q75')))
    .with_columns(average = (pl.col('YearsCodePro').mean().over('Country')))
    .with_columns(my_text = (pl.col('Country') + pl.lit(': ') + pl.col('average').round(1).cast(pl.String)))
    .sort('median_years')
)
#  Create line plot that shows all countries listed in countries_1k
fig = px.line(
    df_coding_years.sort('average'),
    x='country_abbr',
    y='average',
    custom_data = ['Country', 'average']
    )

fig.update_layout(
    title = '<br>Years Professional Coding Experience<br><sup>Countries with at least 1000 survey responses</sup>',
    height=400, width=800,
    xaxis_title='Country',
    yaxis_title='Years (Average)',
    yaxis_title_font=dict(size=14),
    xaxis_title_font=dict(size=14),
    margin={"r":50, "t":50, "l":50, "b":50},
    autosize=False,
    showlegend=False,
    template='plotly_white',
)

#  Setup hover elements
fig.update_traces(
    mode='markers+lines',
    hovertemplate = '<br>'.join([
    '%{customdata[0]}',
    '%{customdata[1]:.2f} Years',
    '<extra></extra>'
    ])
)
#  add scatter plot to emphasize specific countries.
df_focus = df_coding_years.filter(pl.col('country_abbr').is_in(['IND','CAN','USA','AUS']))
fig = fig.add_traces(
    px.scatter(
        df_focus,
        x='country_abbr', 
        y='average',
        text=df_focus['my_text']
        ).data
)
fig.update_traces(textposition='bottom right')

fig.update_traces(line_color='lightgray', line_width=2, marker_size=10)
#  extend x-axis to include all points and annotations
fig.update_xaxes(range=[-1, len(countries_1k)+2.5])
fig.update_xaxes(showgrid=True) 
fig.update_yaxes(showgrid=False)

#------------------------------------------------------------------------------#
#      Annotate                                                                #
#------------------------------------------------------------------------------#
annotation = "<b>Data Source:</b> 2023 Stack Overflow Annual Developer<br>"
annotation += 'Survey. Average includes values between<br>'
annotation += 'quantiles 25 & 75. Arbitrary emphasis on<br>'
annotation += 'min & max averages, and North American countries. '
fig = add_annotation(fig, annotation, 0.2, 0.4, 'left', 'left', 'top')
fig.show()

5 Likes

Nice graph, @Mike_Purtell. Is this the survey question that you were analyzing:

How favorable is your stance on using AI tools as part of your development workflow?

Then you combined the very favorable and favorable answers?

1 Like

Hello @adamschroeder, you are correct, the responses I have analyzed are for the question “How favorable is your stance on using AI tools as part of your development workflow?”. Responses for that question are in the AISent column, which I presume to mean AI Sentiment. Indeed, I merged very favorable and favorable for this analysis. Thank you.

1 Like

Aug 8: Updated the ‘Salary by Years of Experience’ plots. This code still generates the plot with all countries combined, and also makes separate plots for each country with 1000 or more survey responses.

The graphs show salary statistics (median, 25th percentile and 75th percentile) for up to 20 years of experience. The code makes the 2 plots that were posted earlier this week, and adds plots of salary by years of experience. I used go.Scatter with mode = ‘lines’ and a color fill between traces. It is not clear to me whether this could be done using px.line or px.scatter (probably doable, but I didn’t spend enough time to find out).



import polars as pl
import plotly.express as px
import plotly.graph_objects as go
import pycountry

#  Functions
def add_annotation(fig, annotation, x, y, align, xanchor, yanchor, xref='paper', yref='paper'):
    fig.add_annotation(
        text=annotation,
        xref = xref, x=x, yref = yref, y=y,
        align= align, xanchor=xanchor, yanchor=yanchor,
        font =  {'size': 12, 'color': 'darkslategray'},
        showarrow=False
    )
    return fig

#  load dataset to dataframe df_csv
df_csv = pl.read_csv(
    'Dataset/survey_results_public.csv',
    ignore_errors=True
)

#  create dictionary with country names as keys, alpha-3 abbreviations as values
dict_countries = {}
for item in pycountry.countries:
    dict_countries[item.name] = item.alpha_3

#------------------------------------------------------------------------------#
#     AI bias by age group                                                     #
#------------------------------------------------------------------------------#
df_ai_age_pct =  (  # create and process dataframe
    df_csv.lazy()   # polars lazy frames 
    .select('Age', 'AISent')
    .filter(~pl.col('Age').is_in(['Prefer not to say','NA']))
    .with_columns(
        Age = pl.col('Age')
            .str.replace('Under 18 years old', '-17')
            .str.replace(' years old', '')
            .str.replace('65 years or older', '65-'),
    )
    #  merge "Very favorable" into "Favorable"
    .with_columns(pl.col('AISent').str.replace('Very favorable','Favorable'))
    .with_row_index()
    .group_by(pl.col('Age', 'AISent'))
    .agg(pl.col("index").count())
    .rename({'index': 'Count'})
    .with_columns(
        PCT = (100 * (pl.col('Count')/pl.col('Count').sum().cast(pl.Float64())).over('Age')),
        TOTAL = (pl.col('Count').sum().cast(pl.Float64())).over('Age')
    )
    .collect()  # run the optimized lazy query and return a DataFrame
)

#  Make vertical bar chart
fig = px.bar(
    df_ai_age_pct.sort('Age').filter(pl.col('AISent') == 'Favorable'),
    x='Age', 
    y="PCT", 
    color="Age",
    barmode = 'stack',
    custom_data = ['Age', 'PCT', 'Count', 'TOTAL']
    )

#  Setup hover elements
fig.update_traces(
    hovertemplate = '<br>'.join([
    'Ages %{customdata[0]}',
    '%{customdata[1]:.1f}% Favorable',
    '%{customdata[2]:,d} of %{customdata[3]:,d}',
    '<extra></extra>'
    ])
)

#  Clean up the plot, and display it
fig.update_layout(
    title = 'AI Favorability, Age Bias',
    height=400, width=800,
    xaxis_title='Age Group',
    yaxis_title='Favorable type response rate (%)',
    yaxis_title_font=dict(size=14),
    xaxis_title_font=dict(size=14),
    margin={"r":50, "t":50, "l":50, "b":50},
    autosize=False,
    showlegend=False,
    template='plotly_white',
)
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False, range=[25, 60])

fig.update(layout=dict(barcornerradius=10))

#  Add annotations
annotation = "<b>Data Source:</b> 2023 Stack Overflow Annual Developer<br>"
annotation += 'Survey. Favorable type means "Very favorable" or<br>'
annotation += '"Favorable". Age excludes "Prefer not to say" & "NA"<br>'
fig = add_annotation(fig, annotation, 0.5, 1.0, 'left', 'left', 'top')
fig.show()
print('\n\n\n')

#------------------------------------------------------------------------------#
#     Average years of professional coding experience by country
#------------------------------------------------------------------------------#
df_coding_years = (   #  prepare dataframe 
    df_csv
    .select('Country', 'YearsCodePro')
    .filter(~pl.col('YearsCodePro').is_in(['NA']))
    #     Change country names for USA and UK to match values used by pycountry 
    .with_columns(pl.col('Country').str.replace('United States of America', 'United States'))
    .with_columns(pl.col('Country').str.replace('United Kingdom of Great Britain', 'United Kingdom'))
    .with_columns(pl.col('Country').str.replace('United Kingdom and Northern Ireland', 'United Kingdom'))
    .with_columns(pl.col('YearsCodePro').str.replace('Less than 1 year', '0'))
    .with_columns(pl.col('YearsCodePro').str.replace('More than 50 years', '50'))
    .with_columns(pl.col('YearsCodePro').cast(pl.UInt16))
)
#  list countries with 1000 or more survey participants
countries_1k = (
    df_coding_years
    .with_columns(
        Country_Count = pl.col('Country').count().over('Country'),
        )
    .filter(pl.col('Country_Count') > 999)
    .filter(pl.col('Country') != 'Other')
    .unique('Country')
    .sort('Country_Count', descending=True)
    ['Country'].to_list()
)

#  include countries with 1000 or more participants, data between 25th and 75th percentile
df_coding_years = (
    df_coding_years
    .filter(pl.col('Country').is_in(countries_1k))
    .with_columns(
        median_years = pl.col('YearsCodePro').median().over('Country'),
        Q75 = pl.col('YearsCodePro').quantile(0.75).over('Country'),
        Q25 = pl.col('YearsCodePro').quantile(0.25).over('Country')
    )
    .with_columns(country_abbr = (pl.col('Country')).replace(dict_countries))
    .filter(pl.col('YearsCodePro').is_between(pl.col('Q25'),pl.col('Q75')))
    .with_columns(average = (pl.col('YearsCodePro').mean().over('Country')))
    .with_columns(my_text = (pl.col('Country') + pl.lit(': ') + pl.col('average').round(1).cast(pl.String)))
    .sort('median_years')
)
#  Create line plot that shows all countries listed in countries_1k
fig = px.line(
    df_coding_years.sort('average'),
    x='country_abbr',
    y='average',
    custom_data = ['Country', 'average']
    )

fig.update_layout(
    title = '<br>Years Professional Coding Experience<br><sup>Countries with at least 1000 survey responses</sup>',
    height=400, width=800,
    xaxis_title='Country',
    yaxis_title='Years (Average)',
    yaxis_title_font=dict(size=14),
    xaxis_title_font=dict(size=14),
    margin={"r":50, "t":50, "l":50, "b":50},
    autosize=False,
    showlegend=False,
    template='plotly_white',
)

#  Setup hover elements
fig.update_traces(
    mode='markers+lines',
    hovertemplate = '<br>'.join([
    '%{customdata[0]}',
    '%{customdata[1]:.2f} Years',
    '<extra></extra>'
    ])
)
#  add scatter plot to emphasize specific countries.
df_focus = df_coding_years.filter(pl.col('country_abbr').is_in(['IND','CAN','USA','AUS']))
fig = fig.add_traces(
    px.scatter(
        df_focus,
        x='country_abbr', 
        y='average',
        text=df_focus['my_text']
        ).data
)
fig.update_traces(textposition='bottom right')

fig.update_traces(line_color='lightgray', line_width=2, marker_size=10)
#  extend x-axis to include all points and annotations
fig.update_xaxes(range=[-1, len(countries_1k)+2.5])
fig.update_xaxes(showgrid=True) 
fig.update_yaxes(showgrid=False)

#------------------------------------------------------------------------------#
#      Annotate                                                                #
#------------------------------------------------------------------------------#
annotation = "<b>Data Source:</b> 2023 Stack Overflow Annual Developer<br>"
annotation += 'Survey. Average includes values between<br>'
annotation += 'quantiles 25 & 75. Arbitrary emphasis on<br>'
annotation += 'min & max averages, and North American countries. '
fig = add_annotation(fig, annotation, 0.05, 0.95, 'left', 'left', 'top')
fig.show()

#------------------------------------------------------------------------------#
#     Scatter plot, x is years experience, y is salary
#------------------------------------------------------------------------------#
df_salary = (
    df_csv
    .select(pl.col('ConvertedCompYearly','WorkExp' ))
    .filter(pl.col('ConvertedCompYearly') != 'NA')
    .filter(pl.col('WorkExp') != 'NA')
    .with_columns(pl.col('WorkExp').cast(pl.Int32))
    .with_columns(pl.col('ConvertedCompYearly').cast(pl.Float32))
    .filter(
        pl.col('ConvertedCompYearly') < 200000,
        pl.col('WorkExp') < 21
    )
    .select(pl.col('ConvertedCompYearly','WorkExp' ))
    .with_columns(MEDIAN = pl.col('ConvertedCompYearly').median().over('WorkExp'))
    .with_columns(Q25 = pl.col('ConvertedCompYearly').quantile(0.25).over('WorkExp'))
    .with_columns(Q75 = pl.col('ConvertedCompYearly').quantile(0.75).over('WorkExp'))
    .unique('WorkExp')
    .sort('WorkExp')
)

print('\n\n\n')
fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=df_salary['WorkExp'],
        y=df_salary['Q25'],
        fill='none'
    )
)
fig.add_trace(
    go.Scatter(
        x=df_salary['WorkExp'], 
        y=df_salary['Q75'], 
        fill='tonexty', 
        fillcolor='LightCoral'
    )
)
fig.add_trace(
    go.Scatter(
        x=df_salary['WorkExp'], 
        y=df_salary['MEDIAN'], 
        fill='tonexty',
        fillcolor='MediumSpringGreen'
    )
)

fig.update_layout(
    title = '<br><b>All Countries:</b> Salary by Years of Experience',
    height=600, width=800,
    xaxis_title='Years of Experience',
    yaxis_title='Salary ($)',
    yaxis_title_font=dict(size=14),
    xaxis_title_font=dict(size=14),
    margin={"r":50, "t":50, "l":50, "b":50},
    autosize=False,
    showlegend=False,
    template='plotly_white',
)
# make median trace thicker (3) than Q25 and Q75 traces (1)
for i, trace in enumerate(fig.data):
    trace.line.width = 3 if i == 2 else 1
    trace.line.color = 'black' if i == 2 else 'lightgray'
 
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)

#------------------------------------------------------------------------------#
#      Annotations                                                                #
#------------------------------------------------------------------------------#
annotation = "<b>Data Source:</b> 2023 Stack Overflow Annual Developer Survey.<br>"
fig = add_annotation(fig, annotation, 0.05, 1.0, 'left', 'left', 'top')

annotation = "<b>Median<br>"
fig = add_annotation(
    fig, 
    annotation, 
    df_salary['WorkExp'].max(),
    df_salary['MEDIAN'][-1],
    'left', 
    'left', 
    'middle',
    xref = 'x', 
    yref = 'y', 
    )

annotation = "<b>75th Percentile<br>"
fig = add_annotation(
    fig, 
    annotation, 
    df_salary['WorkExp'].max(),
    df_salary['Q75'][-1],
    'left', 
    'left', 
    'middle',
    xref = 'x', 
    yref = 'y', 
    )

annotation = "<b>25th Percentile<br>"
fig = add_annotation(
    fig, 
    annotation, 
    df_salary['WorkExp'].max(),
    df_salary['Q25'][-1],
    'left', 
    'left', 
    'middle',
    xref = 'x', 
    yref = 'y', 
    )

fig.show()

#------------------------------------------------------------------------------#
#     Scatter plot, by country with 1000 or more responses
#------------------------------------------------------------------------------#
for country in countries_1k:
    df_salary = (
        df_csv
        .with_columns(pl.col('Country').str.replace('United States of America', 'United States'))
        .with_columns(pl.col('Country').str.replace('United Kingdom of Great Britain', 'United Kingdom'))
        .with_columns(pl.col('Country').str.replace('United Kingdom and Northern Ireland', 'United Kingdom'))
        .filter(pl.col('Country') == country)
        .select(pl.col('ConvertedCompYearly','WorkExp' ))
        .filter(pl.col('ConvertedCompYearly') != 'NA')
        .filter(pl.col('WorkExp') != 'NA')
        .with_columns(pl.col('WorkExp').cast(pl.Int32))
        .with_columns(pl.col('ConvertedCompYearly').cast(pl.Float32))
        .filter(
            pl.col('ConvertedCompYearly') < 200000,
            pl.col('WorkExp') < 21
        )
        .select(pl.col('ConvertedCompYearly','WorkExp' ))
        .with_columns(MEDIAN = pl.col('ConvertedCompYearly').median().over('WorkExp'))
        .with_columns(Q25 = pl.col('ConvertedCompYearly').quantile(0.25).over('WorkExp'))
        .with_columns(Q75 = pl.col('ConvertedCompYearly').quantile(0.75).over('WorkExp'))
        .unique('WorkExp')
        .sort('WorkExp')
    )

    print('\n\n\n')
    fig = go.Figure()
    fig.add_trace(
        go.Scatter(
            x=df_salary['WorkExp'],
            y=df_salary['Q25'],
            fill='none'
        )
    )
    fig.add_trace(
        go.Scatter(
            x=df_salary['WorkExp'], 
            y=df_salary['Q75'], 
            fill='tonexty', 
            fillcolor='LightCoral'
        )
    )
    fig.add_trace(
        go.Scatter(
            x=df_salary['WorkExp'], 
            y=df_salary['MEDIAN'], 
            fill='tonexty',
            fillcolor='MediumSpringGreen'
        )
    )

    fig.update_layout(
        title = f'<br><b>{country}:</b> Salary by Years of Experience',
        height=600, width=800,
        xaxis_title='Years of Experience',
        yaxis_title='Salary ($)',
        yaxis_title_font=dict(size=14),
        xaxis_title_font=dict(size=14),
        margin={"r":50, "t":50, "l":50, "b":50},
        autosize=False,
        showlegend=False,
        template='plotly_white',
    )
    # make median trace thicker (3) than Q25 and Q75 traces (1)
    for i, trace in enumerate(fig.data):
        trace.line.width = 3 if i == 2 else 1
        trace.line.color = 'black' if i == 2 else 'lightgray'
    
    fig.update_xaxes(showgrid=False)
    fig.update_yaxes(showgrid=False)

    #------------------------------------------------------------------------------#
    #      Annotations                                                                #
    #------------------------------------------------------------------------------#
    annotation = "<b>Data Source:</b> 2023 Stack Overflow Annual Developer Survey.<br>"
    fig = add_annotation(fig, annotation, 0.05, 1.0, 'left', 'left', 'top')

    if True:
        annotation = "<b>Median<br>"
        fig = add_annotation(
            fig, 
            annotation, 
            df_salary['WorkExp'].max(),
            df_salary['MEDIAN'][-1],
            'left', 
            'left', 
            'middle',
            xref = 'x', 
            yref = 'y', 
            )

        annotation = "<b>75th Percentile<br>"
        fig = add_annotation(
            fig, 
            annotation, 
            df_salary['WorkExp'].max(),
            df_salary['Q75'][-1],
            'left', 
            'left', 
            'middle',
            xref = 'x', 
            yref = 'y', 
            )

        annotation = "<b>25th Percentile<br>"
        fig = add_annotation(
            fig, 
            annotation, 
            df_salary['WorkExp'].max(),
            df_salary['Q25'][-1],
            'left', 
            'left', 
            'middle',
            xref = 'x', 
            yref = 'y', 
            )

    fig.show()
3 Likes

Looks like the biggest jump in salary happens between the 2-year mark and the 7-year mark.
After that, median salaries tend to go up with more years of experience but not as steeply.

2 Likes

Smart analysis, @Mike_Purtell.
The AI favorability chart shows that younger generations are more biased in favor of AI.
As for coding experience, we can see higher values in developed countries, in contrast with India, which is an emerging power in tech.

3 Likes

UPDATE: Added GitHub code
Hi data fellas,

For this week’s dataset, I focused on comparing the votes for every technology option (Language, Database, Framework, etc.) in the “have worked with” and “want to work with” categories on a dual-axis bar chart, and on visualizing the contrast in respondent counts across the world.


Live App
Github Code
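
A comparison like this one starts with counting votes per technology. Below is a minimal sketch, not the app's actual code. It assumes the multi-select survey columns (for example LanguageHaveWorkedWith and LanguageWantToWorkWith) store each respondent's picks as one semicolon-separated string. The helper count_votes and the sample rows are hypothetical.

```python
from collections import Counter


def count_votes(responses):
    """Count how often each option appears in semicolon-delimited answers."""
    counts = Counter()
    for answer in responses:
        if not answer or answer == "NA":  # skip missing answers
            continue
        counts.update(option.strip() for option in answer.split(";"))
    return counts


# Hypothetical rows standing in for the two survey columns
have_worked = ["Python;JavaScript", "JavaScript;SQL", "NA", "Python"]
want_to_work = ["Python;Rust", "Rust", "Python;Go", "NA"]

have = count_votes(have_worked)
want = count_votes(want_to_work)

# Side-by-side counts, one row per technology
for tech in sorted(set(have) | set(want)):
    print(f"{tech:<12} have={have[tech]:>2} want={want[tech]:>2}")
```

Because a Counter returns 0 for missing keys, technologies that appear in only one of the two columns still line up in the same row.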

3 Likes

I agree, @adamschroeder. The jump between the 2-year and 7-year marks seems to match conventional wisdom. I just updated the script to show plots for individual countries, which may be more useful. Thank you.

1 Like

Thank you @Alfredo49, I agree with your comments. India surely is an emerging power in tech, and will continue to grow for many, many years.

1 Like

Nice dashboard @Alfredo49. I really enjoyed exploring it.

1 Like

What a great app, @Alfredo49. I loved playing around with it.

It was interesting to see that for most age groups JavaScript was the most worked-with language, but when you turn on the under-18 filter, it’s Python. JS is in third place for the youngest generation.

3 Likes

Thanks @Mike_Purtell & @adamschroeder,

Good insight, Adam. I think this can be explained by the Python community growing faster than JavaScript’s, so maybe we will see Python overtaking JavaScript in the coming years.

3 Likes

Hello, hello!

Apologies for the delay! Great choice of data set, @adamschroeder. There was so much to explore that it was hard for me to focus just on improving the visualization.

Improvements to the Original Visualization

The goal was to analyze the relationship between work experience and annual compensation. The heatmap in general is a great choice for showing patterns between two data categories (here: bins of numeric columns). However, deriving meaningful insights from a heatmap can sometimes be tricky. Aggregated results can complement the heatmap well.

To make insights clearer, I’ve added a simple bar chart that shows the median annual salary for different bins of work experience. This one-dimensional view makes it easier to understand statements like, “People with 3-4 years of experience, on average, earn XXX amount.” The downside is that outliers, which the heatmap reveals, are less visible. Thus, both charts together offer a more comprehensive analysis.

Additional Improvements

  • Filters for Country and Developer Type: I’ve added these to ensure more accurate comparisons. Without these filters, the heatmap can be misleading due to varying salary levels across countries (e.g., USA vs. India) and roles (e.g., senior executive vs. student).
  • Sequential Color Palette: I’ve chosen a more intuitive palette for the heatmap, where lighter colors represent smaller values and darker colors larger values. This is easier to understand than the yellow-purple palette, which requires checking which color corresponds to high or low values.

figure-friday-31

5 Likes

@Mike_Purtell - Really great analysis! :rocket:

I love how you’ve decluttered the chart and kept the x-axis labels short and easy to read. I often work with dashboards full of bar charts, and using round bars adds a refreshing variety! Your analytical strength really shines in the second chart. I noticed the same issue with insufficient data points for some categories, so I appreciate you including only countries with more than 1,000 observations!

1 Like

Hey @Alfredo49, interesting chart choice! :star_struck:

I think your analysis could be even more effective with a butterfly chart. Are you familiar with this type of chart? You’re already very close to creating one—just switch the sides of the charts. This way, it will be easier to see, for example, whether more people want to work with Python than those who already do, as each technology will be aligned in the same row. Keep up the great work!

Example of a butterfly chart:
Screenshot 2024-08-09 at 12.21.58

3 Likes

This is great advice. Make it easy to compare categories. I don’t recall the last time I built a butterfly chart with px.bar(). Do you mind sharing the code for that chart?
Thank you, @li.nguyen.

1 Like

Here we go. I used the Vizro chart template for the chart above, so the code below will render with the default Plotly template instead :slight_smile:

import pandas as pd
import plotly.graph_objects as go

ages = pd.DataFrame(
    {
        "Age": ["0-19", "20-29", "30-39", "40-49", "50-59", ">=60"],
        "Male": [800, 2000, 4200, 5000, 2100, 800],
        "Female": [1000, 3000, 3500, 3800, 3600, 700],
    }
)


def butterfly(data_frame: pd.DataFrame, x1: str, x2: str, y: str):
    fig = go.Figure()
    fig.add_trace(
        go.Bar(
            x=-data_frame[x1],
            y=data_frame[y],
            orientation="h",
            name=x1,
        )
    )
    fig.add_trace(
        go.Bar(
            x=data_frame[x2],
            y=data_frame[y],
            orientation="h",
            name=x2,
        )
    )
    fig.update_layout(barmode="relative")
    return fig

fig = butterfly(ages, x1="Male", x2="Female", y="Age")
fig.show()
4 Likes

That’s a smart way to drill down into the data and get a better picture of it. With just a couple of clicks you get so much valuable information. I kinda wish I could play around with this live :stuck_out_tongue_winking_eye:

@li.nguyen @Mike_Purtell @Alfredo49, I plan to post about your beautiful graphs and apps on Plotly’s LinkedIn after the Figure Friday session today. Thank you for creating them and sharing with us.

3 Likes

Thank you @li.nguyen for your kind words and always insightful comments. On to week 32.

1 Like