Figure Friday 2024 - week 51

Join the Figure Friday session on December 27, at noon Eastern Time, to showcase your creation and receive feedback from the community. Note: this session has been cancelled due to the holiday break.

Did you know that animal caretakers reported standing on average 5.4 hours per day, while compliance officers reported standing 1.1 hours?

In this week’s Figure-Friday we’ll look at the dataset from the Occupational Requirements Survey, provided by the U.S. Bureau of Labor Statistics. The dataset is rich with information on physical activity, posture, work environment and much more, all categorized by occupation.

Several columns of the dataset were removed to manage data size. If you’re interested in the full dataset and the metadata, download the preliminary third wave (reference year 2024) complete dataset Excel sheet from the U.S. Bureau of Labor Statistics page.

Things to consider:

  • can you improve the sample figure below (bar plot)?
  • would a different figure tell the data story better?
  • can you create a Dash app instead?

Sample figure:

Code for sample figure:
import plotly.express as px
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/main/2024/week-51/ors-limited-dataset.csv')
df["ESTIMATE"] = pd.to_numeric(df["ESTIMATE"], errors='coerce')

df = df[(df['ESTIMATE TEXT'] == 'Hours of the day that workers were required to sit, mean') | (df['ESTIMATE TEXT'] == 'Hours of the day that workers were required to stand, mean')]
# Drop these two occupations because of a glitch in reporting
df = df[(df['OCCUPATION'] != 'Firefighters') & (df['OCCUPATION'] != 'First-line supervisors of fire fighting and prevention workers')]

# Add a temporary column for sorting priority
df['estimate_priority'] = df['ESTIMATE TEXT'].map({'Hours of the day that workers were required to sit, mean': 0, 'Hours of the day that workers were required to stand, mean': 1})
sorted_df = df.sort_values(by=['estimate_priority', 'ESTIMATE'], ascending=[True, False])
# Drop the temporary column
sorted_df = sorted_df.drop(columns=['estimate_priority'])

fig = px.bar(sorted_df, x='OCCUPATION', y='ESTIMATE', color='ESTIMATE TEXT')
fig.show()
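The filter-and-sort steps in the sample code can also be written with isin and an ordered categorical instead of a temporary priority column. Below is a minimal sketch, assuming the same OCCUPATION, ESTIMATE TEXT and ESTIMATE columns; sit_stand_sorted is a hypothetical helper name, and the small inline DataFrame is made-up stand-in data, not survey values:

```python
import pandas as pd

SIT = 'Hours of the day that workers were required to sit, mean'
STAND = 'Hours of the day that workers were required to stand, mean'
EXCLUDED = ['Firefighters',
            'First-line supervisors of fire fighting and prevention workers']

def sit_stand_sorted(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the sit/stand means, drop the glitchy occupations, sort sit first, then by hours descending."""
    mask = df['ESTIMATE TEXT'].isin([SIT, STAND]) & ~df['OCCUPATION'].isin(EXCLUDED)
    out = df[mask].copy()
    out['ESTIMATE'] = pd.to_numeric(out['ESTIMATE'], errors='coerce')
    # An ordered categorical makes "sit" sort before "stand" without a helper column
    out['ESTIMATE TEXT'] = pd.Categorical(out['ESTIMATE TEXT'], categories=[SIT, STAND], ordered=True)
    return out.sort_values(['ESTIMATE TEXT', 'ESTIMATE'], ascending=[True, False])

# Hypothetical stand-in rows (not real survey values)
demo = pd.DataFrame({
    'OCCUPATION': ['A', 'A', 'B', 'B', 'Firefighters'],
    'ESTIMATE TEXT': [SIT, STAND, SIT, STAND, SIT],
    'ESTIMATE': ['2.0', '6.0', '7.5', '0.5', '3.0'],
})
print(sit_stand_sorted(demo)[['OCCUPATION', 'ESTIMATE']])
```

The ordered categorical keeps the "sit" bars grouped before the "stand" bars, which is what the priority column achieved in the original.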

Participation Instructions:

  • Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
  • Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
  • Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.

:point_right: If you prefer to collaborate with others on Discord, join the Plotly Discord channel.

Data Source:

Thank you to the U.S. Bureau of Labor Statistics for the dataset (preliminary third wave - reference year 2024 complete dataset).


My input for Figure Friday week 51. Yes, in 15 minutes I’m going to write a shopping list and then start doing nothing. I took the data as it was; things could be said about that. I wanted to create a scatterplot and research how to get the most out of the tooltip and how to control the colours in a scatterplot. There are a lot of comments in the code. I used DangerouslySetInnerHTML because I quickly wanted to create an HTML list for the cards. If you comment out the two cards and the dash_dangerously_set_inner_html import at the top, I think the rest will work (hopefully).
I didn’t get around to finding a meaningful icon for the table. Enjoy your Christmas! :christmas_tree:

import plotly.graph_objects as go
import pandas as pd
import numpy as np
from dash import Dash, html, dcc, Input, Output, callback
from dash_dangerously_set_inner_html import DangerouslySetInnerHTML
import dash_bootstrap_components as dbc


df = pd.read_csv('https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/main/2024/week-51/ors-limited-dataset.csv')
df["ESTIMATE"] = pd.to_numeric(df["ESTIMATE"], errors='coerce')



df = df[(df['ESTIMATE TEXT'] == 'Hours of the day that workers were required to sit, mean') | (df['ESTIMATE TEXT'] == 'Hours of the day that workers were required to stand, mean')]
# Drop these two occupations because of a glitch in reporting; 'All workers' is dropped too, since it is a special case (an aggregate, not a single occupation).
df = df[(df['OCCUPATION'] != 'All workers') & (df['OCCUPATION'] != 'Firefighters') & (df['OCCUPATION'] != 'First-line supervisors of fire fighting and prevention workers')]

# # Add a temporary column for sorting priority
# df['estimate_priority'] = df['ESTIMATE TEXT'].map({'Hours of the day that workers were required to sit, mean': 0, 'Hours of the day that workers were required to stand, mean': 1})
# sorted_df = df.sort_values(by=['estimate_priority', 'ESTIMATE'], ascending=[True, False])
# # Drop the temporary column
# sorted_df = sorted_df.drop(columns=['estimate_priority'])

## The input df comes from Adam's sample code above


df = df.drop(columns = ['ESTIMATE TEXT', 'DATATYPE'])


### Let's assume a stand/sit ratio of 1 is really the healthiest option








#some occupations do not have both stand and sit estimates

vc = df['OCCUPATION'].value_counts()
occupations = vc[vc==2].index.tolist()


#add occupations as first column
dfp = pd.DataFrame({'Occupation': occupations})
dfp["Ratio"] = float(0)


#Decide on the color of the dot. The boundaries are doing one thing (sitting or standing) for more than 5 hours
#and for more than 6 hours. The values below are thresholds for the ratio, where ratio = stand/sit.

red_up = float(6/2)
red_low = float(2/6)
orange_up = float(5/3)
orange_low = float(3/5)

for i, r in dfp.iterrows():
    occupation = r['Occupation']
    sitting = df.query('OCCUPATION == @occupation & CATEGORY == "Sitting"')['ESTIMATE'].values[0]
    standing = df.query('OCCUPATION == @occupation & CATEGORY == "Standing"')['ESTIMATE'].values[0]
    ratio = round(standing/sitting,2)
    dfp.at[i,'Sitting'] = sitting
    dfp.at[i,'Standing'] = standing
    dfp.at[i,'Ratio'] = ratio

#add color column for styling the dots (computed once, after the loop has filled in every ratio)
dfp['colors'] = dfp['Ratio'].apply(lambda x: 'red' if x > red_up or x < red_low else ('orange' if x > orange_up or x < orange_low else 'green'))


fig=go.Figure()

#Extra data for the hover has to be passed in as customdata. A 1-D list is enough
#for a single extra field (the occupation), but to add the ratio too you need a
#2-D array with one column per field. dfp['Occupation'] is used rather than the
#occupations list so the order is sure to match the points.
#I used the ratio to check whether the tooltip made sense, then removed it from the
#tooltip because it means little to people without context.

mycustomdata = np.stack((dfp['Occupation'], dfp['Ratio']), axis=-1)



#painting the dots

fig.add_trace(go.Scatter(x=dfp['Sitting'],
                         y= dfp['Standing'],
                         mode = 'markers',
                         marker=dict(color=dfp.colors),

                         ))


#creating the custom tooltip



fig.update_traces(customdata=mycustomdata,
                  #hovertemplate = "<b>%{customdata[0]} - ratio %{customdata[1]}</b><br>"+\
                  hovertemplate = "<b>%{customdata[0]}</b><br>"+\
                  "Avg. sitting: %{x} hrs<br>"+\
                  "Avg. standing: %{y} hrs<extra></extra>")  # <extra></extra> hides the "trace 0" label

#add the line of perfect balance (standing == sitting)


fig.update_layout(shapes=[
  dict(
    type= 'line',
    yref= 'y', y0= 0, y1= 8,
    xref= 'x', x0= 0, x1= 8,
    name="lowerboundary",
    line=dict(
                  color="#70d158",
                  width=1,
                  dash="dot",
              )
  ),
])

#remove those horrible margins

fig.update_layout(
    margin=dict(l=5, r=5, t=5, b=5),
)

#add some annotations instead of text along the axis
fig.add_annotation(x=5, y=7,
            text="Standing more", showarrow=False)
fig.add_annotation(x=7, y=3,
            text="Sitting more", showarrow=False)
fig.add_annotation(x=7.5, y=7.5,
            text="Perfect balance", showarrow=False,  font=dict(
                color="green",
            ),
)


#healthy card shows occupations nearest to ratio 1
def healthy_card(dfp):
    #compare with one as the optimal value for ratio, find nearest 6 occupations
    #thanks to stack overflow
    df_output = dfp.iloc[(dfp['Ratio']-1).abs().argsort()[:6]]
    outputlist = '<ul>'
    for i, r in df_output.iterrows():
        outputlist += '<li>' + r['Occupation'] + '</li>'
    outputlist += '</ul>'
    
    
    healthycard =  dbc.Card(
        [
  
            dbc.CardBody(
                [
                    html.H4("Looking for an occupation with a nice sit/stand balance?", className="card-title"),
                    html.H5('Try:'),
                    
                    #I KNOW, I JUST WANTED A PLAIN DECENT HTML LIST IN THE FASTEST WAY
                    html.Div(DangerouslySetInnerHTML(outputlist))
                ]
            ),
        ],  style={"marginBottom": "2rem"}
        
    )
    return healthycard


def not_healthy_card(dfp):
    
    #Pick head(3) and tail(3) of the dataframe sorted on ratio.
    #Head and tail hold the most "red" values. It could be that
    #head(4) and tail(2), or another combination, would be better,
    #because the head values are more extreme (or the tail values).
    
    #sort ratio
    df_sort = dfp.sort_values(by=['Ratio'])
    
    #combine head and tail to make creating a list easier.
    df_output =  pd.concat([df_sort.head(3), df_sort.tail(3)], ignore_index=True)
 
    #loop and create an html list, I have not found a list component I like out of
    #the box.
    
    outputlist = '<ul>'
    for i, r in df_output.iterrows():
        outputlist += '<li>' + r['Occupation'] + '</li>'
    outputlist += '</ul>'
    
    
    not_healthycard =  dbc.Card(
        [
  
            dbc.CardBody(
                [
                    html.H4("Take very good care of your health if this is your occupation:", className="card-title"),
  
                    
                    #I KNOW, I JUST WANTED A PLAIN DECENT HTML LIST IN THE FASTEST WAY
                    html.Div(DangerouslySetInnerHTML(outputlist))
                ]
            ),
        ]
        
    )
    return not_healthycard

def how_about_me(dfp, selected_job):
    #queries the df for the selected job, returns one row

    q= dfp.query('Occupation == @selected_job', inplace=False)
    #create table output, this should be a css grid with divs, but this is faster.
    row1 = html.Tr([html.Td('Imagine icon'),
                    html.Td('Avg. sitting ' + str(q['Sitting'].values[0]) + ' hrs per working day'),
                    html.Td('Avg. standing ' + str(q['Standing'].values[0]) + ' hrs per working day')])

    stylestring = {
        "borderColor": q['colors'].values[0],  # a plain color string, not a pandas Series
        "fontSize": '18px'
        }
    
    #finally understand how you can dynamically create a stylestring and insert it
    #this one sets the bordercolor of the table and enlarges the font for no reason :-)
    this_about_you  = dbc.Table([html.Tbody([row1])], bordered=True, style=stylestring)
    
    return this_about_you 
    



dbc_css = "https://cdn.jsdelivr.net/gh/AnnMarieW/dash-bootstrap-templates/dbc.min.css"
app = Dash(__name__, external_stylesheets=[dbc.themes.SANDSTONE, dbc_css])



app.layout = dbc.Container(
    [   dbc.Row([
        dbc.Col([
            html.H2('How healthy is your Occupation?'),
            html.P('Some people say the perfect occupation has a workingday divided in 50% standing and 50% \
                   sitting.'),
            html.P('In the graph below every dot represents the sit/stand balance for an occupation. The green dashed\
                   line represents balance perfection. '),

            dcc.Graph(id="scatter-plot", figure = fig),
            html.H2('How about me?'),
            #ok, this should not have a default value, it should be empty and no tablerow displayed at first.
            html.Div([
                #dropdown from occupationslist
                  dcc.Dropdown(
                                id='search_job',
                                options=occupations,
                                searchable = True,
                                value='Software developers',
                                placeholder="Find occupation..."
                ),
           



                       ],   style={'marginBottom':'2rem', 'marginTop':'2rem'}),
            html.Div(id='jobdata')
            
            ], className = 'col-md-8'),
        
        
        dbc.Col([
            #if you comment out these two card functions, this code will work
            #without dangerous html, I hope.
           healthy_card(dfp),
           not_healthy_card(dfp)
        ], className = 'col-md-4')
          
        ])
], style={"marginTop": "2rem"})


@app.callback( Output('jobdata', 'children'),
          Input(component_id='search_job', component_property='value'))


def update_job_data(job):
    
    
    jobtable = how_about_me(dfp, job)
       
    
    return jobtable




if __name__ == '__main__':
    app.run(debug=True)  # run_server is deprecated in recent Dash versions
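The iterrows loop that builds dfp could also be replaced by a pivot, with the colour thresholds (red above 6/2 or below 2/6, orange above 5/3 or below 3/5) applied through np.select. A minimal sketch, assuming the OCCUPATION, CATEGORY ("Sitting"/"Standing") and numeric ESTIMATE columns used above; sit_stand_ratios is a hypothetical helper name and the demo rows are made up:

```python
import numpy as np
import pandas as pd

def sit_stand_ratios(df: pd.DataFrame) -> pd.DataFrame:
    """One row per occupation with Sitting, Standing, Ratio (stand/sit) and a colour band."""
    wide = df.pivot_table(index='OCCUPATION', columns='CATEGORY',
                          values='ESTIMATE', aggfunc='first')
    # Keep only occupations that have both estimates, like the value_counts() == 2 filter
    wide = wide.dropna(subset=['Sitting', 'Standing'])
    wide['Ratio'] = (wide['Standing'] / wide['Sitting']).round(2)
    r = wide['Ratio']
    # Same thresholds as the loop version; the first matching condition wins
    wide['colors'] = np.select(
        [(r > 6 / 2) | (r < 2 / 6), (r > 5 / 3) | (r < 3 / 5)],
        ['red', 'orange'],
        default='green',
    )
    return wide.reset_index().rename(columns={'OCCUPATION': 'Occupation'})

# Hypothetical rows: D has only a sitting estimate and is dropped
demo = pd.DataFrame({
    'OCCUPATION': ['A', 'A', 'B', 'B', 'C', 'C', 'D'],
    'CATEGORY': ['Sitting', 'Standing'] * 3 + ['Sitting'],
    'ESTIMATE': [4.0, 4.0, 2.0, 6.5, 3.0, 5.5, 7.0],
})
print(sit_stand_ratios(demo))
```

For the card lists, html.Ul([html.Li(o) for o in df_output['Occupation']]) would give a plain HTML list without DangerouslySetInnerHTML, since Dash's html module includes Ul and Li components.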




Updated! Hi, I tried something simpler; I’ll send you the code later. I thought it would be more readable to show only the top 10, but the dropdown lets you select any occupation.

from dash import Dash, dcc, html 
import dash_bootstrap_components as dbc
from dash_bootstrap_templates import load_figure_template
import plotly.express as px
import pandas as pd
from dash.dependencies import Input, Output

# Initialize Dash app with Bootstrap dark theme
dbc_css = "https://cdn.jsdelivr.net/gh/AnnMarieW/dash-bootstrap-templates/dbc.min.css"

# Initialize the Dash app
app = Dash(__name__, external_stylesheets=[
    dbc.themes.CYBORG,
    dbc_css,
    "https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/all.min.css"  # Font Awesome for icons
])

# Load the SLATE figure template for consistent styling
load_figure_template("SLATE")

# Load and preprocess data
df = pd.read_csv('https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/main/2024/week-51/ors-limited-dataset.csv')

# Filter for sitting and standing jobs
df = df[
    (df['ESTIMATE TEXT'] == 'Hours of the day that workers were required to sit, mean') | 
    (df['ESTIMATE TEXT'] == 'Hours of the day that workers were required to stand, mean')
]

# Remove occupations with glitches
df = df[
    (df['OCCUPATION'] != 'Firefighters') & 
    (df['OCCUPATION'] != 'First-line supervisors of fire fighting and prevention workers')
]

# Convert 'ESTIMATE' column to numeric
df["ESTIMATE"] = pd.to_numeric(df["ESTIMATE"], errors='coerce')

# Separate data for sitting and standing
df_standing = df[df['ESTIMATE TEXT'] == 'Hours of the day that workers were required to stand, mean']
df_sitting = df[df['ESTIMATE TEXT'] == 'Hours of the day that workers were required to sit, mean']

# Helper function to create bar charts
def create_bar_chart(data, title, color_scale, cmin, cmax, icon_html):
    fig = px.bar(
        data,
        x='ESTIMATE',
        y='OCCUPATION',
        orientation='h',
        labels={'ESTIMATE': 'Average Hours', 'OCCUPATION': 'Occupation'},
        color='ESTIMATE',
        color_continuous_scale=color_scale,
        range_color=[cmin, cmax]  # apply the cmin/cmax arguments, which were otherwise unused
    )

    # Set the title and layout; hide the color scale and the legend
    fig.update_layout(
        title={
            'text': f"{icon_html} {title}",
            'y': 0.95,
            'x': 0.5,
            'xanchor': 'center',
            'yanchor': 'top',
            'font': {'size': 18}  # Font size for responsive titles
        },
        height=700,
        margin=dict(l=50, r=50, t=100, b=50),
        xaxis_tickfont=dict(size=17),
        yaxis_tickfont=dict(size=17),
        coloraxis_showscale=False,  # This removes the color scale from the legend
        showlegend=False  # Disable the legend
    )
    return fig

# Create the layout for the app
app.layout = dbc.Container([
    html.H1("Top Jobs for Sitting and Standing", style={'textAlign': 'center', 'color': 'white'}),
    html.Br(),

    # Dropdown to select occupation (Aligned to the left)
    dbc.Row([
        dbc.Col([ 
            html.H6("Select occupation (not only from the top 10)", style={'color': 'white'}),
            dcc.Dropdown(
                id='occupation-dropdown',
                options=[{'label': occupation, 'value': occupation} for occupation in df['OCCUPATION'].unique()],
                multi=True,  # Allow multi-selection
                style={'width': '70%'}
            ),
        ], width={"size": 6}, style={'textAlign': 'left'}),  # Align to the left
    ], justify="start"),
    html.Br(),

    # KPI cards in their own row below the dropdown
    dbc.Row([
        dbc.Col([ 
            dbc.Card([ 
                dbc.CardBody([ 
                    html.H3("Standing Hours", className="card-title", style={'textAlign': 'center'}),
                    html.H5(id='kpi-standing-hours', className="text-center", style={'color': 'white'})
                ])
            ], className="text-center")
        ], xs=12, sm=6, md=4, lg=3, xl=2),
        dbc.Col([ 
            dbc.Card([ 
                dbc.CardBody([ 
                    html.H3("Sitting Hours", className="card-title", style={'textAlign': 'center'}),
                    html.H5(id='kpi-sitting-hours', className="text-center", style={'color': 'white'})
                ])
            ], className="text-center")
        ], xs=12, sm=6, md=4, lg=3, xl=2)
    ], justify="center"),
    html.Br(),

    # Graph rows for sitting and standing (order swapped)
    dbc.Row([
        dbc.Col([ 
            dcc.Loading(
                type="circle", 
                children=[
                    dcc.Graph(id='graph-sitting', style={'height': '60vh'})  # Graph for sitting jobs
                ]
            )
        ], xs=12, sm=12, md=6, lg=6, xl=6),  # Responsive widths for graphs

        dbc.Col([ 
            dcc.Loading(
                type="circle", 
                children=[
                    dcc.Graph(id='graph-standing', style={'height': '60vh'})  # Graph for standing jobs
                ]
            )
        ], xs=12, sm=12, md=6, lg=6, xl=6),  # Responsive widths for graphs
    ], justify="center", align="center"),
], fluid=True)

# Define callback to update graphs and KPI based on dropdown selection
@app.callback(
    [Output('graph-standing', 'figure'),
     Output('graph-sitting', 'figure'),
     Output('kpi-sitting-hours', 'children'),
     Output('kpi-standing-hours', 'children')],
    [Input('occupation-dropdown', 'value')]
)
def update_graphs(selected_occupations):
    # Filter data based on selected occupations, default to top 10
    if selected_occupations:
        filtered_standing = df_standing[df_standing['OCCUPATION'].isin(selected_occupations)]
        filtered_sitting = df_sitting[df_sitting['OCCUPATION'].isin(selected_occupations)]
    else:
        # Default view: the 10 occupations with the FEWEST standing hours (titled "Top 10 Sitting Jobs")
        # and the 10 with the fewest sitting hours (titled "Top 10 Standing Jobs")
        filtered_standing = df_standing.nsmallest(10, 'ESTIMATE')
        filtered_sitting = df_sitting.nsmallest(10, 'ESTIMATE')

    # Create the graph for standing jobs with reversed title (Top 10 Sitting Jobs)
    graph_figure_standing = create_bar_chart(
        filtered_standing, 
        "Top 10 Sitting Jobs",  # Reversed title
        ['pink', 'magenta'], 
        cmin=0, 
        cmax=8, 
        icon_html="🧍" 
    )

    # Create the graph for sitting jobs with reversed title (Top 10 Standing Jobs)
    graph_figure_sitting = create_bar_chart(
        filtered_sitting, 
        "Top 10 Standing Jobs",  # Reversed title
        ['lightblue', 'blue'], 
        cmin=0, 
        cmax=9, 
        icon_html="🪑" 
    )

    # Calculate total standing hours and sitting hours
    total_standing_hours = filtered_standing['ESTIMATE'].sum()
    total_sitting_hours = filtered_sitting['ESTIMATE'].sum()

    # Update KPI for hours
    return graph_figure_standing, graph_figure_sitting, f"{total_sitting_hours:.2f} Hours", f"{total_standing_hours:.2f} Hours"

# Run the app
if __name__ == '__main__':
    app.run(debug=True)  # run_server is deprecated in recent Dash versions
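If the default view is instead meant to show the occupations with the most hours in each posture, nlargest selects them directly, so the chart titles would not need to be swapped. A minimal sketch, assuming the OCCUPATION and numeric ESTIMATE columns of df_standing and df_sitting; top_n is a hypothetical helper and the demo rows are made up:

```python
import pandas as pd

def top_n(data: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n occupations with the most hours, ordered so the largest bar is drawn on top."""
    # nlargest sorts descending; reversing puts the largest value last, and a
    # horizontal bar chart on a categorical y-axis draws the last row at the top
    return data.nlargest(n, 'ESTIMATE').iloc[::-1]

# Hypothetical rows (not survey values)
demo = pd.DataFrame({'OCCUPATION': ['A', 'B', 'C', 'D'],
                     'ESTIMATE': [7.9, 1.2, 5.0, 8.0]})
print(top_n(demo, 3)['OCCUPATION'].tolist())  # ['C', 'A', 'D']
```

In update_graphs this would mean df_standing feeds a "Top 10 Standing Jobs" chart and df_sitting a "Top 10 Sitting Jobs" chart, and the title could drop "Top 10" when the user has picked occupations in the dropdown.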

What I like about your lists is that they make it obvious that I removed all occupations which only require standing or only sitting :slight_smile:. The content of my “take good care” card is wrong. Curious to see what happens when you select an hours range. Have a nice evening!
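To see exactly which occupations the value_counts() == 2 filter in the first app removes (for example, ones with only a standing estimate), the long-format data can be cross-tabulated. A minimal sketch, assuming OCCUPATION, CATEGORY ("Sitting"/"Standing") and numeric ESTIMATE columns as in that app; posture_coverage is a hypothetical helper, the demo rows are made up, and unlike the original filter it also counts a row whose ESTIMATE failed to parse as missing:

```python
import pandas as pd

def posture_coverage(df: pd.DataFrame) -> pd.DataFrame:
    """Count valid Sitting/Standing rows per occupation; keep occupations missing either one."""
    valid = df.dropna(subset=['ESTIMATE'])  # rows coerced to NaN count as missing here
    table = pd.crosstab(valid['OCCUPATION'], valid['CATEGORY'])
    # Reindex so occupations with no valid rows, or an absent column, show up as zeros
    table = table.reindex(index=df['OCCUPATION'].unique(),
                          columns=['Sitting', 'Standing'], fill_value=0)
    return table[(table == 0).any(axis=1)]

# Hypothetical rows: B has only a standing row, C's standing value is missing
demo = pd.DataFrame({
    'OCCUPATION': ['A', 'A', 'B', 'C', 'C'],
    'CATEGORY': ['Sitting', 'Standing', 'Standing', 'Sitting', 'Standing'],
    'ESTIMATE': [4.0, 4.0, 7.5, 6.0, float('nan')],
})
print(posture_coverage(demo))
```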


Yes, thank you. I’m still working on it; I paid more attention to the technical part than to the content. :slight_smile:


Hello Everyone,

I chose a stacked bar chart to show how telework varies by occupation.

I saved it as interactive HTML, but that format isn’t allowed here.
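For a stacked bar where every occupation's segments add up to 100%, the long-format estimates could be pivoted to one column per telework status before plotting. A minimal sketch; the STATUS column, its labels ('Telework', 'No telework') and all numbers are hypothetical placeholders, since the exact ESTIMATE TEXT wording for the telework variable isn't shown in this thread:

```python
import pandas as pd

def telework_shares(long_df: pd.DataFrame) -> pd.DataFrame:
    """Pivot to one column per telework status and rescale each occupation's row to sum to 100."""
    wide = long_df.pivot_table(index='OCCUPATION', columns='STATUS',
                               values='ESTIMATE', aggfunc='sum')
    # Rescale so every stacked bar reaches exactly 100 (guards against rounding in published percentages)
    wide = wide.div(wide.sum(axis=1), axis=0) * 100
    # Order occupations by telework share so the bars read from most to least remote
    return wide.sort_values('Telework', ascending=False)

# Hypothetical rows (not survey values); B sums to 99 before rescaling
demo = pd.DataFrame({
    'OCCUPATION': ['A', 'A', 'B', 'B'],
    'STATUS': ['Telework', 'No telework', 'Telework', 'No telework'],
    'ESTIMATE': [30.0, 70.0, 59.0, 40.0],
})
print(telework_shares(demo).round(1))
```

Plotly Express accepts wide-form DataFrames (index on x, one trace per column), so px.bar(shares, barmode='stack') should then draw one stacked bar per occupation.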


:clap: Fantastic charts, everyone! I haven’t even got round to looking at the data yet! This week and next are going to be complicated for me, time-wise, so I hope to produce at least a respectable chart!


What an interesting way of looking at the data, @marieanne. Thank you for sharing. Great app.
I also loved how you offer that extra info in the cards to the right.

A few things that would help me personally in the app:

  1. seeing x- and y-axis titles, such as mean standing hours and mean sitting hours.
  2. seeing a legend would quickly offer clarity on the ranges of each color and their significance
  3. seeing a larger scatter marker when the respective occupation is chosen in the dropdown
  4. what is the Imagine icon part?
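Points 1 and 3 could be handled with a per-point size list for marker.size and axis titles set through update_layout. A minimal sketch; highlight_sizes is a hypothetical helper, base=6 matches Plotly's default marker size, and the commented wiring assumes the callback is extended with an Output for the scatter-plot figure, which the current app does not have:

```python
def highlight_sizes(occupations, selected, base=6, big=14):
    """Return one marker size per point: big for the selected occupation, base for the rest."""
    return [big if occ == selected else base for occ in occupations]

print(highlight_sizes(['A', 'B', 'C'], 'B'))  # [6, 14, 6]

# Hypothetical wiring, in a callback that also has Output('scatter-plot', 'figure'):
#   new_fig = go.Figure(fig)  # copy, so the module-level figure is not mutated
#   new_fig.update_traces(marker_size=highlight_sizes(dfp['Occupation'], job))
#   new_fig.update_layout(xaxis_title='Mean sitting hours per day',
#                         yaxis_title='Mean standing hours per day')
#   return new_fig
```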

Thanks for sharing, @Ester. We’d love to see the code when it’s ready.

I like how you highlight the top 10 in each category. But since a great majority of the top 10 represent a range of 7-8 hours, I’m not sure the legend is very helpful in this case.

Will the dropdown allow the user to select occupations outside the top 10?


How interesting to explore the telework variable of the dataset.
@Avacsiglo21 can you share a little more info about this variable please?

Does the Y axis represent the percentage of workers who could potentially work remotely, or who reported working remotely?


Hey Everyone @here,
For inspiration, here’s a cool data story from The Pudding that builds on this dataset.


OK, I will send the code in 1-2 days. Thanks for the help.


@adamschroeder thank you for your valuable feedback. Per point:

  1. axis titles: I actually tried to replace them with the in-plot words, but that’s not clear enough. I was at the “gosh, annotations, that’s easy” stage.
  2. legend: true, the explanation is currently in my head, not on screen.
  3. this morning I was thinking about a tooltip that pops up when you select an occupation, probably the same line of thinking. Then I went vacuum cleaning.
  4. Imagine (meaningful) icon = me being lazy and not inserting 3 coloured icons (red, orange, green) to aid comprehension. A coloured border is not sharp enough.
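For the legend in point 2, one common approach is to draw one scatter trace per colour band, each with its own name; Plotly then lists the bands in the legend automatically. A minimal sketch returning plain trace dicts (which go.Figure(data=...) accepts), assuming the Occupation, Sitting, Standing and colors columns of dfp; band_traces and the BAND_LABELS wording are hypothetical, derived from the ratio thresholds in the app:

```python
import pandas as pd

# Hypothetical legend labels for the three colour bands (ratio = stand/sit)
BAND_LABELS = {
    'green': 'Balanced (ratio 3/5 to 5/3)',
    'orange': 'Leaning (ratio 1/3 to 3/5 or 5/3 to 3)',
    'red': 'Unbalanced (ratio below 1/3 or above 3)',
}

def band_traces(dfp: pd.DataFrame) -> list:
    """Build one scatter trace dict per colour band so each band gets a legend entry."""
    traces = []
    for color in ['green', 'orange', 'red']:
        part = dfp[dfp['colors'] == color]
        if part.empty:
            continue  # skip bands with no occupations
        traces.append(dict(
            type='scatter', mode='markers', name=BAND_LABELS[color],
            x=part['Sitting'].tolist(), y=part['Standing'].tolist(),
            marker=dict(color=color),
            customdata=part['Occupation'].tolist(),
            hovertemplate='<b>%{customdata}</b><br>Avg. sitting: %{x} hrs'
                          '<br>Avg. standing: %{y} hrs<extra></extra>',
        ))
    return traces

# Hypothetical dfp rows
demo = pd.DataFrame({
    'Occupation': ['A', 'B', 'C'],
    'Sitting': [4.0, 2.0, 4.5],
    'Standing': [4.0, 6.5, 3.5],
    'colors': ['green', 'red', 'green'],
})
print([t['name'] for t in band_traces(demo)])
```

Something like fig = go.Figure(data=band_traces(dfp)) would replace the single go.Scatter trace; the shapes and annotations can be added as before.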

When I find time… thank you, Marie-Anne


Hi Adam,
The Y axis represents the percentage of workers who did or did not have telework. It was interesting to explore how this percentage is reflected across occupations; remote work clearly has a strong impact on today’s working reality.


It might be interesting to look for information, if any exists, on how the overall telework trend has developed since the pandemic.


After processing the feedback, here is the result, with the code (some things may be slightly different from what was suggested):

I wish you all a very enjoyable :christmas_tree:!


Amazing, @marieanne. Nice upgrades to the app. I’m sharing an image here for those that might not click the app link:


hi everyone @here,
The Figure-Friday session is cancelled tomorrow because I’m out of the office.

The post for the last figure-friday week of 2024 will go out tomorrow morning, and the Figure-Friday session on January 3 will happen as usual.

Happy Holidays,


Here is my interactive visualisation:

Happy Christmas! :christmas_tree::fireworks::firecracker:


Happy holidays, @Ester and a happy New Year :partying_face:
Thank you for creating the app and sharing the code.
