Figure Friday 2024 - week 30

Update: Figure Friday 2024 - week 31 is the newer dataset.

It’s week 30 of the Figure Friday initiative and with it we’ll be exploring the following data set:

The Rural Development Agency is part of the Department of Agriculture and it provides loans, grants, and loan guarantees to bring prosperity and opportunity to rural areas.

The provided data set was pulled for 2024. However, if you prefer to analyze other years, feel free to filter the data directly in the Data Download tab.

Sample Figure:

Code for sample figure:
import pandas as pd
import plotly.express as px

# Load the CSV file
file_path = 'https://raw.githubusercontent.com/plotly/Figure-Friday/main/2024/week-30/rural-investments.csv'
data = pd.read_csv(file_path)

# Select only the relevant columns for the plot
# (.copy() avoids pandas' SettingWithCopyWarning on the assignments below)
investment_data = data[['County FIPS', 'Investment Dollars']].copy()

# Convert County FIPS to string, remove leading apostrophes, and add zero when needed
investment_data['County FIPS'] = investment_data['County FIPS'].astype(str).str.replace("'", "").str.zfill(5)

# Convert Investment Dollars to float
investment_data['Investment Dollars'] = investment_data['Investment Dollars'].str.replace(',', '').astype(float)

# Group by County FIPS and sum the investment dollars
investment_data_grouped = investment_data.groupby('County FIPS').sum().reset_index()

# Create the map
fig = px.choropleth(investment_data_grouped,
                    geojson="https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json",
                    locations='County FIPS',
                    color='Investment Dollars',
                    color_continuous_scale="Viridis",
                    scope="usa",
                    labels={'Investment Dollars': 'Investment ($)'},
                    title='Investment Dollars per County')

fig.update_layout(margin={"r":0, "t":0, "l":0, "b":0})

fig.show()
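
The FIPS cleanup in the sample code is the step most likely to cause silent mismatches with the GeoJSON, so here is a minimal sketch of the same three operations (cast to string, drop the apostrophe, left-pad to five digits) in plain Python. The helper name `normalize_fips` and the sample values are illustrative assumptions, not taken from the data set.

```python
def normalize_fips(raw) -> str:
    """Return a 5-character county FIPS code from a raw CSV value."""
    # Cast to string, strip apostrophes, then left-pad with zeros
    return str(raw).replace("'", "").zfill(5)


# Hypothetical raw values: quoted, quoted with a dropped zero, and numeric
samples = ["'01001", "'1001", 6037]
cleaned = [normalize_fips(s) for s in samples]
print(cleaned)
```

Codes that are not exactly five characters long will not match any county in the GeoJSON, and those counties are left uncolored on the map.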

Participation Instructions:

  • Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
  • Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
  • Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.

:point_right: If you prefer to collaborate with others on Discord, join the Plotly Discord channel.

Thank you to the Rural Development Agency for the data.

July 30, 2024 update: Changed the Pareto by state to include only US states, and added ‘Others’ as a region that groups the smallest entries, colored gray. Here is the code:

'''
This script produces:
    - choropleth map of Rural investment by US county
    - Pareto showing investment amounts grouped by state or territory
    - Pareto with investment amounts grouped by region.

hover info:
    choropleth map: county name, state abbreviation, investment amount
    pareto by state: state name, investment amount, cumulative %
    pareto by region: region name, investment amount, cumulative %

pareto by state filtered to only include 50 US states
pareto by region merged smallest invested regions to group called others
    
'''
import polars as pl   # dataframe library
import us             # library with USA state info(name, abbr, timezone, etc.)
import plotly.express as px
from plotly.subplots import make_subplots

#------------------------------------------------------------------------------#
#     Functions                                                                #
#------------------------------------------------------------------------------#
def add_annotation(fig, annotation, x, y, align, xanchor, yanchor):
    fig.add_annotation(
        text=annotation,
        showarrow=False,
        xref = 'paper', x=x, yref = 'paper', y=y,
        align= align, xanchor=xanchor, yanchor=yanchor,
        font =  {'size': 14, 'color': 'darkslategray'}
    )
    return fig

def update_layout(
        fig, my_title, my_height, my_width,              # mandatory parameters
        my_xtitle='', my_ytitle='', my_legend_title='',  # optional parameters
        my_showlegend=False
    ):
    fig.update_layout(
        title = my_title,
        xaxis_title=my_xtitle,
        yaxis_title=my_ytitle,
        legend_title=my_legend_title,
        height=my_height, width=my_width,
        margin={"r":50, "t":50, "l":50, "b":50},
        autosize=False,
        showlegend=my_showlegend
    )
    return fig
    
#------------------------------------------------------------------------------#
#     Map time zones to states & territories, assign region names              #
#------------------------------------------------------------------------------# 
region_dict = {}
# Build the name -> time zones lookup once instead of on every iteration
time_zones_by_state = us.states.mapping('name', 'time_zones')
for key, zones in time_zones_by_state.items():
    value = (
        zones[0]
        .replace('America/New_York',        'East')
        .replace('America/Chicago',         'Central')
        .replace('America/Denver',          'Mountain') 
        .replace('America/Los_Angeles',     'Pacific')        
        .replace('America/Anchorage',       'Alaska')
        .replace('Pacific/Honolulu',        'Hawaii') 
        .replace('America/Phoenix',         'Pacific')
        .replace('America/Boise',           'Central')  #for N. Dak, not Idaho
        .replace('America/Puerto_Rico',     'Puerto Rico')
    )
    region_dict[key] = value

#------------------------------------------------------------------------------#
#     Load csv file, process rows and columns                                  #
#------------------------------------------------------------------------------#
file_path = (
    'https://raw.githubusercontent.com/plotly/Figure-Friday/' +
    'main/2024/week-30/rural-investments.csv'
)

investment_data = (
    pl.read_csv(file_path, ignore_errors = True)
    .rename({'State Name': 'state_name'})
    .with_columns(
        pl.col('County FIPS').str.replace("'", "").str.zfill(5),
        pl.col('Investment Dollars').str.replace_all(',', '').cast(pl.Float64),
        state_abbr = pl.col('state_name')
            .replace(us.states.mapping('name', 'abbr')),
        region = pl.col('state_name').replace(region_dict)
    )
    .select(
        'state_name', 'state_abbr', 'County', 
        'County FIPS', 'region','Investment Dollars'
    )
)
investment_data = (
    investment_data
    .with_columns(
        # Keep the known region names; anything else is grouped as 'Others'
        region = pl.when(
            pl.col('region').is_in(
                ['Puerto Rico', 'Pacific', 'Central', 'Hawaii',
                 'Mountain', 'Alaska', 'East']
            )
        )
        .then(pl.col('region'))
        .otherwise(pl.lit('Others'))
    )
)

#------------------------------------------------------------------------------#
#     Create choropleth map                                                    #
#------------------------------------------------------------------------------#
grouped_by_county = (
    investment_data.group_by('state_abbr', 'County', 'County FIPS').sum()
)
geojson_path = 'https://raw.githubusercontent.com/plotly/datasets/master'
geojson_file = 'geojson-counties-fips.json'
fig = px.choropleth(
    grouped_by_county,
    geojson=f'{geojson_path}/{geojson_file}',
    locations='County FIPS',
    color='Investment Dollars',
    color_continuous_scale="Viridis",
    scope="usa",
    title='USA Rural Investment by County - 2024',
    custom_data=['County', 'state_abbr','Investment Dollars']
)

# hover info indented to keep mouse point icon from blocking hover display
fig.update_traces(
    hovertemplate="<br>".join([
        '    %{customdata[0]} County, %{customdata[1]}',
        '   $%{customdata[2]:,d}',
    ])
)

fig =  update_layout(fig, '2024 Rural Investment by USA County', 600, 1200)

annotation = '<b>Data Source:</b> US Department of Agriculture<br><br>'
add_annotation(fig, annotation, 0.8, 1.0, 'left', 'right', 'top')

fig.show()

#------------------------------------------------------------------------------#
#     Create colormap dictionaries for both pareto charts                      #
#------------------------------------------------------------------------------#
# keys are regions, values are colors.  for the 'Others' region, hardcode gray
my_colors = px.colors.qualitative.Dark24
my_regions = sorted(investment_data['region'].unique().to_list())
my_color_dict = dict(zip(my_regions, my_colors))
my_color_dict['Others'] = 'gray'

#------------------------------------------------------------------------------#
#     Pareto: data grouped by USA state                                        #
#------------------------------------------------------------------------------#
grouped_by_state = (
    investment_data
    .select('state_name', 'state_abbr', 'Investment Dollars', 'region')
    .group_by('state_name', 'state_abbr', 'region').sum()
    .sort('Investment Dollars', descending=True)
    .with_columns(
        CUM_PCT = 
        (100 * pl.col('Investment Dollars')/pl.col('Investment Dollars').sum())
        .cum_sum()
    )
    .filter(pl.col('state_name').is_in([str(s) for s in list(us.states.STATES)]))
)

fig = px.bar(
    grouped_by_state.sort('Investment Dollars', descending=True),
    x = 'state_abbr',
    y = 'Investment Dollars',
    template='plotly_white',
    color='region',
    color_discrete_map=my_color_dict,
    custom_data=['state_name','Investment Dollars', 'CUM_PCT']
)

# No secondary y-axis here: the cumulative percentage is shown in the hover
# text instead, and the y-axis title is set in update_layout() below

# add custom hover information
fig.update_traces(
    hovertemplate="<br>".join([
        '%{customdata[0]}',
        '$%{customdata[1]:,d}',
        'Cumulative: %{customdata[2]:.1f}%',
        '<extra></extra>'
    ])
)
fig = update_layout(
    fig, 'USA Rural Investment by State - 2024', 600, 1200, 
    my_xtitle= 'State - Abbreviated', 
    my_ytitle= 'Investment (US$)',
    my_legend_title='Region',
    my_showlegend=True
)
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)

annotation = '<b>Data Source:</b> US Department of Agriculture<br>'
annotation += 'States are color coded by region<br>'
add_annotation(fig, annotation, 0.5, 0.8, 'left', 'right', 'top')

# chart gets reordered after applying colors, next line restores intended order
fig.update_xaxes(categoryorder='total descending')

fig.show()

#------------------------------------------------------------------------------#
#     Pareto: data grouped by region                                           #
#------------------------------------------------------------------------------#
grouped_by_region = (
    investment_data
    .select('region', 'Investment Dollars')
    .group_by('region').sum()
    .sort('Investment Dollars', descending=True)
    .with_columns(
        CUM_PCT = 
        (100 * pl.col('Investment Dollars')/pl.col('Investment Dollars').sum())
        .cum_sum()
    )
)

fig = px.bar(
    grouped_by_region.sort('Investment Dollars', descending=True),
    x = 'region',
    y = 'Investment Dollars',
    template='plotly_white',
    color='region',
    color_discrete_map=my_color_dict,
    custom_data=['region','Investment Dollars', 'CUM_PCT'],
)

# add custom hover information
fig.update_traces(
    hovertemplate="<br>".join([
        '%{customdata[0]}',
        '$%{customdata[1]:,d}',
        'Cumulative: %{customdata[2]:.1f}%',
        '<extra></extra>'
    ])
)

fig = update_layout(
    fig, 'USA Rural Investment by Region - 2024', 600, 1200,
    my_xtitle='Region',
    my_ytitle='Investment (US$)',
)

fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)

annotation = '<b>Data Source:</b> US Department of Agriculture<br>'
annotation += 'Regions are approximations of time zones. Exceptions<br>'
annotation += 'made for states with multiple time zones and for<br>'
annotation += 'states that do not observe daylight saving time.<br><br>'
annotation += "<b>'Others'</b> includes Virgin Islands, Guam, Samoa & Palau"
add_annotation(fig, annotation, 0.9, 0.8, 'left', 'right', 'top')
fig.show()

1 Like

July 30 update: Updated the Pareto by state to include only US states. Added a region/group named ‘Others’ for the regions with the lowest investment levels.



4 Likes

@Mike_Purtell very nice charts. I would probably omit the legend on the second bar chart, since it adds no information beyond what the chart already shows.

4 Likes

Thank you @hebertodelrio for your suggestion, a good catch that I agree with. The legend on the second plot is surely redundant. I intended to remove it but did not get to it before posting the latest version. I expect to update it in the coming week.

1 Like

@Mike_Purtell, I just noticed that your plots are missing the cumulative percentage curve.

1 Like

Hi @hebertodelrio, I generally try to avoid overlaying a chart with a secondary graph (and secondary axis) to maintain clarity. On the other hand, I acknowledge that a cumulative percentage to accompany the Pareto bins has value. What I did was add the cumulative percentage of each Pareto bin to the custom hover information. I will update the code and screenshots in a few minutes. Thank you so much.
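
For readers who do want the curve drawn, here is a minimal sketch of a Pareto chart with a cumulative-percentage line on a secondary y-axis. It is not the code used for the charts in this thread: the function names and the demo totals are hypothetical, and plotly is imported inside the plotting function only so that the percentage helper can be used on its own.

```python
from itertools import accumulate


def cumulative_pct(values):
    """Cumulative share (0-100) of the values, largest first."""
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    return [100 * running / total for running in accumulate(ordered)]


def pareto_figure(labels, values):
    """Bars on the primary y-axis, cumulative % line on the secondary one."""
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots

    # Sort label/value pairs together so bars and line share one order
    pairs = sorted(zip(labels, values), key=lambda p: p[1], reverse=True)
    names = [name for name, _ in pairs]
    amounts = [amount for _, amount in pairs]

    fig = make_subplots(specs=[[{"secondary_y": True}]])
    fig.add_trace(go.Bar(x=names, y=amounts, name="Investment"),
                  secondary_y=False)
    fig.add_trace(
        go.Scatter(x=names, y=cumulative_pct(amounts),
                   name="Cumulative %", mode="lines+markers"),
        secondary_y=True,
    )
    fig.update_yaxes(title_text="Investment (US$)", secondary_y=False)
    fig.update_yaxes(title_text="Cumulative %", range=[0, 105],
                     secondary_y=True)
    return fig


# Hypothetical totals, for illustration only
demo_pct = cumulative_pct([50, 30, 20])
print(demo_pct)
```

Calling `pareto_figure(["A", "B", "C"], [50, 30, 20]).show()` would render the combined chart. Whether the extra axis is worth the added clutter remains a judgment call.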

2 Likes

Beautiful. Thanks for sharing your graphs and code, @Mike_Purtell. I was able to run your code successfully to reproduce the graphs. All I needed was to install the us library. This is the first time I’ve heard of that library – pretty useful.

2 Likes

Hey @Mike_Purtell,

I love the variety of charts you’ve created! Your analytical skills really shine through :star2:

I recently wrote a post on choosing the right colors for your data visualizations, and I think your bar charts would be a great example to try them out on!

If you upload your image to a color blindness simulator, you’ll notice that the purple, pink, red, and orange shades are difficult to differentiate for someone with red-weakness/blindness. With more than 12 color categories, it can be challenging to avoid frequently checking the legend as well.

Here are a couple of suggestions:

  • Group the color legend into 4-5 regions, like Northeast, Southwest, West, Southeast, and Midwest. This way, it’s easier to choose 4-5 easily distinguishable colors.
  • Alternatively, keep the 4-5 main groups you have and categorize everything else as “Other,” which you could keep in grey or another neutral color. For example, in your second chart, everything after Alaska doesn’t seem to have a visible bar length, so grouping them under “Other” might work well.
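
The second suggestion can be sketched in plain Python as follows: keep the largest few categories and sum everything else into one catch-all bucket. The function name, the `keep` threshold, and the totals are hypothetical placeholders, not values from the data set.

```python
def group_small_categories(totals, keep=5, other_label="Other"):
    """Keep the `keep` largest categories; sum the rest under other_label."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    grouped = dict(ranked[:keep])
    remainder = sum(value for _, value in ranked[keep:])
    if remainder:
        grouped[other_label] = remainder
    return grouped


# Hypothetical totals, for illustration only
totals = {"East": 900, "Central": 700, "Pacific": 300, "Guam": 4, "Palau": 1}
grouped = group_small_categories(totals, keep=3)
print(grouped)
```

Because the catch-all bucket is added last, it can be given a neutral color such as gray without affecting the colors of the main groups.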

Hope this helps!
Li

2 Likes

Thank you for the kind words @adamschroeder. The us library is new to me as well, but definitely useful for this week’s data set.

Thank you for the kind words @li.nguyen. I will take your suggestion and make a group named ‘others’ to hold all of the smaller regions.

I thought about this earlier and decided to include all of the groups, because with Plotly you can zoom in to see them, and in some use cases the most interesting categories are the ones with the lowest frequency. But I will do it and should have a code update in an hour or so.

I used px.colors.qualitative.Dark24. I like pre-defined sets; they are easier than making my own. I would like to know if there are pre-defined color lists that are suitable for color-blind viewers. I also wonder whether colors that are easier for color-blind people to distinguish would be easier for everyone to distinguish.

2 Likes

Hey @Mike_Purtell,

It’s looking much cleaner already! :broom:

You’ve made a great point about sometimes needing to highlight the lowest values, and how zooming in can help. However, relying on zooming isn’t ideal for print-outs and mobile access, where interactivity is either lost or not optimal.

To address this, you could add the values as text above the bars to highlight the lowest frequencies. This also improves readability in screenshots where hovering isn’t possible. Another option is to create a second deep-dive chart for the “Other” category only. Assuming those entries share a similar y-axis scale, the bar lengths should be more visible :+1:
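
As a minimal sketch of the first idea, the bar values can be formatted compactly and placed above each bar. The helper names and the sample amounts are hypothetical. plotly is imported inside the plotting function so that the formatter can be tried on its own.

```python
def short_dollars(value):
    """Format a dollar amount compactly, e.g. 1_500_000 -> '$1.5M'."""
    for divisor, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if abs(value) >= divisor:
            return f"${value / divisor:.1f}{suffix}"
    return f"${value:.0f}"


def labeled_bar(labels, values):
    """Bar chart with a compact dollar label drawn above every bar."""
    import plotly.graph_objects as go

    return go.Figure(
        go.Bar(
            x=labels,
            y=values,
            text=[short_dollars(v) for v in values],
            textposition="outside",  # keeps labels readable on tiny bars
        )
    )


# Hypothetical amounts, for illustration only
bar_labels = [short_dollars(v) for v in (1_500_000, 42_000, 950)]
print(bar_labels)
```

With `textposition="outside"`, even bars too short to see still carry a readable value in a static screenshot or print-out.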

Regarding color palettes, I haven’t tested the Plotly color palettes yet. @adamschroeder might have more insights. There are many color-blind-tested palettes available though; I personally like the IBM one: Coloring for Colorblindness

The Vizro discrete color palette (DISCRETE_10) has also been tested for color blindness and distinguishability, but we don’t have much variety in inbuilt color palettes yet.
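
As a rough sketch of how such a palette could be wired into the charts in this thread, the snippet below maps categories to the five hex codes commonly listed for the IBM palette and keeps a catch-all group gray. The function name and the category list are hypothetical, and the hex values are an assumption worth checking against the linked page before relying on them.

```python
# Hex codes commonly listed for the IBM color-blind-safe palette (assumption)
IBM_PALETTE = ["#648FFF", "#785EF0", "#DC267F", "#FE6100", "#FFB000"]


def build_color_map(categories, palette=IBM_PALETTE, other_label="Others"):
    """Assign palette colors to categories; the catch-all group gets gray."""
    named = [c for c in sorted(categories) if c != other_label]
    if len(named) > len(palette):
        raise ValueError("more categories than distinguishable colors")
    color_map = dict(zip(named, palette))
    if other_label in categories:
        color_map[other_label] = "gray"
    return color_map


# Hypothetical category list, for illustration only
color_map = build_color_map(["East", "Central", "Pacific", "Mountain", "Others"])
print(color_map)
```

The resulting dictionary can be passed to `px.bar` through `color_discrete_map`, as the script earlier in the thread does with its own color dictionary. Plotly also appears to ship a CARTO-derived list, `px.colors.qualitative.Safe`, that is described as color-blind friendly; running any candidate through a simulator is still a good idea.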

Hope this helps :crossed_fingers:

2 Likes

Hi Community,
This week’s FigureFriday session (August 2 at noon Eastern Time) is cancelled because I am out of the office. The new data set for week 31 will be released tomorrow morning.

2 Likes

Hi all

I didn’t have much time this week so I modified a previous project that I’ve worked on and shared here

The app gives the user the ability to visualise the number of investments and the total monetary value for these projects per state and project area, while using both graphs as filters.

Highlighting selected states on maps (and maintaining the selection as other events take place) was especially tricky, and when I was figuring it out I didn’t find similar examples (could very well be my fault), so I’m happy with how it turned out even though the implementation is not the cleanest.

FF 30

GitHub Repo Link

6 Likes

Hi Dash guys,

My contribution this week is a Dash app with tabs and dynamic radio-select filters.

Features:

  • Investment Overview: Displays the total investment amount in a card.
  • Investment by Location: Interactive chart visualizing investments by state, allowing users to explore regional data.
  • Investment Segmentation: Customizable chart that lets users view investments by program area or investment type.

Live App

GitHub Project

6 Likes

Hi - I tried my hand at Dash and py.cafe and added population data. I made this interactive map with investment per capita and other stats.

4 Likes

Very well done @mo.elauzei

1 Like

Thank you for the kind words, @li.nguyen, and for the excellent suggestions and comments. Ultimately there is no best way to format charts like these; it all depends on the context. In the past I have used logarithmic scales on the y-axis to make all of the bars visible, but that approach also has visual limitations and complexities. On to week 31.

1 Like