Figure Friday 2024 - week 35

Update: Figure Friday 2024 - week 36 is the newer dataset.

Week 35 of Figure-Friday will accompany Plotly’s transition to MapLibre, bringing faster performance and increased stability to all of our map-type charts. Read the blog post written by Plotly’s very own, @nathandrezner, to learn more about the why, what, and when.

The data set we’ve chosen is the people-map.csv file, which records the person with the most visited Wikipedia page for each city or town from 2015-2019.

Helpful resources:

Things to consider:

  • can you improve the sample map built by The Pudding?
  • would a different visualization tell the data story better?
  • what would a Dash app look like?

Sample Figure:

Participation Instructions:

  • Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
  • Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
  • Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.

:point_right: If you prefer to collaborate with others on Discord, join the Plotly Discord channel.

Thank you to The Pudding for the data.


Updated Sept 5

  • set color & size based on the squared log10 value of views_sum, with normalization to force the minimum value to 1. For example, Charles R. Gleason has the lowest views_sum (300), so his color & size value is 1. By contrast, Barack Obama has a views_sum of 44,897,089, which gives a color & size value of 38.1.
  • added color/size factor as a hover value, mostly for developer debug
  • Added name count as a hover value as many names appear in the dataset more than once.
  • changed the hover font to ‘courier’. Not my favorite font, but useful for right justification of numbers
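
To make the arithmetic in the first bullet concrete, here is a minimal sketch in plain Python. The helper name color_size_factor and its arguments are illustrative assumptions; the posted code computes the same quantity with a Polars expression.

```python
import math


def color_size_factor(views_sum, min_views_sum):
    """Squared, shifted log10 of views_sum; the smallest value maps to 1."""
    shifted = 1 + math.log10(views_sum) - math.log10(min_views_sum)
    return round(shifted ** 2, 1)


# Lowest views_sum in the data set (Charles R. Gleason)
print(color_size_factor(300, 300))         # 1.0
# Barack Obama
print(color_size_factor(44_897_089, 300))  # 38.1
```

Squaring the shifted log stretches the upper end a little, so the most viewed names still stand out while the smallest markers stay visible.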

Updated Sept 2
Inspired by always helpful suggestions from @li.nguyen and @adamschroeder that I am very grateful for, I made the following changes:

  • map type is ‘streets’ with a light color background. Removed the iteration over all 15 map types.
  • Removed the square root function that was put in to reduce the range between the smallest and largest view counts
  • color and size are both referenced to views_sum (kind of redundant)
  • used Magenta_r sequential color map. The _r reverses the scale, so that the darkest colors are used for the very hard to spot low view counts.
  • Added average daily views (views_median??), and total views to the hover info.

Biggest challenge for me was arranging the colors so that the low view count markers are visible, without having the high view counts take up excessive area.

I really enjoy this data set, and by using it I solved an issue that has puzzled me for some time.

Columns with long text strings (see the extract column) are challenging to use as hover info because there is no capability I know of to automatically word wrap them. I solved this with a function that replaces a whitespace character with an HTML line break (<br>) once the desired max line length has been exceeded.
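
As a possible alternative, Python's standard-library textwrap module can do similar wrapping. This is only a sketch: the function name wrap_hover_textwrap and the sample string are illustrative assumptions. It also behaves slightly differently from the function described above, because textwrap.wrap breaks before the limit is reached, so no line is longer than width.

```python
import textwrap


def wrap_hover_textwrap(text, width=28):
    """Wrap text to at most `width` characters per line, joined with <br>."""
    lines = textwrap.wrap(text, width=width)
    return '<br>'.join(lines)


# Sample string made up for illustration
sample = 'Barack Obama is an American politician who served as the 44th president'
print(wrap_hover_textwrap(sample))
```

Note that textwrap.wrap also collapses runs of whitespace, which is usually harmless for hover text.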

Here are a few screen shots:
Charles R. Gleason, with the lowest number for views_sum

Barack Obama, with very high value of views_sum

Here is the code

import polars as pl
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "notebook_connected"

# us library is a tool for working with US State names and abbreviations
# make a list of valid US states, and filter data with it.
import us
state_list = [s for s in us.states.mapping('abbr', 'name').values()]


def wrap_hover(text, chars_per_line=28):
    '''
    Break long hover text into multiple lines, split with HTML line breaks.
    The first whitespace after chars_per_line is exceeded is replaced with <br>.
    '''
    result = []

    # Counter to track the length of the current line
    line_length = 0

    # Iterate over each character in the text
    for char in text:
        line_length += 1
        if char.isspace() and line_length > chars_per_line:
            # The line is long enough: break here and start a new line
            result.append('<br>')
            line_length = 0
        else:
            result.append(char)

    return ''.join(result)

def state_abbr(state):
    '''
    return commonly used abbreviation of full state name, e.g. NY for New York
    '''
    return us.states.lookup(state).abbr

#------------------------------------------------------------------------------#
#     scan_csv produces polars Lazy frame, with data cleaning flow             #
#------------------------------------------------------------------------------#
data_set = (
    pl.scan_csv('people-map.csv')  # LazyFrame

    .with_columns(pl.col('views_median', 'views_sum').cast(pl.Int32))
    # call wrap_hover function to insert line breaks
    .with_columns(
        extract_wrap = 
            pl.col('extract')
            .map_elements(wrap_hover, return_dtype=pl.String)
    )
    # add column to count # of times each name appears in the dataset
    .with_columns(
        NAME_COUNT = pl.col('name_clean').count().over('name_clean')
    )

    # Filter out rows with invalid state entries
    .filter(pl.col('state').is_in(state_list))

    # Add column with abbreviated form of each state's name, i.e. NY for New York
    .with_columns(
        STATE_ABBR = pl.col('state').map_elements(state_abbr, return_dtype=pl.String)
    )

    # tweak for Washington DC, state
    .with_columns(
        state = pl.when(pl.col('city').str.ends_with('D.C.'))
           .then(pl.lit('D.C.'))
           .otherwise(pl.col('state'))   # keep the existing state value
    )
    
    # tweak for Washington DC, city
    .with_columns(
        city = pl.when(pl.col('city').str.ends_with('D.C.'))
           .then(pl.lit('Washington'))
           .otherwise(pl.col('city'))    # keep the existing city value
    )
    # exclude rows with views_median == 0, or views_sum is null
    .filter(pl.col('views_median') > 0)
    .filter(pl.col('views_sum').is_not_null())
    .with_columns(
        color_size = (1+ pl.col('views_sum').log10() - pl.col('views_sum').log10().min()).pow(2).round(1)
    )
    
    .collect()   # optimize and execute this query, return a regular dataframe
)

#------------------------------------------------------------------------------#
#     scatter_map uses map type 'streets' with Magenta_r sequential colors     #
#------------------------------------------------------------------------------#
fig = px.scatter_map(
    data_set,
    lat='lat',
    lon='lng',
    size='color_size', 
    color='color_size',
    color_continuous_scale='Magenta_r',
    size_max=35,
    zoom=3,
    map_style='streets',
    custom_data=[
        'name_clean',              #  customdata[0]
        'city',                    #  customdata[1]
        'STATE_ABBR',              #  customdata[2]
        'views_median',            #  customdata[3]
        'views_sum',               #  customdata[4]
        'NAME_COUNT',              #  customdata[5]
        'color_size',              #  customdata[6]
        'extract_wrap',            #  customdata[7]
    ],
    range_color=(0, 30)  # color_size values above 30 all get the top color
)

fig.update_layout(
    autosize=True,
    width=1300,
    height=600,
    margin=dict(
        l=50,
        r=50,
        b=100,
        t=100,
        pad=4
    ),
)
#------------------------------------------------------------------------------#
#     Apply hovertemplate                                                      #
#------------------------------------------------------------------------------#
fig.update_traces(
    hovertemplate =
        '<b>%{customdata[0]}: %{customdata[1]}, %{customdata[2]}<br></b>' +
        '<b>Daily Views:</b>%{customdata[3]:>20,}<br>' +
        '<b>Total Views:</b>%{customdata[4]:>20,}<br>' +
        '<b>Name Count: </b>%{customdata[5]:>20}<br>' +   
        '<b>Color/Size Factor: </b>%{customdata[6]:>13}<br><br>' +   
        '%{customdata[7]}<br>' +
        '<extra></extra>'
)

fig.update_layout(
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family='courier',  # 'sans-serif mono', # 'courier new', 
    )
)


fig.update_layout(
    margin={"r":0, "t":0, "l":0, "b":0},
    )

fig.write_html('Fig_Fri_Week_35_Map.html')
fig.show()

#------------------------------------------------------------------------------#
#     For Data Exploration, histogram of views_sum                             #
#------------------------------------------------------------------------------#
fig = px.histogram(
    data_set,
    x='views_sum',
    nbins=1000
)
fig.update_layout(width=800, height=400, template='plotly_white')
fig.show()

#------------------------------------------------------------------------------#
#     For Data Exploration, histogram of NAME_COUNT                            #
#------------------------------------------------------------------------------#
fig = px.histogram(
    data_set,
    x='NAME_COUNT',
)
fig.update_layout(width=800, height=400, template='plotly_white')
fig.show()

#------------------------------------------------------------------------------#
#     Show views_sum and color_size for two example names                      #
#------------------------------------------------------------------------------#

print(
    data_set
    .select(pl.col('name_clean', 'views_sum', 'color_size'))
    .filter(pl.col('name_clean').is_in(['Barack Obama', 'Charles R. Gleason']))
    .unique('name_clean')
    # sort last, since unique() does not guarantee row order by default
    .sort('views_sum', descending=False)
)

UPDATE TO MY PREVIOUS POST:
The two issues I mentioned were 1) VS Code would not show any maps, and 2) only a subset of maps worked.

Both issues are solved by adding these 2 lines of code:

import plotly.io as pio
pio.renderers.default = "notebook_connected"

I have updated the code and removed these points from the original post.


Hey @Mike_Purtell,

This looks fantastic! :art: I personally prefer your version with the simpler/white map style; your colours and bubbles really stand out there. Great job with the automatic text wrapping and the detailed tooltips you always put so much care into!

I have just one minor suggestion: consider switching from the current red-blue diverging colour sequence to a sequential one :slight_smile: I can recommend this article - it explains the differences in terms of data perception when you choose a diverging vs. a sequential colour sequence.

I think a sequential palette could be more effective here because:

  1. I think the midpoint here, count of 100, doesn’t actually have a meaning e.g. it doesn’t represent any neutrality, so assigning it a colour of white might be a bit confusing.
  2. Since the focus is on names with the most views, highlighting higher values only would make the story clearer. The diverging sequence emphasizes both low and high values, but a sequential palette would better highlight the higher values, aligning with the data story :slight_smile:

You could even just provide it a single colour (as The Pudding has done it), because you already use the dimension of the bubble size to represent high/low values :slight_smile:

Keep up the great work! :rocket:


views_sum in the figure-friday week 35 data set has values ranging from less than 100 to 140M, quite a wide dynamic range. It is helpful to look at the distribution of these values and see if it makes sense to have colors scaled across the full data set, or only to a subset.

In this data set, only 444 out of 9339 entries (4.75%) have views_sum values of 10M or more. If range_color is not used, this small number of entries consumes 13/14 (93%) of the color scale.

In my updated post I set range_color to (0, 10M), so all values of 10M or more get the same color, and the full range of the color scale is applied more evenly to the 95.25% of entries with views_sum values below 10M.
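
Here is a rough sketch of the effect in plain Python. It assumes the color scale is applied linearly between the two range_color bounds and that values outside the bounds are clamped to the end colors. The helper name color_position is illustrative, and the 140M upper bound is the approximate maximum mentioned above.

```python
def color_position(value, cmin, cmax):
    """Fraction (0 to 1) along the color scale, clamped to [cmin, cmax]."""
    clamped = min(max(value, cmin), cmax)
    return (clamped - cmin) / (cmax - cmin)


full_range = (0, 140_000_000)    # no range_color: scale spans all of the data
capped_range = (0, 10_000_000)   # range_color=(0, 10M)

for views in (1_000_000, 5_000_000, 10_000_000, 140_000_000):
    print(
        f'{views:>12,}',
        round(color_position(views, *full_range), 3),
        round(color_position(views, *capped_range), 3),
    )
```

With the full range, a 5M entry sits under 4% of the way along the scale; with the cap it sits at the midpoint, which is why the bulk of the markers become easier to tell apart.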

Appreciate any comments or feedback on using range_color in this way.

Here is a histogram of views_sum to show what I mean.

[Image: histogram of views_sum]


This makes a lot of sense to me, @mike_purtell. If you hadn’t done this, it would be challenging to distinguish those points under 10 million. I remember you had also mentioned trying a logarithmic scale. How did that turn out? Did it make sense to plot the point sizes with that scale in mind?


Hi @Mike_Purtell
Thank you for bringing this up and sharing your solution. My colleague, @liamc, has mentioned that VS Code uses its own version of Plotly.js, which means that the latest features rarely work immediately.

Hi @adamschroeder, I just posted a code update & screen shots using a squared log10 value of views_sum, with normalization to force the minimum value to 1. The lowest value of views_sum was 300, which gets a size parameter of 1. By contrast, a very high views_sum of 44,897,089 gets a size of 38.1. The screen shots show that these two markers have distinct sizes and that both are visible. The log function seems like a decent way to manage this.


Hi @Mike_Purtell, your project is really nice and insightful; even without hovering, a user can extract information from your visualization.


Thank you @Moritus

Hi community,

For this week I created a Dash app with:

  • A scatter map to visualize the most visited person by city and by state, using a tab selector for grouping.
  • Bar charts showing the top visited people for every city and state, using dropdown selectors for filtering.

Dash App


Nicely done @Alfredo49, great job. I made my map with marker sizes set by the views_sum column, and noticed that your markers are all the same size. After seeing what you have done, I like your approach much better. This project creates a crowded display; there is just no way around that. Your work shows me that variable-size markers like the ones I used add more noise to the display. Variable color is a more effective way to visually distinguish the popularity behind each marker. One minor tweak would be to use px.scatter’s range_color parameter to force visible color variation over the bulk of the data, without reserving unique colors for the high-side outliers.


I was going to say something similar. I like how you built your map, @Alfredo49. The only challenge I faced was seeing popular names (light blue) in highly dense cities. For example, if you zoom out a little, it’s hard to see Barack Obama as the most viewed name in Chicago because there are so many other markers around his.

The horizontal bar charts for states and towns/cities are a great idea; a fun way to explore the data.


Thanks for the feedback @Mike_Purtell & @adamschroeder :smiley:

As you mentioned, the data is very saturated, so I created the bar charts to visualize the data in a more granular way.

I think this was a good first try with scatter_map and will definitely look into the range_color parameter.


@Alfredo49, to echo @adamschroeder's comment, the bar charts by city and state are a terrific way for users of the app to drill down. It really helps that your app sorts these charts. With just the map, it is a bit tedious to find equivalent information in any geographic region. Well done.

Hi guys!

A little late, but I’ve just deployed my app for W35 :tada:
https://sdidier-dev.freeboxos.fr/sdidier-dev/figure-friday/W35
You can also take a look at the apps from the previous weeks :grin:
(a few are missing, including the home page, but hopefully they will come soon!)
Note that the apps are responsive and compatible with all available Bootstrap themes and light/dark mode.

Edit: I forgot the link to the code:

It uses a map, obviously, and a grid that shows the details and can also be used to filter the data displayed on the map.

Disclaimer: Sorry if it is slow to load, it is hosted on a Raspberry Pi in France :innocent:


Thanks for sharing your code and app, @Skiks. I really like the map and marker colors you chose. The yellow stands out.

A clear improvement over the sample figure is the clickable legend you added with the total_views.


This is great! :rocket: I really like the ability to hide labels and markers.

I also agree with @adamschroeder - the bubble size legend is great! I’ve always been curious about how to create something like that. To my knowledge, Plotly charts don’t automatically generate bubble legends when using the size argument inside a chart, so I’m interested in knowing how you worked around that! :slight_smile:


Thanks @adamschroeder, @li.nguyen!

Actually, the trick is to make buckets so that the data is categorical-like instead of continuous.
Then plot one trace per bucket, setting a constant marker size within each trace, but a different size for each trace.
I also did the same for the label sizes, and finally set the same color for all traces so that it looks like there is only one trace.
Plotly then creates the legend for you, and you can also use it to easily filter the traces to display. If you want only the +10,000,000 data points, double-click that entry in the legend, and voila! :grin:
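
A minimal sketch of the bucketing idea, written as plain dictionaries so that it runs without Plotly installed. The bucket edges, labels, marker sizes, color, and row fields are all illustrative assumptions, not values taken from the app; trace specs shaped like these are meant to be handed to a Plotly figure (for example as the data argument of plotly.graph_objects.Figure), assuming a Plotly version that supports the scattermap trace type.

```python
from bisect import bisect_right

# Hypothetical bucket edges, legend labels, and marker sizes (not from the app)
EDGES = [100_000, 1_000_000, 10_000_000]
LABELS = ['< 100,000', '+100,000', '+1,000,000', '+10,000,000']
SIZES = [6, 10, 16, 24]


def bucket_traces(rows, color='gold'):
    """Return one trace spec per bucket, each with its own constant marker size."""
    traces = [
        dict(type='scattermap', mode='markers', name=label,
             lat=[], lon=[], text=[],
             marker=dict(size=size, color=color))  # same color for every trace
        for label, size in zip(LABELS, SIZES)
    ]
    for row in rows:
        # bisect_right gives the index of the bucket the value falls into
        idx = bisect_right(EDGES, row['total_views'])
        traces[idx]['lat'].append(row['lat'])
        traces[idx]['lon'].append(row['lon'])
        traces[idx]['text'].append(row['name'])
    return traces


# Made-up rows, for illustration only
rows = [
    dict(name='Person A', lat=41.9, lon=-87.6, total_views=44_000_000),
    dict(name='Person B', lat=40.7, lon=-74.0, total_views=2_500_000),
    dict(name='Person C', lat=34.1, lon=-118.2, total_views=300),
]
traces = bucket_traces(rows)
for trace in traces:
    print(trace['name'], trace['marker']['size'], trace['text'])
```

Because each bucket is its own named trace, the legend gets one entry per size class, which is what makes it behave like a bubble-size legend.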

For the marker color, actually, it is linked to the current theme, it’s the primary color of the theme.
Here, yellow for the Solar theme, my favorite (I’m more on the dark side :innocent:)
You can change the theme on the top right to see the result!
