Figure Friday 2024 - week 36

Update: Figure Friday 2024 - week 37 is the newer dataset.

Air quality around the globe has gone through significant ups and downs since the Industrial Revolution. Many factors affect the air that we breathe, such as the usage of coal power plants, clean air legislation, and automobile congestion.

Week 36 of Figure-Friday will dive into this topic with data from the Air Quality Stripes project, which shows the concentration of particulate matter air pollution (PM2.5) in cities around the world.

Things to consider:

  • can you replicate the sample graph with Plotly?
  • can you improve the sample graph built?
  • would a different figure tell the data story better?
  • are you able to replicate or improve the app built by the Air Quality Stripes project?

Sample Figure:

Helpful Resources:

Participation Instructions:

  • Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
  • Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
  • Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.

:point_right: If you prefer to collaborate with others on Discord, join the Plotly Discord channel.

Thank you Air Quality Stripes for the data.

1 Like

UPDATE Sept 8: I have reproduced the sample chart offered with Figure Friday week 36. The color stripes were implemented with fig.add_vrect. Each rectangle spans an x-range of year-0.5 to year+0.5. I used MS-Paint to clone the color mapping. The rectangles were added in a loop over each year in the data set. This solution is not very pythonic, but it works. It could surely use some optimization.

I was skeptical about the value of this visualization in a real-world context, but after working on this I have come to see that it is a pretty cool way to enhance a single-trace line graph, and I now have ideas about how to use it in my real work.

The updated code produces both of the visualizations pasted below, in reverse order.

Here is a view of Beijing with storytelling. I left off the color stripes. Will try adding them as a next task.

Here is the code:

import polars as pl
import polars.selectors as cs
import plotly.express as px
import numpy as np

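# colors were cloned using MS-Paint Eye Dropper tool
# note: there is no entry for the ' < 5' bin, and values above 90 keep the
# 'UNDEFINED' bin, so my_color_dict.get() returns None for those years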
my_color_dict = {   
    '5 - 10' :  '#A4FFFF',
    '10 - 15':  '#B0DAE9',
    '15 - 20':  '#F9E047',
    '20 - 30':  '#F2C84B',
    '30 - 40':  '#F1A63F',
    '40 - 50':  '#E98725',
    '50 - 60':  '#AF4553',
    '60 - 70':  '#863B47',
    '70 - 80':  '#673A3D',
    '80 - 90':  '#462F30',
    '90 -   ':  '#252424',
}

#------------------------------------------------------------------------------#
#     Load the data                                                            #
#------------------------------------------------------------------------------#
c = 'Beijing, China'
df_pollution = (
    pl.scan_csv('air-pollution.csv')   # Lazy Frame
    .select(pl.col('Year', c))
    # .with_columns(color_index = (pl.col(c)/5).cast(pl.UInt8))
    .with_columns(BIN = pl.lit('UNDEFINED'))   #initialize new BIN column
    .with_columns(
        BIN = pl.when(pl.col(c).is_between(0, 5, closed='right'))
                .then(pl.lit(' < 5')).otherwise('BIN')
    )
    .with_columns(
        BIN = pl.when(pl.col(c).is_between(5, 10, closed='right'))
                .then(pl.lit('5 - 10')).otherwise('BIN')
    )
    .with_columns(
        BIN = pl.when(pl.col(c).is_between(10, 15, closed='right'))
                .then(pl.lit('10 - 15')).otherwise('BIN')
    )
    .with_columns(
        BIN = pl.when(pl.col(c).is_between(15, 20, closed='right'))
                .then(pl.lit('15 - 20')).otherwise('BIN')
    )
    .with_columns(
        BIN = pl.when(pl.col(c).is_between(20, 30, closed='right'))
                .then(pl.lit('20 - 30')).otherwise('BIN')
    )
    .with_columns(
        BIN = pl.when(pl.col(c).is_between(30, 40, closed='right'))
                .then(pl.lit('30 - 40')).otherwise('BIN')
    )
    .with_columns(
        BIN = pl.when(pl.col(c).is_between(40, 50, closed='right'))
                .then(pl.lit('40 - 50')).otherwise('BIN')
    )
    .with_columns(
        BIN = pl.when(pl.col(c).is_between(50, 60, closed='right'))
                .then(pl.lit('50 - 60')).otherwise('BIN')
    )  
    .with_columns(
        BIN = pl.when(pl.col(c).is_between(60, 70, closed='right'))
                .then(pl.lit('60 - 70')).otherwise('BIN')
    )
    .with_columns(
        BIN = pl.when(pl.col(c).is_between(70, 80, closed='right'))
                .then(pl.lit('70 - 80')).otherwise('BIN')
    )
    .with_columns(
        BIN = pl.when(pl.col(c).is_between(80, 90, closed='right'))
                .then(pl.lit('80 - 90')).otherwise('BIN')
    )  

    .collect() # Run Query, return Dataframe
)

def add_annotation(fig, annotation, align, xanchor, yanchor, x, xref, y, yref,   xshift=0, font_size=14):
    ''' Generic function to place text on plotly figures '''
    fig.add_annotation(
        text=annotation,
        xref = xref, x=x, yref = yref, y=y,
        align= align, xanchor=xanchor, yanchor=yanchor,
        font =  {'size': font_size, 'color': 'darkslategray'},
        showarrow=False,
        xshift = xshift
    )
    return fig

#------------------------------------------------------------------------------#
#     setup px.scatter                                                         #
#------------------------------------------------------------------------------#
fig = px.scatter(
    df_pollution,
    'Year',
    'Beijing, China',
)
my_title = 'Beijing, China<br>'
my_title += '<sup>Air pollution (PM2.5) concentrations</sup><br>'
fig.update_layout(
    template='plotly_white',
    height=800,
    width=1200,
    title=my_title,
    title_font=dict(size=24),
    yaxis_title='Annual Mean PM2.5 Concentration'.upper() + ' (μg/m<sup>3</sup>)',
    xaxis_title='',
    yaxis_title_font=dict(size=20),
    yaxis_range=[0,85],
)

customdata = np.stack(
    (
        df_pollution['Year'],     
        df_pollution['Beijing, China']
    ), 
    axis=-1
)

hovertemplate = (
    '<b>%{customdata[0]}</b><br>' + 
    'PM2.5 Concentration: %{customdata[1]:,.1f}<br>' + 
    '<extra></extra>')

fig.update_traces(
    mode='lines',
    marker=dict(size=12, line=dict(width=0)),
    customdata=customdata, 
    hovertemplate=hovertemplate
    )

#------------------------------------------------------------------------------#
#     add vertical lines to mark key timepoints, and label them                #
#------------------------------------------------------------------------------#
year=1949
y_pos=30
fig.add_scatter(
    x=[year,year], y=[0,y_pos], # vertical line based on 2-point scatter
    mode='lines',line_width=1, line_dash="dash", line_color='gray',
    showlegend=False
)
annotation = f'<b>{year}:</b><br>Establishment of PRC'
fig = add_annotation(
    fig, 
    annotation, 
    'right',   # align
    'right',   # xanchor
    'middle',  # yanchor
    xref ='x',  x= year, 
    yref = 'y', y= y_pos,
    xshift=-5
)

year=2015
y_pos=100
fig.add_scatter(
    x=[year,year], y=[0,y_pos],  # vertical line based on 2-point scatter
    mode='lines',line_width=4, line_dash="dash", line_color='green',showlegend=False
)
annotation = f"<b>{year}:</b><br>Environmental Protection<br>Law revised"
fig = add_annotation(
    fig, 
    annotation, 
    'right',   # align
    'right',   # xanchor
    'middle',  # yanchor
    xref ='x',  x= year, 
    yref = 'y', y= 20,
    xshift=-5
)

#------------------------------------------------------------------------------#
#     place descriptive annotations at various locations                       #
#------------------------------------------------------------------------------#
annotation = '<b>北京中国:</b> steady pollution increases started with the<br>' 
annotation += 'establishment of PRC. Significant drop since 2015 coincides<br>'
annotation += 'with a revised environmental protection law.<br>'
fig = add_annotation(
    fig, 
    annotation, 
    'left',   # align
    'left',   # xanchor
    'middle',  # yanchor
    xref ='paper',  x= 0.3, 
    yref = 'paper', y= 0.7,
    xshift=-5,
    font_size = 16
)

fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)
fig.show()


#------------------------------------------------------------------------------#
#     setup px.scatter with color stripes, no annotations                      #
#------------------------------------------------------------------------------#
fig = px.scatter(
    df_pollution,
    'Year',
    'Beijing, China',
)
fig.update_traces(line=dict(color='white',width=6))

my_title = 'Beijing, China<br>'
my_title += '<sup>Air pollution (PM2.5) concentrations</sup><br>'
fig.update_layout(
    template='plotly_white',
    height=600,
    width=900,
    title=my_title,
    title_font=dict(size=24),
    yaxis_title='Annual Mean PM2.5 Concentration'.upper() + ' (μg/m<sup>3</sup>)',
    xaxis_title='',
    yaxis_title_font=dict(size=20),
    yaxis_range=[0,85],
)

customdata = np.stack(
    (
        df_pollution['Year'],     
        df_pollution['Beijing, China']
    ), 
    axis=-1
)

hovertemplate = (
    '<b>%{customdata[0]}</b><br>' + 
    'PM2.5 Concentration: %{customdata[1]:,.1f}<br>' + 
    '<extra></extra>')

fig.update_traces(
    mode='lines',
    customdata=customdata, 
    hovertemplate=hovertemplate,   
    )

#------------------------------------------------------------------------------#
#     add a box of width 1 above each year, use color_dict for shading value   #
#------------------------------------------------------------------------------#
for year in df_pollution['Year'].to_list():
    my_bin = df_pollution.filter(pl.col('Year') == year).select(pl.col('BIN')).to_series()[0]
    my_color = my_color_dict.get(my_bin)
    fig.add_vrect(
        x0=year-0.5, x1=year+0.5,
        fillcolor=my_color, #  opacity=0.5,
        layer="below", 
        line_color=my_color,
    )

fig.data = fig.data[::-1]
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)
fig.show()

5 Likes

My first Figure Friday submission!
I transformed the dataset and added additional data for the scatter map. It is very different from the sample figure though.

5 Likes

Nice job, @Mike_Purtell . I also prefer how it looks with the added vertical colored lines. Quick question: what does the .otherwise('BIN') do in the polars line of code?

Side note: it would be cool to give the user an option to observe other cities with a dropdown. I think it’s possible with Plotly dropdown menus, but it’s probably easier to implement with Dash.
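
A minimal sketch of the plain-Plotly dropdown idea, assuming the figure holds exactly one trace per city, added in the same order as the city list. The helper name `city_dropdown_buttons` is hypothetical, and 'Another city' is a placeholder rather than a column from the data set. Each button uses the `update` method to show one trace and hide the others.

```python
def city_dropdown_buttons(cities):
    """Build one Plotly updatemenus button per city.

    Assumes the figure holds exactly one trace per city, in the same
    order as `cities`; each button shows one trace and hides the rest.
    """
    buttons = []
    for i, city in enumerate(cities):
        visible = [j == i for j in range(len(cities))]
        buttons.append(
            dict(
                label=city,
                method='update',
                # first dict updates the traces, second updates the layout
                args=[{'visible': visible}, {'title.text': city}],
            )
        )
    return buttons


# 'Another city' is a placeholder name, not a column from the data set
buttons = city_dropdown_buttons(['Beijing, China', 'Another city'])
```

The list would then be attached with something like fig.update_layout(updatemenus=[dict(buttons=buttons, direction='down')]). Per-city stripes would still need separate handling, which is one reason Dash may be the easier route.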

Wow, @deepa-shalini. What an amazing first Figure-Friday submission. Where have you been hiding all this time?! :hugs:

I like the addition of Dash AG Grid with the health categories. I would recommend updating the column header from 2021 to US AQI Level.

This is a small suggestion: for grammatical consistency, I would update the left tab to Cities with the LOWEST PM2.5 because the right tab uses the word HIGHEST.

You could also probably remove the lat/lon info in the map’s hoverbox, unless you think it’s important for the user to know that.

How did you get the lat/lon of every city? Did you manually look them up?

Hi @adam ,

.otherwise('BIN') keeps the existing value of the BIN column; it is essentially a NOP. I like the suggestion to try Plotly dropdown menus, and I will give it a try. Thank you.
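
To make the "keeps the existing value" behavior concrete, here is a plain-Python sketch (not polars) of the same logic, using the bin edges and labels from the looped version of the code. `bin_by_chain` mimics the chain of when/then/otherwise steps, and `bin_by_lookup` is one possible shorter alternative based on a sorted-edge lookup. Both function names are hypothetical, and the two only agree for values above 0.

```python
from bisect import bisect_left

# Upper bin edges and labels copied from the bins used in this thread
EDGES = [10, 15, 20, 30, 40, 50, 60, 70, 80, 90]
LABELS = ['   - 10', '10 - 15', '15 - 20', '20 - 30', '30 - 40', '40 - 50',
          '50 - 60', '60 - 70', '70 - 80', '80 - 90', '90 -   ']


def bin_by_chain(value):
    """Mimic the when/then/otherwise chain: start at 'UNDEFINED' and
    overwrite only when the value falls in a (low, high] range."""
    label = 'UNDEFINED'
    lows = [0] + EDGES[:-1]
    for low, high, name in zip(lows, EDGES, LABELS):
        label = name if low < value <= high else label   # else: keep as-is
    if value > 90:
        label = LABELS[-1]
    return label


def bin_by_lookup(value):
    """Same result for values above 0, using one sorted-edge lookup."""
    return LABELS[bisect_left(EDGES, value)]
```

Polars also offers Expr.cut for this kind of binning, which might replace the whole chain in one expression; that is not shown here.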

1 Like

Thanks @adamschroeder for your suggestions. I have incorporated them. The app is updated.

As for retrieving the coordinates of the cities, I used the Nominatim API.
Here is a small example.

# Import the required library
from geopy.geocoders import Nominatim

# Initialize Nominatim API
geolocator = Nominatim(user_agent="AirQualityApp")

location = geolocator.geocode("Bangalore")

print("The latitude of the location is: ", location.latitude)
print("The longitude of the location is: ", location.longitude)

I found 4-5 mistakes in the output generated for the air-quality.csv dataset and corrected those manually.
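
One way to keep such manual corrections alongside the geocoding step is to pass them in as overrides, sketched below. The helper `geocode_cities` is hypothetical; `geocode` can be any callable that returns an object with `.latitude` and `.longitude` (or None when nothing is found), which matches what `geolocator.geocode` returns in the example above.

```python
def geocode_cities(cities, geocode, overrides=None):
    """Return {city: (lat, lon)}; manual overrides win over the geocoder.

    `geocode` is any callable that takes a city name and returns an object
    with .latitude and .longitude, or None when nothing is found.
    """
    overrides = overrides or {}
    coords = {}
    for city in cities:
        if city in overrides:
            # hand-corrected coordinates skip the geocoder entirely
            coords[city] = overrides[city]
            continue
        location = geocode(city)
        # None marks cities that still need a manual look-up
        coords[city] = (location.latitude, location.longitude) if location else None
    return coords
```

With geopy this could be called as geocode_cities(city_list, geolocator.geocode, overrides={...}). Nominatim's usage policy asks for at most one request per second, so wrapping the call with geopy's RateLimiter is worth considering for a long city list.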

3 Likes

@deepa-shalini , what a great job you have done on this week’s Figure Friday, and it is only your first submission. Hope you will stick around in this community to contribute work like this and offer your comments to others. I have been here for about 6 weeks, and find that every time I submit a visualization, I get back super valuable suggestions, and very kind remarks. This improves my work quality and makes me happy, and hopefully the same will come to you.

3 Likes

Thanks @deepa-shalini . I wasn’t aware of Nominatim. I’m sure I’ll be using this in the future :slight_smile:

Appreciate it.

2 Likes

Thank you for your kind words of encouragement @Mike_Purtell I am very excited to be part of the Figure-Friday sub-community.

1 Like

Smooth dashboard @deepa-shalini !

Actually, I was thinking of building similar charts, because I think they answer two important questions: how is the air pollution around the world, and what are the most and least polluted cities?

P.S.: a good challenge could be using Google Cloud to deploy your next app :smiley:

2 Likes

Updated September 13

Hi everyone,

My contribution this week is a Dash app with the following figures:

  • Line Chart: Visualize trends over time for every country.
  • Area Chart: Show the count of cities in the selected air quality category over time.
  • Dash Table: Display the records for the selected year and air quality category.

P.S.: Thanks @adamschroeder for the feedback. Keeping the horizontal lines the same color definitely improves readability, and using an area chart instead of a tree map makes it easier to see the trend over time.

Dash App

2 Likes

Here are the air pollution curves with color striping for a few interesting cities. The code produces one graph for each city in the data set by looping over 176 cities and displaying them serially. Have not implemented the pull-down menu to select the city.

It took 52 minutes to generate plots for every city. I used .add_vrect to place 172 vertical boxes, one above each year, with color based on pollution level. These box annotations are expensive time-wise; they do not feel like a natural step in graph creation and take more time than one might expect. I could not think of any other way to produce these graphs.

My question to the community: are there other ways to complete this task that would run faster? I am curious and would love to hear what others have to say, and I would not be surprised if someone pointed out something in my code that could have made it run faster.
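
One option that may be worth timing is to build all the rectangles as plain dicts and hand them to the figure in a single update_layout call, instead of calling add_vrect once per year. The sketch below is untested against this data set, and the helper name `stripe_shapes` is hypothetical. It assumes a single-plot figure, so yref='paper' covers the full plot height.

```python
def stripe_shapes(years, colors):
    """Return one layout-shape dict per year, spanning year-0.5 to year+0.5
    in x and the full plot height in y."""
    return [
        dict(
            type='rect',
            xref='x', x0=year - 0.5, x1=year + 0.5,
            yref='paper', y0=0, y1=1,     # full height of the plot area
            fillcolor=color,
            line=dict(color=color),
            layer='below',
        )
        for year, color in zip(years, colors)
    ]


# Two example years with two colors taken from my_color_dict
shapes = stripe_shapes([2000, 2001], ['#A4FFFF', '#B0DAE9'])
```

The result would be applied with fig.update_layout(shapes=stripe_shapes(years, colors)), where colors is the per-year list looked up from my_color_dict. Whether this is faster in practice would need to be measured.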

I am eager to use these plots in my work, regardless of speed.

Here are a few samples:

Here is the code

import polars as pl
import polars.selectors as cs
import plotly.express as px
import numpy as np

# colors were cloned using MS-Paint Eye Dropper tool
my_color_dict = {   
    '   - 10':  '#A4FFFF',
    '10 - 15':  '#B0DAE9',
    '15 - 20':  '#F9E047',
    '20 - 30':  '#F2C84B',
    '30 - 40':  '#F1A63F',
    '40 - 50':  '#E98725',
    '50 - 60':  '#AF4553',
    '60 - 70':  '#863B47',
    '70 - 80':  '#673A3D',
    '80 - 90':  '#462F30',
    '90 -   ':  '#252424',
}

#------------------------------------------------------------------------------#
#     Load the data                                                            #
#------------------------------------------------------------------------------#

df_pollution = (
    pl.scan_csv('air-pollution.csv')   # Lazy Frame
    .collect() # Run Query, return Dataframe
)
city_list = df_pollution.select(pl.all().exclude('Year')).columns

for i, c in enumerate(city_list):
    print(f'City {i+1} of {len(city_list)}')
    df_city = (
        df_pollution.lazy()           # Lazy Frame
        .select(pl.col('Year', c))
        # .with_columns(color_index = (pl.col(c)/5).cast(pl.UInt8))
        .with_columns(BIN = pl.lit('UNDEFINED'))   #initialize new BIN column
        .with_columns(
            BIN = pl.when(pl.col(c).is_between(0, 10, closed='right'))
                    .then(pl.lit('   - 10')).otherwise('BIN')
        )
        .with_columns(
            BIN = pl.when(pl.col(c).is_between(10, 15, closed='right'))
                    .then(pl.lit('10 - 15')).otherwise('BIN')
        )
        .with_columns(
            BIN = pl.when(pl.col(c).is_between(15, 20, closed='right'))
                    .then(pl.lit('15 - 20')).otherwise('BIN')
        )
        .with_columns(
            BIN = pl.when(pl.col(c).is_between(20, 30, closed='right'))
                    .then(pl.lit('20 - 30')).otherwise('BIN')
        )
        .with_columns(
            BIN = pl.when(pl.col(c).is_between(30, 40, closed='right'))
                    .then(pl.lit('30 - 40')).otherwise('BIN')
        )
        .with_columns(
            BIN = pl.when(pl.col(c).is_between(40, 50, closed='right'))
                    .then(pl.lit('40 - 50')).otherwise('BIN')
        )
        .with_columns(
            BIN = pl.when(pl.col(c).is_between(50, 60, closed='right'))
                    .then(pl.lit('50 - 60')).otherwise('BIN')
        )  
        .with_columns(
            BIN = pl.when(pl.col(c).is_between(60, 70, closed='right'))
                    .then(pl.lit('60 - 70')).otherwise('BIN')
        )
        .with_columns(
            BIN = pl.when(pl.col(c).is_between(70, 80, closed='right'))
                    .then(pl.lit('70 - 80')).otherwise('BIN')
        )
        .with_columns(
            BIN = pl.when(pl.col(c).is_between(80, 90, closed='right'))
                    .then(pl.lit('80 - 90')).otherwise('BIN')
        )  
        .with_columns(
            BIN = pl.when(pl.col(c) > 90)
                    .then(pl.lit('90 -   ')).otherwise('BIN')
        )  
        .collect() # Run Query, return Dataframe
    )

    def add_annotation(fig, annotation, align, xanchor, yanchor, x, xref, y, yref,   xshift=0, font_size=14):
        ''' Generic function to place text on plotly figures '''
        fig.add_annotation(
            text=annotation,
            xref = xref, x=x, yref = yref, y=y,
            align= align, xanchor=xanchor, yanchor=yanchor,
            font =  {'size': font_size, 'color': 'darkslategray'},
            showarrow=False,
            xshift = xshift
        )
        return fig

    #------------------------------------------------------------------------------#
    #     setup px.scatter with color stripes, no annotations                      #
    #------------------------------------------------------------------------------#
    fig = px.scatter(
        df_pollution,
        'Year',
        c,
    )
    fig.update_traces(line=dict(color='white',width=6))

    my_title = f'{c}<br>'
    my_title += '<sup>Air pollution (PM2.5) concentrations</sup><br>'
    fig.update_layout(
        template='plotly_white',
        height=600,
        width=900,
        title=my_title,
        title_font=dict(size=24),
        yaxis_title='Annual Mean PM2.5 Concentration'.upper() + ' (μg/m<sup>3</sup>)',
        xaxis_title='',
        yaxis_title_font=dict(size=20),
        yaxis_range=[0,130],
    )

    customdata = np.stack(
        (
            df_pollution['Year'],     
            df_pollution[c]
        ), 
        axis=-1
    )

    hovertemplate = (
        '<b>%{customdata[0]}</b><br>' + 
        'PM2.5 Concentration: %{customdata[1]:,.1f}<br>' + 
        '<extra></extra>')

    fig.update_traces(
        mode='lines',
        customdata=customdata, 
        hovertemplate=hovertemplate,   
        )

    #------------------------------------------------------------------------------#
    #     add a box of width 1 above each year, use color_dict for shading value   #
    #------------------------------------------------------------------------------#
    for year in df_pollution['Year'].to_list():
        my_bin = df_city.filter(pl.col('Year') == year).select(pl.col('BIN')).to_series()[0]
        my_color = my_color_dict.get(my_bin)
        fig.add_vrect(
            x0=year-0.5, x1=year+0.5,
            fillcolor=my_color, #  opacity=0.5,
            layer="below", 
            line_color=my_color,
        )

    fig.data = fig.data[::-1]
    fig.update_xaxes(showgrid=False)
    fig.update_yaxes(showgrid=False)
    fig.show()

Nice one, @Alfredo49

I really like the line chart you created that’s tied to the dropdown.

For me personally, the number of colors makes it harder to read. I wonder how it would look if the horizontal health category levels were all black, or maybe one continuous color scale like light pink, pink, light red, red.

Is there a reason you prefer to use the DataTable over Dash AG Grid?
Above the table you have the dropdown, but the order of unhealthy and unhealthy for sensitive groups is reversed.

The tree map comparing city count by air quality is a good idea. Although I think you might get a better reading of change over time with the Plotly filled area plot.

1 Like

wow, that’s a long time, @Mike_Purtell

Do you think it is more of a scalability issue, or would even a single plot take a long time?

It took 52 minutes for 172 plots, meaning that it took roughly 18 seconds per plot. Is that how long it takes you if you were to do only one plot?

1 Like

Hi @adamschroeder, it is scalability to a degree. While the average was 18 seconds per plot, when only plotting 1 city (Beijing), it took just 10 seconds. Thank you.
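
To see whether the gap between 10 seconds for one city and an 18-second average comes from later plots getting slower, the time per iteration could be logged. A small sketch follows; `time_each` is a hypothetical helper, and the lambda is only a stand-in for the function that builds and shows one city plot.

```python
import time


def time_each(items, func):
    """Call func(item) for every item and return the seconds each call took."""
    durations = []
    for item in items:
        start = time.perf_counter()
        func(item)
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in workload; in the real script func would build one city plot
durations = time_each(range(5), lambda n: sum(range(10_000)))
print([f'{d:.4f}' for d in durations])
```

If the durations climb steadily across the city list, something is accumulating between iterations; if they stay flat, the cost is simply per-plot.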