Figure Friday 2024 - week 48

Join the Figure Friday session on December 6, at noon Eastern Time, to showcase your creation and receive feedback from the community.

Did you know that in 2018 98% of the population in Bahrain was using the internet, while in Brazil it was 70% and in Bolivia it was 44%?

In this week’s Figure Friday we’ll look at the World Bank’s data on Individuals using the Internet (as a % of the population). It’s important to note that internet users are defined as individuals who have used the Internet (from any location) in the last 3 months. (The Internet can be used via a computer, mobile phone, personal digital assistant, games machine, digital TV, etc.)

Things to consider:

  • can you improve the sample figure below (line chart docs)?
  • would a different figure tell the data story better?
  • can you create a Dash app instead?

Sample figure:

Code for sample figure:
import plotly.express as px
import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/main/2024/week-48/API_IT.NET.USER.ZS_DS2_en_csv_v2_2160.csv")
df_filtered = df[df["Country Name"].isin(["Angola", "Albania", "Andorra", "Argentina"])]

melted_data = pd.melt(
    df_filtered,
    id_vars=['Country Name'],
    var_name='Year',
    value_name='Quantity'
)

melted_data['Year'] = pd.to_numeric(melted_data['Year'], errors='coerce')
# Drop rows where 'Year' is NaN (non-year columns) or 'Quantity' is NaN
melted_data = melted_data.dropna(subset=['Year', 'Quantity'])
print(melted_data)

fig = px.line(melted_data, y="Quantity", x="Year", color="Country Name", markers=True)
fig.update_layout(hovermode='x unified')
fig.show()
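
The reshaping step in the sample code can be tried on a tiny frame. This is a minimal sketch, assuming the CSV is laid out with one column per year (as strings) next to text columns such as "Country Code". The 2018 values are the ones quoted at the top of this post; `wide` and `long_df` are illustrative names. It shows why the non-year columns disappear after `errors='coerce'` plus `dropna`, and that 'Quantity' may stay as an object column unless it is cast back to float.

```python
import pandas as pd

# Tiny wide-format frame: one text column plus one year column
wide = pd.DataFrame({
    "Country Name": ["Bahrain", "Brazil", "Bolivia"],
    "Country Code": ["BHR", "BRA", "BOL"],
    "2018": [98.0, 70.0, 44.0],
})

# Wide to long: every non-id column name ends up in 'Year'
long_df = pd.melt(wide, id_vars=["Country Name"],
                  var_name="Year", value_name="Quantity")

# "Country Code" cannot be parsed as a number, so it becomes NaN and is dropped
long_df["Year"] = pd.to_numeric(long_df["Year"], errors="coerce")
long_df = long_df.dropna(subset=["Year", "Quantity"])

# 'Quantity' briefly held strings, so cast it back to a numeric dtype
long_df["Year"] = long_df["Year"].astype(int)
long_df["Quantity"] = long_df["Quantity"].astype(float)

print(long_df)
```

Casting 'Quantity' to float is optional for the line chart, but it keeps later numeric operations such as `nlargest` or `mean` predictable.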

Participation Instructions:

  • Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
  • Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
  • Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.

:point_right: If you prefer to collaborate with others on Discord, join the Plotly Discord channel.

Data Source:

Thank you to the World Bank for the data.


:wave: Hi, I was looking for a number to rank the countries according to their growth indicator, and I couldn’t find one yet. Any clue?
In the meantime, I’m just picking a random year from 1992 to 2023 to display the top 5 and bottom 5. Still working on it…

Code
"""Just importing modules"""
from dash import Dash, dcc, html, Input, Output
import dash_bootstrap_components as dbc
import plotly.express as px
import pandas as pd
import numpy as np
import random

np.set_printoptions(suppress=True)

# df_arg = pd.read_csv(r"C:\Users\Juan Aguirre\Downloads\Book2.csv")
df = pd.read_csv("https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/main/2024/week-48/API_IT.NET.USER.ZS_DS2_en_csv_v2_2160.csv")

exclude_cols = ['Country Code', 'Indicator Name', 'Indicator Code']
# df[df.columns.difference(exclude_cols, sort=False)]

df_melted = pd.melt(
    df[df.columns.difference(exclude_cols, sort=False)],
    id_vars=['Country Name'],
    var_name='Year',
    value_name='Quantity'
)

df_melted['Year'] = pd.to_numeric(df_melted['Year'], errors='coerce')
# Drop rows where 'Year' is NaN (non-year columns) or 'Quantity' is NaN
df_melted = df_melted.dropna(subset=['Year', 'Quantity'])

df_country_year_idxd = (df_melted.set_index(keys=['Country Name', 'Year']))
df_year_idxd = (df_melted.set_index(keys=['Year']))
# Initialize the Dash app with Bootstrap theme
app = Dash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP])

# Layout definition with Bootstrap components
button = html.Div(
    [
        dbc.Button("Pick Random Year", id="submit-button", className="btn btn-primary me-2", n_clicks=0),
        html.Span(id="selected-year-output", style={"verticalAlign": "middle"}),
    ]
)

app.layout = dbc.Container(
    fluid=True,
    children=[
            dbc.Row([
                dbc.Col(
                    html.H3('Top and Bottom 5 countries by Internet Growth',  className="text-start text-primary mb-2"),
                    #width={'size': 8, 'offset': 2}
                ),
            ]),
            dbc.Row([
                dbc.Col(
                    button,
                    # width={"size": 2, "offset": 5}
                ),
            ]),
            dbc.Row([
                dbc.Col(dcc.Graph(id="top-5-graph", figure={}), width=6),
                dbc.Col(dcc.Graph(id="bottom-5-graph", figure={}), width=6)
            ]),
    ]
)

# Callback to handle random year selection and graph updates
@app.callback(
    [Output("selected-year-output", "children"),
     Output("top-5-graph", "figure"),
     Output("bottom-5-graph", "figure")],
    [Input("submit-button", "n_clicks")]
)
def update_graphs(n_clicks):
    if n_clicks == 0:
        return "Click the button to pick a random year.", {}, {}

    # Randomly select a year between 1992 and 2023
    selected_year = random.randint(1992, 2023)

    # Filter data for the selected year
    filtered = df_year_idxd.loc[selected_year]#.nsmallest(5, columns='Quantity')

    # Get Top 5 countries by Quantity
    top_5 = filtered.nlargest(5, "Quantity")

    # Get Bottom 5 countries by Quantity
    bottom_5 = filtered.nsmallest(5, "Quantity")

    # Create Top 5 bar plot
    fig_top_5 = px.bar(top_5, y="Country Name", x="Quantity", title=f"Top 5 Countries in {selected_year}",
                       text_auto=True, color_discrete_sequence=["#FF7F0E"], orientation='h', template='plotly_white',
                       labels={'Country Name': '', 'Quantity': ''})

    # Create Bottom 5 bar plot
    fig_bottom_5 = px.bar(bottom_5, y="Country Name", x="Quantity", title=f"Bottom 5 Countries in {selected_year}",
                          text_auto=True, color_discrete_sequence=['#BCBD22'], orientation='h', template='plotly_white',
                          labels={'Country Name': '', 'Quantity': ''})

    return f"Selected Year: {selected_year}", fig_top_5, fig_bottom_5


if __name__ == "__main__":
    # app.run replaces the older app.run_server, which recent Dash releases no longer support
    app.run(debug=True, jupyter_mode='external')
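
The filtering inside the callback can also be pulled out into a plain function, which makes it easy to check without starting the server. This is only a sketch: `top_and_bottom` is a hypothetical helper name, it assumes a long-format frame with 'Country Name', 'Year' and 'Quantity' columns like `df_melted`, and the demo values below are placeholders rather than real data.

```python
import pandas as pd


def top_and_bottom(long_df, year, n=5):
    """Return (top n, bottom n) rows by 'Quantity' for one year."""
    rows = long_df[long_df["Year"] == year]
    return rows.nlargest(n, "Quantity"), rows.nsmallest(n, "Quantity")


# Placeholder values for six hypothetical countries, for illustration only
demo = pd.DataFrame({
    "Country Name": ["A", "B", "C", "D", "E", "F"],
    "Year": [2018] * 6,
    "Quantity": [98.0, 70.0, 44.0, 10.0, 85.0, 5.0],
})

top, bottom = top_and_bottom(demo, 2018, n=2)
print(top["Country Name"].tolist())
print(bottom["Country Name"].tolist())

# A year without any rows gives empty frames instead of raising an error
empty_top, empty_bottom = top_and_bottom(demo, 1900, n=2)
```

With a boolean filter instead of `.loc` on the index, a year with no data simply produces two empty frames, so the callback could show a message instead of failing.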

oh, I like the random year button, it’s like an addicting game :slight_smile:

What do you mean by:

number to rank the countries according to their growth indicator

:slightly_smiling_face:… I tried a few formulas on ‘Quantity’ to rank countries by growth in internet penetration, such as year-over-year change with pandas’ pct_change() or something more complex like CAGR (Compound Annual Growth Rate). The results weren’t as accurate as I expected, although they were fairly reasonable, e.g. Cuba came out with the largest growth rate. Anyway, my initial idea was to rank the countries as accurately as possible and plot those metrics for comparison.
I guess I will stick with the year-over-year metric and go that way.
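
For comparison, here is a minimal sketch of both metrics on a long-format frame. The numbers are made up for two hypothetical countries "A" and "B", and names such as `summary` and `ranking` are only illustrative. CAGR is taken as (last / first) ** (1 / years) - 1 between each country's first and last available year.

```python
import pandas as pd

# Made-up values for two hypothetical countries, for illustration only
data = pd.DataFrame({
    "Country Name": ["A", "A", "A", "B", "B", "B"],
    "Year": [2000, 2001, 2002, 2000, 2001, 2002],
    "Quantity": [10.0, 20.0, 40.0, 50.0, 55.0, 60.5],
})
data = data.sort_values(["Country Name", "Year"])

# Year-over-year change, computed separately within each country
data["YoY"] = data.groupby("Country Name")["Quantity"].pct_change()

# First and last observation per country
summary = data.groupby("Country Name").agg(
    first_year=("Year", "first"),
    last_year=("Year", "last"),
    first_q=("Quantity", "first"),
    last_q=("Quantity", "last"),
)

# CAGR between the first and last available year
years = summary["last_year"] - summary["first_year"]
summary["CAGR"] = (summary["last_q"] / summary["first_q"]) ** (1 / years) - 1

ranking = summary["CAGR"].sort_values(ascending=False)
print(ranking)
```

Both metrics divide by the starting value, so a country that starts close to 0% will tend to rank very high. That may be part of why the results look odd; ranking by the change in percentage points (last minus first) could be an alternative worth trying.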

I love this pandas function!

:wave: Hello! I tried to find the top and bottom 5 countries based on average internet usage over the past decade. I attempted to add a ‘%’ symbol to the y-axis in the top 5 graph, but it kept messing up the values (would display them as x.0000%). :thinking: As a result, I decided to leave the y-axis without the percentage symbol for now. If you have any tips on how to fix this, I’d appreciate your suggestions! :slight_smile:


Code
import pandas as pd
import plotly.express as px

df = pd.read_csv("https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/main/2024/week-48/API_IT.NET.USER.ZS_DS2_en_csv_v2_2160.csv")
df.head()

melted_data = pd.melt(
    df,
    id_vars=['Country Name'],
    var_name='Year',
    value_name='Quantity'
)

melted_data['Year'] = pd.to_numeric(melted_data['Year'], errors='coerce')
melted_data = melted_data.dropna(subset=['Year', 'Quantity'])
print(melted_data)

latest_year = melted_data['Year'].max()
# Strictly greater than, so the window covers exactly the 10 most recent years
df_last_10_years = melted_data[melted_data['Year'] > latest_year - 10]

average_quantity = df_last_10_years.groupby('Country Name')['Quantity'].mean().reset_index()

average_quantity_sorted = average_quantity.sort_values('Quantity', ascending=False).head(5)

fig = px.bar(average_quantity_sorted, 
             x='Country Name', 
             y='Quantity', 
             title='Top 5 Countries with the Highest Average Internet Users (as % of population) in the Last Decade',
             labels={'Quantity': 'Average Percentage of Population', 'Country Name': 'Country'},
             color='Quantity')

fig.update_layout(showlegend=False)

fig.update_layout(xaxis_tickangle=-45, yaxis=dict(range=[average_quantity_sorted['Quantity'].min() - 0.5, 100]))

fig.show()

average_quantity_sorted = average_quantity.sort_values('Quantity', ascending=True).head(5)

average_quantity_sorted['Quantity'] = average_quantity_sorted['Quantity'] / 100

fig2 = px.bar(average_quantity_sorted, 
              x='Country Name', 
              y='Quantity', 
              title='Bottom 5 Countries with the Lowest Average Internet Users (as % of population) in the Last Decade',
              labels={'Quantity': 'Average Percentage of Population', 'Country Name': 'Country'},
              color='Quantity')

fig2.update_layout(showlegend=False)

fig2.update_layout(
    yaxis=dict(
        range=[0, average_quantity_sorted['Quantity'].max() + 0.05],
        tickmode='array',
        tickvals=[i / 100 for i in range(0, int(average_quantity_sorted['Quantity'].max() * 100) + 1, 1)],
        ticktext=[f'{i}%' for i in range(0, int(average_quantity_sorted['Quantity'].max() * 100) + 1, 1)],
        tickformat='%'
    )
)

fig2.update_layout(xaxis_tickangle=-45)

fig2.show()