Figure Friday 2024 - week 48

For week 48, I looked at the distribution of internet usage with a choropleth map that can be filtered by year using the Mantine YearPickerInput. I also added two bar charts, one for the average internet usage by Region and the other by IncomeGroup, both of which can be filtered by year as well. Finally, I used a line chart to show the worldwide trend in internet usage from 1960 to 2023.


Here is my code:

from dash import Dash, html, _dash_renderer, dcc
from dash.dependencies import Output, Input
import dash_mantine_components as dmc
_dash_renderer._set_react_version("18.2.0")

import dash_bootstrap_components as dbc
from dash_bootstrap_templates import load_figure_template

from datetime import datetime, timedelta
from dash_iconify import DashIconify

import pandas as pd
import plotly.express as px  # needed for px.line, px.choropleth and px.bar below



df = pd.read_csv("Metadata_Country_API_IT.NET.USER.ZS_DS2_en_csv_v2_2160.csv")
df2 = pd.read_csv("API_IT.NET.USER.ZS_DS2_en_csv_v2_2160.csv")
internet_data = df.merge(df2)

internet_data = internet_data.melt(id_vars=internet_data.columns[:8],
                   value_vars= internet_data.columns[8:],
                   var_name= "Year",
                   value_name= "% of Internet Usage"
                  )

internet_data["Year"] = internet_data["Year"].astype("int16")

internet_data = internet_data.dropna(subset=["% of Internet Usage"]) # remove all rows with NA in this column

internet_data_countries = internet_data.loc[internet_data["Country Name"] != "World"]
internet_data_world = internet_data.loc[internet_data["Country Name"] == "World"]

app = Dash(__name__, external_stylesheets=[dbc.themes.VAPOR] + [dmc.styles.DATES]) # dmc.styles.ALL would load everything; only the date styles are needed here
load_figure_template("VAPOR")

app.layout = dmc.MantineProvider(
    children=[ 
        dmc.Center("Global Internet Usage: Percentage of Population (historical data from 1960 to 2023, WorldBank)",
                    style={"height": 80, "width": "100%","backgroundColor": "#990066",
                           "fontSize": "2rem", "fontFamily": "Arial", "color": "white"
                          },
                  ),
        html.Br(),
        html.Div(
            style = {"width": "20%"},
           children= [
               dmc.YearPickerInput(
                   id = "years-selection",
                   leftSection=DashIconify(icon="fa:calendar"),
                   minDate=datetime(1960, 1, 1),
                   maxDate=datetime(2023, 1, 1),
                   type="single", #multiple for multiple selection
                   label="Pick Year",
                   clearable = True,
                   placeholder="Pick date",
                   labelProps={"style": {"fontSize": "1.5rem"}},
                    styles={
                        "calendarHeaderLevel": {"color": "blue"  },
                        "placeholder":{"color": "black"}
                    },
               ),
           ]),
    
        dbc.Col(dcc.Graph(id="map",style={"height": "80vh", "width": "80vw"} ), width=9),
        dbc.Row([
            dbc.Col(dcc.Graph(id="region"), width=6),
            dbc.Col(dcc.Graph(id="incomegroup"), width=6),
        ]),
        
        html.Br(),
        dcc.Graph(
            figure = px.line(
                internet_data_world,
                x= "Year",
                y= "% of Internet Usage",
                title= "Internet Usage Trend in the World from 1960 to 2023"
            )
        )
    
   
        
      
    ]
)

@app.callback(
   [
        Output(component_id="map", component_property="figure"),
        Output(component_id="region", component_property="figure"),
        Output(component_id="incomegroup", component_property="figure"),
   ],
    Input(component_id="years-selection", component_property="value")
)
def plot_graph(years):
    if not years:
        avg_internet_percent = (internet_data_countries.groupby(["Country Name"], as_index=False)
                                   .agg({"% of Internet Usage" : "mean"})
                                  )
        fig = px.choropleth(
            data_frame= avg_internet_percent,
            locations= "Country Name",
            color="% of Internet Usage",
            locationmode= "country names",
            scope= "world",
            color_continuous_scale=px.colors.plotlyjs.Rainbow_r,
            title= "Individuals using the Internet (% of population) from 1960 to 2023 - Avg"
        )
        
        avg_internet_region =  (internet_data_countries
                                .groupby("Region", as_index=False)
                                .agg({"% of Internet Usage": "mean"})
                               )
        avg_internet_incomegroup =  (internet_data_countries
                                     .groupby("IncomeGroup", as_index=False)
                                     .agg({"% of Internet Usage": "mean"})
                                    )
        
        bar_region = px.bar(avg_internet_region,
                            x= "Region",
                            y= "% of Internet Usage",
                            title= "Average Percentage of Internet Usage by region from 1960 to 2023"
                           )
        
         
        bar_incomegroup = px.bar(avg_internet_incomegroup,
                            x= "IncomeGroup",
                            y= "% of Internet Usage",
                            title= "Average Percentage of Internet Usage by IncomeGroup from 1960 to 2023"
                           )
        
        
    else:
        selected_year = int(years[:4])
        internet_data_mask = internet_data_countries.query("Year == @selected_year")
        
        avg_internet_percent = (internet_data_mask
                                .groupby(["Country Name"], as_index=False)
                                .agg({"% of Internet Usage" : "mean"})
                                  )
        fig = px.choropleth(
             data_frame= avg_internet_percent,
             locations= "Country Name",
             color="% of Internet Usage",
             locationmode= "country names",
             scope= "world",
             color_continuous_scale=px.colors.plotlyjs.Rainbow_r,
            title= f"Individuals using the Internet (% of population) in {selected_year} by country"
         )
        
        avg_internet_region =  (internet_data_mask
                                .groupby("Region", as_index=False)
                                .agg({"% of Internet Usage": "mean"})
                               )
        avg_internet_incomegroup =  (internet_data_mask
                                     .groupby("IncomeGroup", as_index=False)
                                     .agg({"% of Internet Usage": "mean"})
                                    )
        
        bar_region = px.bar(avg_internet_region,
                            x= "Region",
                            y= "% of Internet Usage",
                            title= f"Average Percentage of Internet Usage by region in {selected_year}"
                           )
        
         
        bar_incomegroup = px.bar(avg_internet_incomegroup,
                            x= "IncomeGroup",
                            y= "% of Internet Usage",
                            title= f"Average Percentage of Internet Usage by IncomeGroup in {selected_year}"
                           )

        
    return fig, bar_region, bar_incomegroup
            

if __name__ == "__main__":
    app.run_server(debug=True, port=2021)


I used the code that you provided and took the average of all the years for each country. I wanted to make it more interactive by adding a range slider to show the average per year, but I was unable to figure it out.

import plotly.express as px
import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/main/2024/week-48/API_IT.NET.USER.ZS_DS2_en_csv_v2_2160.csv")
df_filtered = df[df["Country Name"].isin(["Mexico", "Canada", "United States"])]

melted_data = pd.melt(
    df_filtered,
    id_vars=['Country Name'],
    var_name='Year',
    value_name='Quantity'
)
melted_data['Year'] = pd.to_numeric(melted_data['Year'], errors='coerce')
# Convert to numeric first so that non-numeric values become NaN, then drop them
melted_data['Quantity'] = pd.to_numeric(melted_data['Quantity'], errors='coerce')
melted_data = melted_data.dropna(subset=['Quantity'])
melted_data = melted_data.groupby('Country Name')['Quantity'].mean().reset_index()

px.pie(melted_data, names='Country Name', values='Quantity', hole=0.3, template='ggplot2', title="Population percentage of North American Internet Users 1960 - 2023")


That’s a cool approach, @Mike_Purtell. I’m assuming internet usage in these countries was extremely low for it to be able to jump by 25%. If 10 percent of the population was using the internet, 25% growth means 12.5% would be using the internet after one year, which is not far-fetched. But for countries that had already adopted the internet to a large degree, I think it would be hard to see 25% growth.

In other words, when I saw your graph I interpreted it as: From those countries with very low internet adoption rates, which ones grew the fastest. Do you think that’s the right way to see this?
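
To make the arithmetic concrete, here is a minimal sketch of relative growth versus the 100% ceiling. The function names are made up for illustration and are not from anyone's code in this thread.

```python
def apply_relative_growth(share_pct: float, growth_pct: float) -> float:
    """Share of population (in %) after growing by growth_pct percent."""
    return share_pct * (1 + growth_pct / 100)


def max_relative_growth(share_pct: float) -> float:
    """Largest relative growth (in %) possible before the share reaches 100%."""
    return (100 / share_pct - 1) * 100


# 10% of the population growing by 25% ends up at 12.5%.
low_adoption = apply_relative_growth(10, 25)

# A country already at 80% can grow by at most 25% before hitting 100%,
# so any country above 80% cannot show 25% growth at all.
ceiling_at_80 = max_relative_growth(80)

print(low_adoption, ceiling_at_80)
```

So a 25% growth filter can only ever pick up countries that were at or below 80% adoption, and in practice it favours those starting from a much lower base.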

hi @ZionH
We can try to talk about the RangeSlider at today’s session if you can make it. It’s at noon Eastern Time.
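
In the meantime, here is a rough sketch of the filtering step that a RangeSlider callback would need. It assumes the long-format frame from the post above (columns 'Country Name', 'Year', 'Quantity') before the final groupby. The helper name and the sample numbers are made up for illustration.

```python
import pandas as pd


def average_usage_in_range(long_df: pd.DataFrame, year_range) -> pd.DataFrame:
    """Mean 'Quantity' per country for years inside [start, end], inclusive."""
    start, end = year_range
    in_range = long_df[long_df["Year"].between(start, end)]
    return in_range.groupby("Country Name", as_index=False)["Quantity"].mean()


# Made-up numbers, only to show the shape of the data.
sample = pd.DataFrame({
    "Country Name": ["Canada", "Canada", "Canada", "Mexico", "Mexico", "Mexico"],
    "Year": [2000, 2010, 2020, 2000, 2010, 2020],
    "Quantity": [50.0, 80.0, 90.0, 5.0, 30.0, 70.0],
})

# dcc.RangeSlider passes its value to a callback as a [start, end] list,
# so the callback could hand that list straight to this helper.
result = average_usage_in_range(sample, [2010, 2020])
print(result)
```

The callback would then build the pie chart from the returned frame instead of from the all-years average.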

@Tiga what a beautiful app, congratulations. I love the color scheme you chose.

Did anything surprise you when looking at this data, after you built your dashboard?

Hi @lumars, the y-axis tick labels in the bottom graph, where the % suffix is shown correctly, come from the fig2.update_layout block where you set tickvals, ticktext, and tickformat='%'. The top graph, where the y-axis labels lack the % suffix, has no similar fig.update_layout block. Can you please let us know if adding one fixes the issue? Also, will you be joining the call about 45 minutes from now? It would be great to have you there and see if we can work this out with you. Thank you.


Hi @adamschroeder, I did indeed try using RandomForestRegressor and it gives better results, although there are still some outliers in countries with limited data, and the outlook is not promising for certain countries. I have read that technology adoption in general and internet traffic follow a power law, but nothing specifically about internet adoption. If you have a link to the article in question, I would be interested to read it.