Figure Friday 2024 - week 40

The Eurovision Song Contest is an international song competition with a great majority of the contestants representing countries within Europe. This week we will dive into data on 1735 songs that have competed since 1956.

The above data set comes from the eurovision-dataset GitHub repo. However, if you’d like to explore other data sets around this famous contest (such as votes per country and per juror, breakdown of regions, or bookmakers’ odds for winners), please visit the mirovision GitHub repo.

Sample figure and app:

Code for sample figure:
import plotly.express as px
import pandas as pd
from dash import Dash, dcc, Input, Output, callback

df = pd.read_csv("https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/main/2024/week-40/contestants.csv")


app = Dash()
app.layout = [
    dcc.Dropdown(options=sorted(df.to_country.unique()),
                 value="Netherlands",
                 clearable=False,
                 id="country"),
    dcc.Graph(id="strip-graph")
]

@callback(
    Output("strip-graph", "figure"),
    Input("country","value")
)
def update_graph(country_slctd):
    dff = df.copy()
    dff['color'] = dff['to_country'].apply(lambda x: country_slctd if x == country_slctd else 'other')
    fig = px.strip(dff, x="points_final", y="year", hover_data="to_country", color="color", height=750,
                   color_discrete_map={country_slctd: 'red', 'other': 'blue'})
    return fig

if __name__ == '__main__':
    app.run(debug=True)

Things to consider:

  • can you improve the sample figure built?
  • would a different figure tell the data story better?
  • can you improve the sample Dash app?

Participation Instructions:

  • Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
  • Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
  • Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.

If you prefer to collaborate with others on Discord, join the Plotly Discord channel.

Thank you to Eurovision-dataset and mirovision for the data.


Update: Oct 13
The code produces a heat map of the raw data, showing the total number of votes from each country to every other country. A second heat map shows the normalized vote totals: the raw totals divided by the number of participation years of the vote-giving country. This weighting places a higher value on votes from countries with fewer participation years.
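
The normalization described above can be sketched on a tiny made-up table. This is only an illustration using pandas rather than the polars code posted further down; the country names, vote totals, and participation-year counts are invented, and the factor of 100 mirrors the scaling used in the posted code.

```python
import pandas as pd

# Hypothetical raw vote totals: rows are vote-giving countries,
# columns are vote-receiving countries (all numbers are made up).
raw = pd.DataFrame(
    {"Norway": [120.0, 30.0], "Sweden": [200.0, 45.0]},
    index=pd.Index(["Denmark", "Latvia"], name="from_country"),
)

# Hypothetical number of years each vote-giving country took part.
years = pd.Series({"Denmark": 40, "Latvia": 15})

# Divide every row by that row's participation years, then scale by 100.
normalized = raw.div(years, axis=0) * 100

print(normalized)
```

In this toy table, Latvia's 45 raw points for Sweden and Denmark's 120 raw points for Norway both normalize to 300, which is the weighting effect described above.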

Here are the screenshots of the raw data heat map and normalized heat map:

Here is the code:

import polars as pl
import plotly.express as px

def make_histogram(df, my_title='No Title Provided'):
    ''' quick histogram for debug'''
    fig = px.histogram(
        df,
        df.columns[1:],
        template='plotly_white',
        height=400, 
        width=600,
    )
    fig.update_layout(showlegend=False,title=my_title)
    fig.update_xaxes(showgrid=False)
    fig.update_yaxes(showgrid=False)
    fig.show()
    return

def make_heatmap(
        df, 
        my_max=10000, 
        my_title='No Title Provided', 
        x_title = 'No X title provided',
        y_title = 'No Y title provided',
        hover_entity='No Hover Entity Provided'
        ):
    '''  make the heat map, various data'''
    Y = list(df['from_country'])
    X = list(df.columns)
    fig = px.imshow(
        df,
        x=X,
        y=Y,
        text_auto=True, 
        height=1200, 
        width=1200,
        range_color=(0,my_max),
        labels=dict(x='To ', y='From', color=hover_entity),
    )
    fig.update_layout(
        template='plotly_white',
        title=my_title.upper(), 
        title_font = {"size": 28},
    )

    fig.update_xaxes(title_text = x_title, title_font = {"size": 20})
    fig.update_yaxes(title_text = y_title, title_font = {"size": 20})
    fig.update_xaxes(showgrid=False)
    fig.update_yaxes(showgrid=False)
    fig.show()
    return

#------------------------------------------------------------------------------#
#     Load data, each as a Lazy Frame and Data Frame                           #
#------------------------------------------------------------------------------#
df_countries_lazy = pl.scan_csv('./countries.csv')
df_countries = df_countries_lazy.collect()
df_votes_lazy = pl.scan_csv('./votes.csv')
df_votes = df_votes_lazy.collect()

#------------------------------------------------------------------------------#
#     make dataframe for heat maps, without data normalizations                #
#------------------------------------------------------------------------------#
df_heat_map = (
    df_votes_lazy
    .select(pl.col('year', 'from_country', 'to_country', 'total_points'))
    .join(
        df_countries_lazy.rename({'country': 'from_country'}),  
        how='left',
        on='from_country'
        )
    .drop('from_country')
    .rename({'country_name':'from_country'})
    .join(
        df_countries_lazy.rename({'country': 'to_country'}),  
        how='left',
        on='to_country'
        )
    .drop('to_country')
    .rename({'country_name':'to_country'})
    .group_by('from_country', 'to_country')
    .agg(pl.col('total_points').sum())
    .with_columns(  # shorten full names of these countries, to uncrowd the axis labels
        pl.col('to_country', 'from_country')
        .str.replace('Serbia and Montenegro', 'Serb & Mont')
        .str.replace('Bosnia & Herzegovina', 'Bos & Herz')
        .str.replace('North Macedonia', 'N. Maced')
        .str.replace('United Kingdom', 'U.K.')
        )
    .collect()   # Lazyframes can't pivot. Collect here to convert to data frame
    .pivot(
        on='to_country',
        index='from_country'
    )
    .sort('from_country')
)

# sort columns alphabetically, with 'from_country' on the far left
df_columns = sorted(df_heat_map.columns)
df_heat_map = (
    df_heat_map.select(
        ['from_country'] + 
        [c for c in df_columns if c != 'from_country']
        )
)

#------------------------------------------------------------------------------#
#     Make a histogram of raw data to guide color_range selection              #
#------------------------------------------------------------------------------#

# From this histogram, 300 is a reasonable value for filtering outliers
make_histogram(df_heat_map, my_title='Raw Data')

make_heatmap(
    df_heat_map, 
    my_max=300, 
    my_title=('Eurovision Votes since 1956'.upper()),  
    x_title = 'VOTES TO COUNTRY',
    y_title = 'VOTES FROM COUNTRY',
    hover_entity='Votes'
)

#------------------------------------------------------------------------------#
#     Normalize Dataframe by dividing votes received from any country by       #
#     the giving country's years of participation                              #
#------------------------------------------------------------------------------#

df_country_participation_years = (
    df_votes
    .select(pl.col('year','from_country'))
    .unique(pl.col('year', 'from_country'))
    .with_columns(COUNTRY_YEAR_COUNT=pl.col('year').count().over('from_country'))
    .join(
        df_countries.rename({'country': 'from_country'}),  
        how='left',
        on='from_country'
    )
    .select(pl.col('country_name','COUNTRY_YEAR_COUNT'))
    .rename({'country_name':'from_country'})
    .with_columns(  # shorten full names of these countries, to uncrowd the axis labels
        pl.col('from_country')
        .str.replace('Serbia and Montenegro', 'Serb & Mont')
        .str.replace('Bosnia & Herzegovina', 'Bos & Herz')
        .str.replace('North Macedonia', 'N. Maced')
        .str.replace('United Kingdom', 'U.K.')
        )
    .unique('from_country')
    .sort('from_country', descending=False)
)

#------------------------------------------------------------------------------#
#     Join years per country with the heatmap dataframe                        #
#------------------------------------------------------------------------------#
df_heat_map = (
    df_heat_map
    .join(
        df_country_participation_years,
        how='left',
        on='from_country'
    )
)

# alphabetic col sort, with 'from_country', 'COUNTRY_YEAR_COUNT' on the left
df_heat_map_columns = sorted(df_heat_map.columns)
left_cols = ['from_country', 'COUNTRY_YEAR_COUNT']
df_normalized_heat_map = (
    df_heat_map
    .select(left_cols + [c for c in df_heat_map_columns if c not in left_cols])
)

data_cols = df_normalized_heat_map.columns[2:]

for country in data_cols:
    participation_years = (
        df_country_participation_years
        .filter(pl.col('from_country') == country)
        .select(pl.col('COUNTRY_YEAR_COUNT'))
        .to_series().to_list()
    )[0]

    for col in data_cols:
        df_normalized_heat_map = (
            df_normalized_heat_map
            .with_columns(
                pl.when(pl.col('from_country') ==  country)
                .then(100*pl.col(col)/participation_years)
                .otherwise(pl.col(col))  # keep other rows' values unchanged
                .cast(pl.UInt16)
                .alias(col)
            )
        )

df_normalized_heat_map = df_normalized_heat_map.drop('COUNTRY_YEAR_COUNT')

#------------------------------------------------------------------------------#
#     Make a histogram of normalized data to guide color_range selection       #
#------------------------------------------------------------------------------#
make_histogram(df_normalized_heat_map, my_title='Normalized Data')

make_heatmap(
    df_normalized_heat_map, 
    my_max=1000, 
    my_title=('Normalized Eurovision Votes since 1956'.upper()),  
    x_title = 'VOTES TO COUNTRY',
    y_title = 'VOTES FROM COUNTRY',
    hover_entity='Normalized Votes'
)


I really enjoyed looking at the heatmap, @Mike_Purtell .
It looks like Sweden and Norway tend to get a substantial number of votes from multiple countries. The way I interpreted this is that when their songs are liked by other countries, they tend to be really liked. Is that what you got from this?

Does the number of years a country has participated in the competition impact this heatmap, given that you totaled votes for all years? For example, I think Bos & Herz joined the competition in the ’90s, while Sweden joined a few decades before that. This means that Bos & Herz has had less time to accumulate votes from other countries.

Perhaps one way to remove that time bias and standardize the data is to divide the total number of votes per country by the number of years that country has been in the competition, so you’re showing annual average votes per country.
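
One way to sketch this annual-average idea is shown below, using only the Python standard library. The records are invented, and `average_points_per_year` is a hypothetical helper, not part of the code posted in this thread. It assumes a country appears in the data for every year it competed, and it divides each country's total received points by the number of distinct years in which it appears.

```python
from collections import defaultdict

def average_points_per_year(records):
    """Return {country: total points received / distinct years competed}."""
    totals = defaultdict(int)   # points received per country
    years = defaultdict(set)    # distinct years seen per country
    for year, to_country, points in records:
        totals[to_country] += points
        years[to_country].add(year)
    return {c: totals[c] / len(years[c]) for c in totals}

# Made-up (year, to_country, points) rows for illustration only.
records = [
    (1975, "Sweden", 10), (1975, "Sweden", 8),
    (1999, "Sweden", 12), (1999, "Bosnia & Herzegovina", 7),
    (2004, "Sweden", 6), (2004, "Bosnia & Herzegovina", 11),
]

averages = average_points_per_year(records)
print(averages)
```

With these made-up rows, Sweden's 36 points over three years and Bosnia & Herzegovina's 18 points over two years become 12.0 and 9.0 points per year, so the two countries can be compared despite different lengths of participation.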

A tip for heat maps, or any other visualization tied to a color scale, is to check the distribution of your data, and set range_color to fit to the bulk of the distribution. Don’t waste the color range on outliers.

Here is the heatmap I submitted earlier for Figure Friday Week 40. Data ranges from 0 to about 500. The lack of visible yellows shows that the big numbers are sparse.

With a few lines of code for a px.histogram, it is easy to see that values above 300 are very few and far between.

Values above 300 take up over 40% of the color range. An easy fix is to cap the range_color at 300. Here is the update; it looks much better, with a wider distribution of colors.
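
A rough way to pick the cap from the data, rather than by eye, is to take a high percentile of the values. The sketch below uses only the standard library; the sample values and the choice of the 90th percentile are assumptions for illustration, and `color_cap` and `wasted_fraction` are hypothetical helpers, not part of the code in this thread.

```python
import statistics

def color_cap(values, percentile=95):
    """Return the value at the given percentile (1-99) as a candidate upper bound."""
    cut_points = statistics.quantiles(values, n=100, method="inclusive")
    return cut_points[percentile - 1]

def wasted_fraction(values, cap):
    """Fraction of a 0..max color range that lies above the cap."""
    top = max(values)
    return (top - cap) / top

# Invented data: many small totals and a couple of large outliers.
values = [5, 12, 20, 33, 48, 60, 75, 90, 110, 150, 180, 240, 300, 480, 500]

cap = color_cap(values, percentile=90)
print("suggested cap:", cap)
print("share of color range above 300:", wasted_fraction(values, 300))
```

The resulting cap could then be passed as `range_color=(0, cap)` to `px.imshow`, in the same way the value 300 was chosen from the histogram.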

I am working on one more visualization to wrap up my work on Week 40, and when that is done I will post all of the code.
