Figure Friday 2024 - week 34

Update: Figure Friday 2024 - week 35 is the newer dataset.

Music sets the tone for our mornings, days, and nights, whether we’re washing dishes or partying. The following data set explores various parameters of songs in the Spotify library, including genre, danceability, and tempo.

Things to consider:

  • how can the facet plot be improved?
  • how can the trendline be improved?
  • is there a better way to show the correlations between energy and danceability of songs?
  • are there other interesting correlations to discover?
  • what other graphs can we use to analyze the data set?

Sample Figure:

import plotly.express as px
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/plotly/Figure-Friday/main/2024/week-34/dataset.csv')
fig = px.scatter(df, x="energy", y="danceability", facet_col="explicit", trendline='ols',
                 labels={"explicit": "Has explicit lyrics"})
fig.show()

Participation Instructions:

  • Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
  • Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
  • Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.

:point_right: If you prefer to collaborate with others on Discord, join the Plotly Discord channel.

Thank you to Abhi for suggesting this data set and thank you to MaharshiPandya on Kaggle for the data.


More about the Data Set:

Column Description (by MaharshiPandya)

  • track_id: The Spotify ID for the track
  • artists: The artists’ names who performed the track. If there is more than one artist, they are separated by a ;
  • album_name: The album name in which the track appears
  • track_name: Name of the track
  • popularity: The popularity of a track is a value between 0 and 100, with 100 being the most popular. The popularity is calculated by algorithm and is based, in the most part, on the total number of plays the track has had and how recent those plays are. Generally speaking, songs that are being played a lot now will have a higher popularity than songs that were played a lot in the past. Duplicate tracks (e.g. the same track from a single and an album) are rated independently. Artist and album popularity is derived mathematically from track popularity.
  • duration_ms: The track length in milliseconds
  • explicit: Whether or not the track has explicit lyrics (true = yes it does; false = no it does not OR unknown)
  • danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable
  • energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale
  • key: The key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1
  • loudness: The overall loudness of a track in decibels (dB)
  • mode: Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0
  • speechiness: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks
  • acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic
  • instrumentalness: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content
  • liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live
  • valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry)
  • tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration
  • time_signature: An estimated time signature. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure). The time signature ranges from 3 to 7, indicating time signatures of 3/4 to 7/4.
  • track_genre: The genre in which the track belongs
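
The key and mode columns described above are stored as integers, which are hard to read on an axis or in a hover label. A minimal sketch of turning them into readable labels follows; the helper name key_label and the "Unknown" label are assumptions made for illustration.

```python
# Standard pitch class notation: index 0 = C, 1 = C♯/D♭, ..., 11 = B.
PITCH_CLASSES = ["C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
                 "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B"]


def key_label(key: int, mode: int) -> str:
    """Convert the integer key and mode columns into a label such as 'A minor'."""
    if key == -1:
        # The column description uses -1 when no key was detected.
        return "Unknown"
    if not 0 <= key <= 11:
        raise ValueError(f"key must be -1 or in 0..11, got {key}")
    quality = "major" if mode == 1 else "minor"  # 1 = major, 0 = minor
    return f"{PITCH_CLASSES[key]} {quality}"
```

For example, key_label(9, 0) returns "A minor".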

Some things of note for the dataset above: It’s a snapshot of Spotify from October 2022. There are also several songs listed under more than one genre (some data cleaning may be required to remove duplicates if it’s relevant to your analysis).
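
A possible way to handle the duplicates is sketched below with pandas. It assumes that a song listed under several genres appears as several rows sharing the same track_id; the helper name dedupe_tracks and the all_genres column are made up for this example.

```python
import pandas as pd


def dedupe_tracks(df: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per track_id and record every genre it was listed under."""
    # Collect the distinct genres of each track into one ';'-separated string.
    genres = (
        df.groupby("track_id")["track_genre"]
        .agg(lambda s: ";".join(sorted(set(s))))
        .rename("all_genres")
        .reset_index()
    )
    # Keep the first occurrence of each track, then attach the genre list.
    unique = df.drop_duplicates(subset="track_id", keep="first")
    return unique.merge(genres, on="track_id", how="left")
```

Whether to deduplicate depends on the analysis: per-genre charts need the repeated rows, while overall distributions count a song once only after deduplication.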


Here is a look at the top 10 tracks of the top 10 genres.

The top 10 genres have the highest mean popularity calculated from all tracks.

The top 10 songs of each genre are the highest 10 popularity ranks within the genre.

Each bar chart shows the top tracks in one of the top 10 genres. Only 2 of them are pasted here; you can see all 10 by running the code.

import polars as pl
import plotly.express as px

#------------------------------------------------------------------------------#
#     Load data set from git or from local drive                               #
#------------------------------------------------------------------------------#
if False: # to read data from git, and save to disk
    df = pl.read_csv(
        'https://raw.githubusercontent.com/plotly/Figure-Friday/main/2024/week-34/dataset.csv'
    )
    df.write_csv('dataset.csv')
if True: # to read data from disk
    df = pl.read_csv(
        'dataset.csv'
    )

#------------------------------------------------------------------------------#
#     Find the top 10 of each genre - ranking based on popularity              #
#------------------------------------------------------------------------------#
top_10_by_genre = (
    pl.LazyFrame(
        df
        .select(pl.col('track_genre', 'artists', 'track_name', 'popularity'))
        .with_columns(ARTIST_COUNT = pl.col('artists').count().over('artists'))
        .unique(['track_genre', 'artists'])
        .group_by(['track_genre', 'artists', 'track_name', 'ARTIST_COUNT']).agg(pl.mean('popularity'))
        .sort(['track_genre','popularity'], descending=True)
        .with_columns(GENRE_COUNT = pl.col('track_genre').count().over('track_genre'))
        .with_columns(TRACK_GENRE_RANK = pl.col('track_genre').cum_count().over('track_genre'))
        .filter(pl.col('TRACK_GENRE_RANK') <= 10)
    )
    .collect()
)

#------------------------------------------------------------------------------#
#     find top 10 genres based on popularity of its songs                      #
#------------------------------------------------------------------------------#
top_10_genres = (
    top_10_by_genre
    .group_by(pl.col('track_genre')).agg(pl.mean('popularity'))
    .sort('popularity', descending=True)
    .head(10)
    .select(pl.col('track_genre'))
    .to_series()
    .to_list()
)

#------------------------------------------------------------------------------#
#      Make bar charts of top 10 songs in the top 10 genres                    #
#------------------------------------------------------------------------------#
for genre in top_10_genres:  # iterate through the top 10 genres
    df_plot = (
        pl.LazyFrame(
            top_10_by_genre
            .filter(pl.col('track_genre') == genre)
            .select(pl.col('artists', 'track_name', 'popularity','track_genre', 'ARTIST_COUNT'))
            .with_columns(
                pl.col('artists').str.to_titlecase(),
            )
            .with_columns(
                ARTIST_TOTAL = pl.col('artists').count().over('artists')
            )
            .with_row_index(offset=1)
            .tail(10)
            .with_columns(
                ARTIST_TRACK =  (
                    pl.lit('<b>') +            #  bold font for artist name
                    pl.col('artists') + 
                    pl.lit('</b>') +           #  end bold font, use normal font for track name
                    pl.lit('     ') +          #  add spaces after artist name to separate from plot
                    pl.lit('<br>') +           #  html line break puts artist name on first line, track name on second
                    pl.col('track_name')  +  
                    pl.lit('     ')            #  add spaces after track name to separate from plot
                    )
            )
            .sort('popularity')
        )
        .collect()
    )

    #  Make horizontal bar chart
    fig = px.bar(
        df_plot.sort('popularity', descending=False), 
        x='popularity',
        y="ARTIST_TRACK",
        # color = 'ARTIST_COUNT',
        orientation = 'h',
        template='plotly_white',
        height=600,
        width=1000,
        range_x=[80, 100],
        # category_orders={'index': df.index[::-1]}
        )
    fig.update_layout(title = genre)
    # fig.update_layout( yaxis={'categoryorder':'array', 'categoryarray':df.index})
    fig.show()


Hi Plotly Community,

I built a Dash dashboard to compare the overall track metrics and category distributions against those of popular and unpopular songs, based on popularity bins.


Features:

  • Histogram Chart with Metric Selector: Switch between different metrics using tabs to update the histogram and view the average value for each popularity bin.
  • Overall Average Card: A quick view of the overall average for the selected metric.
  • Most and Least Popular Comparison Cards: Compare the most and least popular tracks against the overall average.
  • Category & Popularity Bin Selectors: Group data by specific categories (genre, explicit content, time signature) and filter by popularity bin.
  • Butterfly Chart: Compares the overall distribution of tracks by selected category with the selected popularity bin.
  • Tracks Table: Display detailed information for tracks within the selected popularity bin.
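
The binning idea behind the histogram can be sketched with pandas as follows. The bin edges and labels here are assumptions chosen for illustration, not the ones used in the app, and mean_by_popularity_bin is a hypothetical helper name.

```python
import pandas as pd

# Assumed bin edges and labels; the app may use different ones.
BIN_EDGES = [0, 25, 50, 75, 100]
BIN_LABELS = ["0-25", "26-50", "51-75", "76-100"]


def mean_by_popularity_bin(df: pd.DataFrame, metric: str) -> pd.Series:
    """Average of `metric` within each popularity bin."""
    bins = pd.cut(
        df["popularity"],
        bins=BIN_EDGES,
        labels=BIN_LABELS,
        include_lowest=True,  # put popularity == 0 into the first bin
    )
    # observed=False keeps empty bins in the result (their mean is NaN).
    return df.groupby(bins, observed=False)[metric].mean()
```

The resulting Series, indexed by bin label, can be passed to px.bar, and its values compared with df[metric].mean() for the overall average.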

As a Mexican, I found it interesting that urban and Latino genres have a strong presence in the most popular bin compared with the overall distribution.

Dash App


Thanks for sharing the code for the bar charts, @Mike_Purtell :slight_smile:
For the code to run successfully, we just have to delete the trailing s in top_10_by_genres on line 39.


Congratulations, @Alfredo49. That’s a beautiful app. I like that you chose the same Spotify green :smiley:

By the way, the source code link at the very bottom of the app doesn’t work.

One thing I would recommend is to add information tied to each tab. Visitors who are not familiar with the data set may not know what speechiness, mode, or instrumentalness mean.

Maybe you can add a DMC information icon with DashIconify next to the title:

from dash import Dash, dcc, html, Input, Output, dash_table, callback
import dash_mantine_components as dmc
from dash_iconify import DashIconify

app = Dash(__name__)

app.layout = dmc.Container(
    [
        dmc.Group([
            dmc.Title(
                "Average Instrumentalness by Popularity"),
            dmc.Tooltip(
                multiline=True,
                w=220,
                withArrow=True,
                label="Predicts whether a track contains no vocals. 'Ooh' and 'aah' sounds are treated as instrumental in this context. "
                      "Rap or spoken word tracks are clearly 'vocal'. The closer the instrumentalness value is to 1.0, "
                      "the greater likelihood the track contains no vocal content.",
                children=[DashIconify(icon="feather:info", width=30,)]
            ),
        ])
    ],
    fluid=True,
)



if __name__ == "__main__":
    # app.run is the current entry point; app.run_server is deprecated in recent Dash releases
    app.run(debug=True)

Thanks for the feedback @adamschroeder :smiley:

The source code link should work now.
Regarding the information labels with dash_iconify, I will definitely implement them!


Thank you, @adamschroeder, for catching that. I have updated the posted code.

Hi @Alfredo49, your app for the Spotify data is absolutely gorgeous. Great job.


Awesome visualizations everyone! I made a density heatmap comparing loudness (in decibels) and tempo (in beats per minute) for songs in the pop genre.

*Note: I did some research, and apparently digital audio levels are measured relative to a maximum of 0 dB, so any level below that maximum comes out negative. That’s why there are negative decibel values.
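
To illustrate the note about negative values: under the usual digital full-scale convention, a level in decibels is 20 times the base-10 logarithm of the amplitude relative to the maximum. The sketch below shows only that general convention. It is an assumption that the loudness column follows the same kind of reference, since the column description does not say how it is measured, and amplitude_to_dbfs is a hypothetical helper name.

```python
import math


def amplitude_to_dbfs(amplitude: float, full_scale: float = 1.0) -> float:
    """Level in dB relative to full scale: 20 * log10(amplitude / full_scale)."""
    if amplitude <= 0 or full_scale <= 0:
        raise ValueError("amplitude and full_scale must be positive")
    return 20.0 * math.log10(amplitude / full_scale)


# The maximum level maps to 0 dB; anything quieter is negative.
print(amplitude_to_dbfs(1.0))   # full scale
print(amplitude_to_dbfs(0.5))   # half the maximum amplitude, about 6 dB lower
```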


Beautiful app, @aswagh3. And thank you for showing it at the Figure Friday session earlier today.
Most songs seem to cluster around -5 dB and 120 beats per minute. I wonder what this would look like when filtering for the top bracket of popular songs (75-100 points), like @Alfredo49 did in his app, and whether the most popular songs follow the same trend.


Hello Guys!
Really late, but I’ve deployed my app for week 34 :tada:
https://sdidier-dev.freeboxos.fr/sdidier-dev/figure-friday/W34

Notes:

  • You can also take a look at the apps from the previous weeks :grin:
    (a few are missing, including the home page, but hopefully they will come soon!)
  • The apps are responsive and compatible with all available Bootstrap themes and light/dark mode
  • Sorry if it is slow to load, it is hosted on a Raspberry Pi in France :innocent:

Edit: I forgot the link to the code:
