Music sets the tone for our mornings, days, and nights, whether we’re washing dishes or partying. The following data set explores various parameters of songs in the Spotify library, including genre, danceability, and tempo.
Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.
artists: The names of the artists who performed the track. If there is more than one artist, the names are separated by a ;
album_name: The album name in which the track appears
track_name: Name of the track
popularity: The popularity of a track is a value between 0 and 100, with 100 being the most popular. The popularity is calculated by algorithm and is based, for the most part, on the total number of plays the track has had and how recent those plays are. Generally speaking, songs that are being played a lot now will have a higher popularity than songs that were played a lot in the past. Duplicate tracks (e.g. the same track from a single and an album) are rated independently. Artist and album popularity is derived mathematically from track popularity.
duration_ms: The track length in milliseconds
explicit: Whether or not the track has explicit lyrics (true = yes it does; false = no it does not OR unknown)
danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable
energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale
key: The key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1
loudness: The overall loudness of a track in decibels (dB)
mode: Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0
speechiness: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks
acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic
instrumentalness: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content
liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live
valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry)
tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration
time_signature: An estimated time signature. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure). The time signature ranges from 3 to 7, indicating time signatures from 3/4 to 7/4
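Several of these columns are numerically encoded. Purely as orientation, here is a minimal sketch (not part of the official description; it assumes the week-34 CSV is saved locally as dataset.csv) of one way the key, mode, and duration_ms values could be decoded with Polars:

import polars as pl

# Pitch Class notation used by the key column: 0 = C, 1 = C#/Db, ... ; -1 = no key detected
PITCH_CLASSES = ['C', 'C#/Db', 'D', 'D#/Eb', 'E', 'F',
                 'F#/Gb', 'G', 'G#/Ab', 'A', 'A#/Bb', 'B']

df = pl.read_csv('dataset.csv')  # assumed local copy of the challenge CSV

df_readable = df.with_columns(
    # map the key integer to a note name; -1 (no key detected) becomes null
    pl.col('key').map_elements(
        lambda k: PITCH_CLASSES[k] if 0 <= k < 12 else None,
        return_dtype=pl.Utf8,
    ).alias('key_name'),
    # mode: 1 = major, 0 = minor
    pl.when(pl.col('mode') == 1).then(pl.lit('major'))
      .otherwise(pl.lit('minor'))
      .alias('mode_name'),
    # duration_ms: milliseconds -> minutes
    (pl.col('duration_ms') / 60_000).round(2).alias('duration_min'),
)
print(df_readable.select('track_name', 'key_name', 'mode_name', 'duration_min').head())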
Some things of note for the dataset above: It’s a snapshot of Spotify from October 2022. There are also several songs listed under more than one genre (some data cleaning may be required to remove duplicates if it’s relevant to your analysis).
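If genre is not relevant to your analysis, one possible way to remove those duplicates (again just a sketch, assuming a local dataset.csv) is to keep a single row per artist/track pair:

import polars as pl

df = pl.read_csv('dataset.csv')  # assumed local copy of the challenge CSV

# Tracks listed under more than one genre appear once per genre; keep a single
# row per artist/track pair (here, the row with the highest popularity)
df_dedup = (
    df
    .sort('popularity', descending=True)
    .unique(subset=['artists', 'track_name'], keep='first')
)
print(df.height, '->', df_dedup.height)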
import polars as pl
import plotly.express as px
#------------------------------------------------------------------------------#
# Load data set from git or from local drive #
#------------------------------------------------------------------------------#
if False:  # to read data from git, and save to disk
    df = pl.read_csv(
        'https://raw.githubusercontent.com/plotly/Figure-Friday/main/2024/week-34/dataset.csv'
    )
    df.write_csv('dataset.csv')
if True:  # to read data from disk
    df = pl.read_csv('dataset.csv')
#------------------------------------------------------------------------------#
# Find the top 10 of each genre - ranking based on popularity #
#------------------------------------------------------------------------------#
top_10_by_genre = (
    pl.LazyFrame(
        df
        .select(pl.col('track_genre', 'artists', 'track_name', 'popularity'))
        .with_columns(ARTIST_COUNT=pl.col('artists').count().over('artists'))
        .unique(['track_genre', 'artists'])
        .group_by(['track_genre', 'artists', 'track_name', 'ARTIST_COUNT'])
        .agg(pl.mean('popularity'))
        .sort(['track_genre', 'popularity'], descending=True)
        .with_columns(GENRE_COUNT=pl.col('track_genre').count().over('track_genre'))
        .with_columns(TRACK_GENRE_RANK=pl.col('track_genre').cum_count().over('track_genre'))
        .filter(pl.col('TRACK_GENRE_RANK') <= 10)
    )
    .collect()
)
#------------------------------------------------------------------------------#
# find top 10 genres based on popularity of its songs #
#------------------------------------------------------------------------------#
top_10_genres = (
    top_10_by_genre
    .group_by(pl.col('track_genre')).agg(pl.mean('popularity'))
    .sort('popularity', descending=True)
    .head(10)
    .select(pl.col('track_genre'))
    .to_series()
    .to_list()
)
#------------------------------------------------------------------------------#
# Make bar charts of top 10 songs in the top 10 genres #
#------------------------------------------------------------------------------#
for genre in top_10_genres:  # iterate through the top 10 genres
    df_plot = (
        pl.LazyFrame(
            top_10_by_genre
            .filter(pl.col('track_genre') == genre)
            .select(pl.col('artists', 'track_name', 'popularity', 'track_genre', 'ARTIST_COUNT'))
            .with_columns(
                pl.col('artists').str.to_titlecase(),
            )
            .with_columns(
                ARTIST_TOTAL=pl.col('artists').count().over('artists')
            )
            .with_row_index(offset=1)
            .tail(10)
            .with_columns(
                ARTIST_TRACK=(
                    pl.lit('<b>') +       # bold font for artist name
                    pl.col('artists') +
                    pl.lit('</b>') +      # end bold font, use normal font for track name
                    pl.lit(' ') +         # add spaces after artist name to separate from plot
                    pl.lit('<br>') +      # html line break puts artist name on first line, track name on second
                    pl.col('track_name') +
                    pl.lit(' ')           # add spaces after track name to separate from plot
                )
            )
            .sort('popularity')
        )
        .collect()
    )
    # Make horizontal bar chart
    fig = px.bar(
        df_plot.sort('popularity', descending=False),
        x='popularity',
        y='ARTIST_TRACK',
        # color='ARTIST_COUNT',
        orientation='h',
        template='plotly_white',
        height=600,
        width=1000,
        range_x=[80, 100],
        # category_orders={'index': df.index[::-1]}
    )
    fig.update_layout(title=genre)
    # fig.update_layout(yaxis={'categoryorder': 'array', 'categoryarray': df.index})
    fig.show()
I built a Dash app dashboard to compare the overall track metrics and category distributions against popular and unpopular songs, based on popularity bins.
Histogram Chart with Metric Selector: Switch between different metrics using tabs to update the histogram and view the average value for each popularity bin.
Overall Average Card: A quick view of the overall average for the selected metric.
Most and Least Popular Comparison Cards: Compare the most and least popular tracks against the overall average.
Category & Popularity Bin Selectors: Group data by specific categories (genre, explicit content, time signature) and filter by popularity bin.
Butterfly Chart: Compares the overall distribution of tracks by selected category with the selected popularity bin.
Tracks Table: Display detailed information for tracks within the selected popularity bin.
As a Mexican, I found it interesting that there is a strong presence of urban and Latino genres in the most popular bin compared with the overall distribution.
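For anyone who wants to experiment with the same idea, here is a minimal sketch (not the actual app code; the bin edges are arbitrary and the column names are taken from the data dictionary above) of how popularity bins and a butterfly chart comparing overall versus top-bin genre shares could be built:

import polars as pl
import plotly.graph_objects as go

df = pl.read_csv('dataset.csv')  # assumed local copy of the challenge CSV

# Label each track with a popularity bin (edges chosen arbitrarily for this sketch)
df = df.with_columns(
    pl.col('popularity')
    .cut([25, 50, 75], labels=['0-25', '25-50', '50-75', '75-100'])
    .alias('popularity_bin')
)

# Percentage of tracks per genre: overall vs. the most popular bin
overall = (
    df.group_by('track_genre').agg(pl.len().alias('n'))
    .with_columns((pl.col('n') / pl.col('n').sum() * 100).alias('pct'))
)
top_bin = (
    df.filter(pl.col('popularity_bin') == '75-100')
    .group_by('track_genre').agg(pl.len().alias('n'))
    .with_columns((pl.col('n') / pl.col('n').sum() * 100).alias('pct'))
)

# Butterfly chart: overall share drawn to the left (negative x), top-bin share to the right
shares = (
    overall.join(top_bin, on='track_genre', suffix='_top')
    .sort('pct_top', descending=True)
    .head(15)
)
fig = go.Figure()
fig.add_bar(y=shares['track_genre'].to_list(), x=(-shares['pct']).to_list(),
            orientation='h', name='All tracks')
fig.add_bar(y=shares['track_genre'].to_list(), x=shares['pct_top'].to_list(),
            orientation='h', name='Popularity 75-100')
fig.update_layout(barmode='overlay', template='plotly_white', height=500)
fig.show()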
Thanks for sharing the code for the bar charts, @Mike_Purtell
For the code to run successfully, we just have to delete the s on line 39, in top_10_by_genres.
Congratulations, @Alfredo49. That’s a beautiful app. I like that you chose the same Spotify green.
By the way, the source code link at the very bottom of the app doesn’t work.
One thing I would recommend is to add information tied to each tab. People visiting your app who are not familiar with the data set would not know what speechiness, mode, or instrumentalness mean.
Maybe you can add a DMC information icon with DashIconify next to the title:
from dash import Dash, dcc, html, Input, Output, dash_table, callback
import dash_mantine_components as dmc
from dash_iconify import DashIconify
app = Dash(__name__)
app.layout = dmc.Container(
    [
        dmc.Group([
            dmc.Title("Average Instrumentalness by Popularity"),
            dmc.Tooltip(
                multiline=True,
                w=220,
                withArrow=True,
                label="Predicts whether a track contains no vocals. 'Ooh' and 'aah' sounds are treated as instrumental in this context. "
                      "Rap or spoken word tracks are clearly 'vocal'. The closer the instrumentalness value is to 1.0, "
                      "the greater likelihood the track contains no vocal content.",
                children=[DashIconify(icon="feather:info", width=30)],
            ),
        ])
    ],
    fluid=True,
)

if __name__ == "__main__":
    app.run_server(debug=True)
Awesome visualizations everyone! I made a density heatmap comparing loudness (in decibels) and tempo (in beats per minute) for songs in the pop genre.
*Note: I did some research, and apparently when measuring decibels digitally the highest level is 0 dB; the decibel measurements are scaled relative to that. That’s why there are negative decibel values.
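For anyone who wants to try something similar, here is a rough sketch (not the exact figure above; the bin counts are arbitrary and a local dataset.csv is assumed) of a loudness-vs-tempo density heatmap for the pop genre:

import polars as pl
import plotly.express as px

df = pl.read_csv('dataset.csv')  # assumed local copy of the challenge CSV
pop = df.filter(pl.col('track_genre') == 'pop')

fig = px.density_heatmap(
    pop,
    x='tempo',      # beats per minute
    y='loudness',   # dB relative to digital full scale, so values are <= 0
    nbinsx=40,
    nbinsy=40,
    template='plotly_white',
    labels={'tempo': 'Tempo (BPM)', 'loudness': 'Loudness (dB)'},
)
fig.show()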
Beautiful app, @aswagh3. And thank you for showing it at the Figure Friday session earlier today.
Most songs seem to fall around -5 dB and 120 beats per minute. I wonder what this would look like when filtering for the top bracket of popular songs (75-100 points), like @Alfredo49 did in his app.
I wonder if the most popular songs also follow this trend of -5 dB and 120 beats per minute.
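One quick way to check, reusing the hypothetical pop frame from the heatmap sketch above, would be to filter for the top popularity bracket before drawing the same chart:

# Re-draw the same heatmap for only the most popular pop tracks (popularity 75-100)
top_pop = pop.filter(pl.col('popularity') >= 75)

fig = px.density_heatmap(
    top_pop,
    x='tempo',
    y='loudness',
    nbinsx=40,
    nbinsy=40,
    template='plotly_white',
)
fig.show()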