Music sets the tone for our mornings, days, and nights, whether we’re washing dishes or partying. The following data set explores various parameters of songs in the Spotify library, including genre, danceability, and tempo.
Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.
artists: The names of the artists who performed the track. If there is more than one artist, the names are separated by a ;
album_name: The album name in which the track appears
track_name: Name of the track
popularity: The popularity of a track is a value between 0 and 100, with 100 being the most popular. The popularity is calculated by algorithm and is based, for the most part, on the total number of plays the track has had and how recent those plays are. Generally speaking, songs that are being played a lot now will have a higher popularity than songs that were played a lot in the past. Duplicate tracks (e.g. the same track from a single and an album) are rated independently. Artist and album popularity is derived mathematically from track popularity.
duration_ms: The track length in milliseconds
explicit: Whether or not the track has explicit lyrics (true = yes it does; false = no it does not OR unknown)
danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable
energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale
key: The key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1
loudness: The overall loudness of a track in decibels (dB)
mode: Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0
speechiness: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks
acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic
instrumentalness: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content
liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live
valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry)
tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration
time_signature: An estimated time signature. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure). The time signature ranges from 3 to 7, indicating time signatures from 3/4 to 7/4
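Several of these columns are numerically encoded. Purely as orientation, here is a minimal sketch (not part of the official description; it assumes the week-34 CSV is saved locally as dataset.csv) of one way the key, mode, and duration_ms values could be decoded with Polars:

import polars as pl

# Pitch Class notation used by the key column: 0 = C, 1 = C#/Db, ... ; -1 = no key detected
PITCH_CLASSES = ['C', 'C#/Db', 'D', 'D#/Eb', 'E', 'F',
                 'F#/Gb', 'G', 'G#/Ab', 'A', 'A#/Bb', 'B']

df = pl.read_csv('dataset.csv')  # assumed local copy of the challenge CSV

df_readable = df.with_columns(
    # map the key integer to a note name; -1 (no key detected) becomes null
    pl.col('key').map_elements(
        lambda k: PITCH_CLASSES[k] if 0 <= k < 12 else None,
        return_dtype=pl.Utf8,
    ).alias('key_name'),
    # mode: 1 = major, 0 = minor
    pl.when(pl.col('mode') == 1).then(pl.lit('major'))
      .otherwise(pl.lit('minor'))
      .alias('mode_name'),
    # duration_ms: milliseconds -> minutes
    (pl.col('duration_ms') / 60_000).round(2).alias('duration_min'),
)
print(df_readable.select('track_name', 'key_name', 'mode_name', 'duration_min').head())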
Some things of note for the dataset above: It’s a snapshot of Spotify from October 2022. There are also several songs listed under more than one genre (some data cleaning may be required to remove duplicates if it’s relevant to your analysis).
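If genre is not relevant to your analysis, one possible way to remove those duplicates (again just a sketch, assuming a local dataset.csv) is to keep a single row per artist/track pair:

import polars as pl

df = pl.read_csv('dataset.csv')  # assumed local copy of the challenge CSV

# Tracks listed under more than one genre appear once per genre; keep a single
# row per artist/track pair (here, the row with the highest popularity)
df_dedup = (
    df
    .sort('popularity', descending=True)
    .unique(subset=['artists', 'track_name'], keep='first')
)
print(df.height, '->', df_dedup.height)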
import polars as pl
import plotly.express as px
#------------------------------------------------------------------------------#
# Load data set from git or from local drive #
#------------------------------------------------------------------------------#
if False:  # to read data from git, and save to disk
    df = pl.read_csv(
        'https://raw.githubusercontent.com/plotly/Figure-Friday/main/2024/week-34/dataset.csv'
    )
    df.write_csv('dataset.csv')
if True:  # to read data from disk
    df = pl.read_csv('dataset.csv')
#------------------------------------------------------------------------------#
# Find the top 10 of each genre - ranking based on popularity #
#------------------------------------------------------------------------------#
top_10_by_genre = (
    pl.LazyFrame(
        df
        .select(pl.col('track_genre', 'artists', 'track_name', 'popularity'))
        .with_columns(ARTIST_COUNT=pl.col('artists').count().over('artists'))
        .unique(['track_genre', 'artists'])
        .group_by(['track_genre', 'artists', 'track_name', 'ARTIST_COUNT'])
        .agg(pl.mean('popularity'))
        .sort(['track_genre', 'popularity'], descending=True)
        .with_columns(GENRE_COUNT=pl.col('track_genre').count().over('track_genre'))
        .with_columns(TRACK_GENRE_RANK=pl.col('track_genre').cum_count().over('track_genre'))
        .filter(pl.col('TRACK_GENRE_RANK') <= 10)
    )
    .collect()
)
#------------------------------------------------------------------------------#
# find top 10 genres based on popularity of its songs #
#------------------------------------------------------------------------------#
top_10_genres = (
    top_10_by_genre
    .group_by(pl.col('track_genre')).agg(pl.mean('popularity'))
    .sort('popularity', descending=True)
    .head(10)
    .select(pl.col('track_genre'))
    .to_series()
    .to_list()
)
#------------------------------------------------------------------------------#
# Make bar charts of top 10 songs in the top 10 genres #
#------------------------------------------------------------------------------#
for genre in top_10_genres:  # iterate through the top 10 genres
    df_plot = (
        pl.LazyFrame(
            top_10_by_genre
            .filter(pl.col('track_genre') == genre)
            .select(pl.col('artists', 'track_name', 'popularity', 'track_genre', 'ARTIST_COUNT'))
            .with_columns(
                pl.col('artists').str.to_titlecase(),
            )
            .with_columns(
                ARTIST_TOTAL=pl.col('artists').count().over('artists')
            )
            .with_row_index(offset=1)
            .tail(10)
            .with_columns(
                ARTIST_TRACK=(
                    pl.lit('<b>') +       # bold font for artist name
                    pl.col('artists') +
                    pl.lit('</b>') +      # end bold font, use normal font for track name
                    pl.lit(' ') +         # add spaces after artist name to separate from plot
                    pl.lit('<br>') +      # html line break puts artist name on first line, track name on second
                    pl.col('track_name') +
                    pl.lit(' ')           # add spaces after track name to separate from plot
                )
            )
            .sort('popularity')
        )
        .collect()
    )
    # Make horizontal bar chart
    fig = px.bar(
        df_plot.sort('popularity', descending=False),
        x='popularity',
        y='ARTIST_TRACK',
        # color='ARTIST_COUNT',
        orientation='h',
        template='plotly_white',
        height=600,
        width=1000,
        range_x=[80, 100],
        # category_orders={'index': df.index[::-1]}
    )
    fig.update_layout(title=genre)
    # fig.update_layout(yaxis={'categoryorder': 'array', 'categoryarray': df.index})
    fig.show()
I built a Dash app dashboard to compare the overall track metrics and category distributions against popular and unpopular songs, based on popularity bins.
Histogram Chart with Metric Selector: Switch between different metrics using tabs to update the histogram and view the average value for each popularity bin.
Overall Average Card: A quick view of the overall average for the selected metric.
Most and Least Popular Comparison Cards: Compare the most and least popular tracks against the overall average.
Category & Popularity Bin Selectors: Group data by specific categories (genre, explicit content, time signature) and filter by popularity bin.
Butterfly Chart: Compares the overall distribution of tracks by selected category with the selected popularity bin.
Tracks Table: Display detailed information for tracks within the selected popularity bin.
As a Mexican, I found it interesting that there is a strong presence of urban and Latino genres in the most popular bin compared with the overall distribution.
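For anyone who wants to experiment with the same idea, here is a minimal sketch (not the actual app code; the bin edges are arbitrary and the column names are taken from the data dictionary above) of how popularity bins and a butterfly chart comparing overall versus top-bin genre shares could be built:

import polars as pl
import plotly.graph_objects as go

df = pl.read_csv('dataset.csv')  # assumed local copy of the challenge CSV

# Label each track with a popularity bin (edges chosen arbitrarily for this sketch)
df = df.with_columns(
    pl.col('popularity')
    .cut([25, 50, 75], labels=['0-25', '25-50', '50-75', '75-100'])
    .alias('popularity_bin')
)

# Percentage of tracks per genre: overall vs. the most popular bin
overall = (
    df.group_by('track_genre').agg(pl.len().alias('n'))
    .with_columns((pl.col('n') / pl.col('n').sum() * 100).alias('pct'))
)
top_bin = (
    df.filter(pl.col('popularity_bin') == '75-100')
    .group_by('track_genre').agg(pl.len().alias('n'))
    .with_columns((pl.col('n') / pl.col('n').sum() * 100).alias('pct'))
)

# Butterfly chart: overall share drawn to the left (negative x), top-bin share to the right
shares = (
    overall.join(top_bin, on='track_genre', suffix='_top')
    .sort('pct_top', descending=True)
    .head(15)
)
fig = go.Figure()
fig.add_bar(y=shares['track_genre'].to_list(), x=(-shares['pct']).to_list(),
            orientation='h', name='All tracks')
fig.add_bar(y=shares['track_genre'].to_list(), x=shares['pct_top'].to_list(),
            orientation='h', name='Popularity 75-100')
fig.update_layout(barmode='overlay', template='plotly_white', height=500)
fig.show()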
Thanks for sharing the code for the bar charts, @Mike_Purtell
For the code to run successfully, we just have to delete the s on line 39, in top_10_by_genres.
Congratulations, @Alfredo49. That’s a beautiful app. I like that you chose the same Spotify green.
By the way, the source code link at the very bottom of the app doesn’t work.
One thing I would recommend is to add information tied to each tab. People visiting your app who are not familiar with the data set would not know what speechiness, mode, or instrumentalness mean.
Maybe you can add a DMC information icon with DashIconify next to the title:
from dash import Dash, dcc, html, Input, Output, dash_table, callback
import dash_mantine_components as dmc
from dash_iconify import DashIconify
app = Dash(__name__)
app.layout = dmc.Container(
    [
        dmc.Group([
            dmc.Title("Average Instrumentalness by Popularity"),
            dmc.Tooltip(
                multiline=True,
                w=220,
                withArrow=True,
                label="Predicts whether a track contains no vocals. 'Ooh' and 'aah' sounds are treated as instrumental in this context. "
                      "Rap or spoken word tracks are clearly 'vocal'. The closer the instrumentalness value is to 1.0, "
                      "the greater likelihood the track contains no vocal content.",
                children=[DashIconify(icon="feather:info", width=30)],
            ),
        ])
    ],
    fluid=True,
)

if __name__ == "__main__":
    app.run_server(debug=True)
Awesome visualizations everyone! I made a density heatmap comparing loudness (in decibels) and tempo (in beats per minute) for songs in the pop genre.
*Note: I did some research, and apparently when measuring decibels digitally the highest level is 0 dB; the decibel measurements are scaled relative to that. That’s why there are negative decibel values.
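For anyone who wants to try something similar, here is a rough sketch (not the exact figure above; the bin counts are arbitrary and a local dataset.csv is assumed) of a loudness-vs-tempo density heatmap for the pop genre:

import polars as pl
import plotly.express as px

df = pl.read_csv('dataset.csv')  # assumed local copy of the challenge CSV
pop = df.filter(pl.col('track_genre') == 'pop')

fig = px.density_heatmap(
    pop,
    x='tempo',      # beats per minute
    y='loudness',   # dB relative to digital full scale, so values are <= 0
    nbinsx=40,
    nbinsy=40,
    template='plotly_white',
    labels={'tempo': 'Tempo (BPM)', 'loudness': 'Loudness (dB)'},
)
fig.show()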
Beautiful app, @aswagh3. And thank you for showing it at the Figure Friday session earlier today.
Most songs seem to fall around -5 dB and 120 beats per minute. I wonder what this would look like when filtering for the top bracket of popular songs (75-100 points), like @Alfredo49 did in his app.
I wonder if the most popular songs also follow this trend of -5 dB and 120 beats per minute.
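One quick way to check, reusing the hypothetical pop frame from the heatmap sketch above, would be to filter for the top popularity bracket before drawing the same chart:

# Re-draw the same heatmap for only the most popular pop tracks (popularity 75-100)
top_pop = pop.filter(pl.col('popularity') >= 75)

fig = px.density_heatmap(
    top_pop,
    x='tempo',
    y='loudness',
    nbinsx=40,
    nbinsy=40,
    template='plotly_white',
)
fig.show()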