import plotly.express as px
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/main/2025/week-2/samples.tsv", sep='\t')
# convert strings with < and > symbols to floats
for col in ['DEHP_ng_serving', 'DBP_ng_serving', 'DEP_ng_serving', 'BBP_ng_serving']:
    df[col] = df[col].str.replace('[<>]', '', regex=True).astype(float)
# remove a few outliers
dff = df[(df['DBP_ng_serving'] < 8000) & (df['DEHP_ng_serving'] < 20000)]
fig = px.scatter_matrix(dff, dimensions=["DEHP_ng_serving", "DBP_ng_serving", "DEP_ng_serving", "BBP_ng_serving"])
fig.show()
Participation Instructions:
Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.
After reading a lot on the plasticlist.org website, I thought “this is be-careful data”: if the people at plasticlist.org were very careful about drawing any conclusions, I’m certainly not equipped to draw them myself.
If I had to choose a product group to dive into further, it would be “cheeseburgers” or “breastmilk”.
I created a dashboard to navigate through the data in a different way: grouped by “blinded name”, with the products in a list. There is not much explanation on the dashboard, to make certain everybody visits plasticlist.org.
Q1 through Q4 mean that a score falls in percentile band Q1 to Q4.
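For anyone wondering how such Q1 to Q4 labels could be produced, here is a minimal sketch. It assumes the bands are quartiles computed with pandas `qcut`; the dashboard may compute them differently, and the numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical measurements for one chemical (made-up numbers, not from the dataset)
scores = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], name="value")

# qcut splits the values into four equal-sized bins at the 25th, 50th and 75th percentiles
quartile = pd.qcut(scores, q=4, labels=["Q1", "Q2", "Q3", "Q4"])

labelled = pd.DataFrame({"value": scores, "quartile": quartile})
print(labelled)
```

Because `qcut` bins by rank rather than by fixed thresholds, the label only says where a score sits relative to the other rows, not whether it is high in absolute terms.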
I spent most of my time familiarizing myself with the data. For me it was one of the more interesting and relevant datasets we have worked with. To be honest, I find it interesting that we have “acceptable/tolerable levels” of any of these chemicals. What is the “tolerable” number of times one would want to be punched in the face daily?
I filtered out results below the limit of quantification (LOQ): values below the measurable threshold didn’t interest me and created too much noise in the data.
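One way such a filter could look is sketched below. It assumes that a value recorded with a leading `<` marks a result below the LOQ (the sample code only says the columns contain `<` and `>` symbols), and the rows are made up for illustration.

```python
import pandas as pd

# Made-up rows; the "<" prefix is ASSUMED to mark a result below the LOQ
df = pd.DataFrame({
    "product": ["a", "b", "c", "d"],
    "DEHP_ng_serving": ["<450", "1200", "<300", "87.5"],
})

# Flag the below-LOQ rows before stripping any symbols, so the information is not lost
below_loq = df["DEHP_ng_serving"].astype(str).str.startswith("<")

# Keep only quantified results and convert them to floats
quantified = df.loc[~below_loq].copy()
quantified["DEHP_ng_serving"] = quantified["DEHP_ng_serving"].astype(float)
print(quantified)
```

Filtering on the raw strings matters here: once the `<` is stripped, a below-LOQ result looks like an ordinary measurement.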
On a side note, seeing that the Whole Foods category consistently recorded higher levels of these chemicals was an eye opener. Charging a premium for marginally better quality, only for these harmful chemicals to show up at higher levels, is quite ironic.
import plotly.express as px
import pandas as pd
from dash import Dash, dcc, html
# Load data
data_path = r"Data_path"
data = pd.read_csv(data_path)
# Clean the BPA column: Remove unwanted characters and replace invalid values
data['BPA_percent_tdi_14_kg_epa'] = data['BPA_percent_tdi_14_kg_epa'].replace(r'^\s*$', None, regex=True)
data['BPA_percent_tdi_14_kg_epa'] = data['BPA_percent_tdi_14_kg_epa'].astype(str).str.replace('}', '', regex=False)
data['BPA_percent_tdi_14_kg_epa'] = pd.to_numeric(data['BPA_percent_tdi_14_kg_epa'], errors='coerce')
# Drop rows with NaN values
data.dropna(subset=['BPA_percent_tdi_14_kg_epa'], inplace=True)
# Normalize product names
data['product'] = data['product'].str.strip().str.lower()
# Filter out extremely small values (e.g., below 0.01)
data = data[data['BPA_percent_tdi_14_kg_epa'] >= 0.01]
# Take the highest value per product
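# Aggregate only the BPA column; max() over every column can fail on mixed-type text columns
grouped_data = data.groupby('product', as_index=False)['BPA_percent_tdi_14_kg_epa'].max()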
# Convert the highest values to percentages
grouped_data['BPA_percent_tdi_14_kg_epa'] *= 100 # Convert to percentages
# Round percentages to 1 decimal place
grouped_data['BPA_percent_tdi_14_kg_epa'] = grouped_data['BPA_percent_tdi_14_kg_epa'].round(1)
# Sort by BPA percentage
grouped_data = grouped_data.sort_values(by='BPA_percent_tdi_14_kg_epa', ascending=False)
# Create the bar chart
fig = px.bar(
    grouped_data,
    y='BPA_percent_tdi_14_kg_epa',
    x='product',
    title="BPA as Percentage of TDI by Product (Max Value)"
)
# Explicitly set the text template to show values correctly formatted
fig.update_traces(
    texttemplate='%{y:.1f}%',  # Format the y-values as percentages with 1 decimal
    textposition="outside",
    textfont_size=12,
    cliponaxis=False
)
# Adjust layout for better spacing
fig.update_layout(
    yaxis=dict(range=[0, 100], showgrid=True),  # Adjust range to focus on visible bars
    xaxis=dict(tickangle=-45),  # Rotate x-axis labels for readability
    width=1200  # Increase width to improve bar spacing
)
# Dash app setup
app = Dash(__name__)
app.layout = html.Div([
    dcc.Graph(figure=fig)
])
if __name__ == '__main__':
    # app.run replaces the older app.run_server, which recent Dash releases no longer support
    app.run(debug=True)
Hi @marieanne, your app was very good. I like the product list, the dynamic callback, and how you show all chemicals for the selected product. What did you mean regarding the blinded name?
Reviewing their posted dataset with formatting, you’ll notice they had blanks in some product cells, but this was where they had run multiple tests. Inspecting the data as .tsv didn’t help, so I converted it to .csv, which gave me a better format for inspecting the data.
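For reference, the conversion can also be done in pandas. This is only a sketch on a made-up three-row sample; the forward-fill step assumes that a blank product cell is a repeat test of the product in the row above, which should be checked against the real file before relying on it.

```python
import io
import pandas as pd

# Made-up TSV sample; the second data row has a blank product cell
tsv_text = "product\tDEHP_ng_serving\nburger\t<450\n\t1200\nmilk\t87.5\n"

df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Blank cells are read as NaN; forward-fill ASSUMES a blank means "same product as above"
df["product"] = df["product"].ffill()

# Write the result out as CSV text (pass a file path instead to save it to disk)
csv_text = df.to_csv(index=False)
print(csv_text)
```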
@marieanne, I see the blinded_name column now. Early in my tsv to csv file conversions I deleted it and didn’t use it, so it threw me off. Sorry for the confusion in my prior post haha.
Hi @ThomasD21M, I think my app is boring. The blinded_name (= group, for example cheeseburger) groups all cheeseburger products. At first I did not include the products; later, when I added the disclaimer card, I added the product names too, as in “why not”, since they are included in the data.
I do Figure Friday on weekends, on “non-working days”, and my app is more a way to get familiar with the data. What makes this dataset more interesting is also what makes it more scary for me.
I make a reasoning error, something gets out in the open, and before you know it… along those lines.
My expertise with week 2’s dataset is very low to put it mildly. My submission only offers a few minor improvements to the sample code offered for this exercise.
Improvements include:
• diagonal_visible=False: Plots on the diagonal have the same data on the x and y axes, so they show perfect correlation and linearity. They are useful as a sanity check but do not add value to the visualization. diagonal_visible=False removes them.
• showupperhalf=False: The upper right plots mirror the lower left plots, with the x-axis and y-axis swapped. They are useful as sanity checks but do not add value to the visualization. showupperhalf=False removes them.
• Legend: after removing the diagonal and the upper-half plots, the legend was far away from the scatterplots. I updated the legend position to move it closer to the plots.
• Used the ‘shipped_in’ column to set the marker colors. I don’t see any useful pattern here, but leave it in as an example.
• Used the datasource attribution as the suptitle.