Figure Friday 2025 - week 2

Reminder: there will NOT be a Figure Friday session this Friday.

How much plastic is in our products?

This week we’ll look at Data on Plastic Chemicals in Bay Area Foods.

To dive into each product in more detail, please go to PlasticList and click on the product of interest in the product column.

Things to consider:

  • can you improve the sample figure below (scatter matrix)?
  • would you like to tell a different data story using a different graph?
  • can you create a Dash app instead?

Sample figure:

Code for sample figure:
import plotly.express as px
import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/main/2025/week-2/samples.tsv", sep='\t')

# chemical columns to plot; some values are stored as strings such as "<123" or ">456"
phthalate_cols = ["DEHP_ng_serving", "DBP_ng_serving", "DEP_ng_serving", "BBP_ng_serving"]

# convert strings with < and > symbols to floats
for col in phthalate_cols:
    df[col] = df[col].str.replace('[<>]', '', regex=True).astype(float)

# remove a few outliers
dff = df[(df['DBP_ng_serving'] < 8000) & (df['DEHP_ng_serving'] < 20000)]
fig = px.scatter_matrix(dff, dimensions=phthalate_cols)
fig.show()
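The sample removes outliers with hand-picked cutoffs (8000 and 20000). One possible alternative, sketched here under the assumption that trimming the top 1% of each chemical is acceptable for a quick exploratory plot, derives the cutoffs from each column's quantiles instead. The helper name `trim_upper_outliers` and the 0.99 default are illustrative choices, not part of the sample code.

```python
import pandas as pd


def trim_upper_outliers(df: pd.DataFrame, cols: list[str], q: float = 0.99) -> pd.DataFrame:
    """Keep rows whose values in every column of `cols` are at or below
    that column's q-th quantile. Missing values are kept (they are not outliers)."""
    # one cutoff per column, e.g. the 99th percentile of each chemical
    cutoffs = df[cols].quantile(q)
    # a row passes if each value is <= its column's cutoff or is missing
    keep = (df[cols].le(cutoffs) | df[cols].isna()).all(axis=1)
    return df[keep]
```

After the `<`/`>` strings have been converted to floats, usage would look like `trim_upper_outliers(df, ["DEHP_ng_serving", "DBP_ng_serving", "DEP_ng_serving", "BBP_ng_serving"])`. Quantile cutoffs adapt if the dataset is updated, but they always drop some points, so a log axis may be the better choice if every sample should stay visible.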

Participation Instructions:

  • Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
  • Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
  • Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.

👉 If you prefer to collaborate with others on Discord, join the Plotly Discord channel.

Data Source:

Thank you to PlasticList for the data. Please cite the data as:

PlasticList. 'Data on Plastic Chemicals in Bay Area Foods'. plasticlist.org. Accessed Jan 10, 2025.

After reading a lot on the plasticlist.org website, I thought “this is handle-with-care data”: if the people at plasticlist.org were very careful about drawing any conclusions, I’m certainly not equipped to draw any.

If I were to choose a product group to dive into further, it would be “cheeseburgers” or “breastmilk”.

I created a dashboard to navigate through the data in a different way: products are grouped by “blinded name” and shown in a list. There is not much explanation on the dashboard, to make certain everybody will visit plasticlist.org.
“Q1 t/m Q4” (Dutch for “Q1 through Q4”) means the scores fall in quartiles Q1 to Q4.
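Quartile labels like these can be computed with `pd.qcut`. This is a minimal sketch, assuming the scores are numeric and that each value is labeled by its quartile over the whole column; the dashboard may compute them differently (for example within each blinded_name group). The helper name `quartile_labels` is hypothetical.

```python
import pandas as pd


def quartile_labels(values: pd.Series) -> pd.Series:
    """Label each numeric value with its quartile (Q1 = lowest 25%, Q4 = highest 25%).
    Missing values stay missing. Needs at least two non-missing values."""
    # rank(method="first") breaks ties, so qcut always gets distinct bin edges
    ranks = values.rank(method="first")
    return pd.qcut(ranks, q=4, labels=["Q1", "Q2", "Q3", "Q4"])
```

Ranking before binning is a design choice: phthalate measurements contain many repeated values (for example identical limit values), and `qcut` on the raw numbers would fail with duplicate bin edges.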

Link and code on py.cafe: PyCafe - Dash - A way to browse the data of plasticlist.org, no conclusions.

It looks like this and is not responsive:


I spent most of my time familiarizing myself with the data. For me it was one of the more interesting and relevant datasets we have worked with. To be honest, I find it interesting that we have “acceptable/tolerable levels” of any of these chemicals at all. What is the “tolerable” number of times one would want to be punched in the face daily?

I filtered out results below the LOQ (limit of quantification), as values below the measurable threshold didn’t interest me and created too much noise in the data.
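A minimal sketch of dropping below-LOQ results before plotting, assuming (as the `<`/`>` handling in the sample code suggests) that such results are stored as strings prefixed with `<`, such as `"<50"`. The helper name `drop_below_loq` is hypothetical, and this is not necessarily the filter used in the code that follows.

```python
import pandas as pd


def drop_below_loq(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Return only rows whose `col` holds a quantified result, with `col` as float.

    ASSUMPTION: below-LOQ results are strings starting with "<".
    Values starting with ">" are kept at their reported bound (as in the sample code);
    blanks and other non-numeric entries become NaN and are dropped.
    """
    raw = df[col].astype(str).str.strip()
    below_loq = raw.str.startswith("<")          # True for "below LOQ" results
    out = df.loc[~below_loq].copy()
    cleaned = raw[~below_loq].str.replace(">", "", regex=False)
    out[col] = pd.to_numeric(cleaned, errors="coerce").astype(float)
    return out.dropna(subset=[col])
```

Removing these rows (rather than stripping the `<` and keeping the bound as a value) avoids plotting limits as if they were measurements, at the cost of hiding how many samples were non-detects.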

On a side note, seeing that the Whole Foods category consistently recorded higher levels of these chemicals was an eye-opener. Charging a premium for marginally better quality, only to show higher levels of these harmful chemicals, is quite ironic.

import plotly.express as px
import pandas as pd
from dash import Dash, dcc, html

# Load data (replace "Data_path" with the path to your CSV copy of the dataset)
data_path = r"Data_path"
data = pd.read_csv(data_path)

bpa_col = 'BPA_percent_tdi_14_kg_epa'

# Clean the BPA column: remove stray '}' characters and surrounding whitespace,
# then convert to numbers. Blank cells and non-numeric entries (e.g. "<0.01") become NaN.
data[bpa_col] = pd.to_numeric(
    data[bpa_col].astype(str).str.replace('}', '', regex=False).str.strip(),
    errors='coerce'
)

# Drop rows with NaN values
data = data.dropna(subset=[bpa_col])

# Normalize product names
data['product'] = data['product'].str.strip().str.lower()

# Filter out extremely small values (e.g., below 0.01)
data = data[data[bpa_col] >= 0.01]

# Take the highest BPA value per product (select the column first so that
# max() is not also applied to every text column)
grouped_data = data.groupby('product', as_index=False)[bpa_col].max()

# Convert the highest values to percentages
grouped_data[bpa_col] *= 100

# Round percentages to 1 decimal place
grouped_data[bpa_col] = grouped_data[bpa_col].round(1)

# Sort by BPA percentage
grouped_data = grouped_data.sort_values(by=bpa_col, ascending=False)

# Create the bar chart
fig = px.bar(
    grouped_data,
    y=bpa_col,
    x='product',
    title="BPA as Percentage of TDI by Product (Max Value)"
)

# Explicitly set the text template to show values correctly formatted
fig.update_traces(
    texttemplate='%{y:.1f}%',  # Format the y-values as percentages with 1 decimal
    textposition="outside",
    textfont_size=12,
    cliponaxis=False
)

# Adjust layout for better spacing
fig.update_layout(
    yaxis=dict(range=[0, 100], showgrid=True),  # Adjust range to focus on visible bars
    xaxis=dict(tickangle=-45),  # Rotate x-axis labels for readability
    width=1200  # Increase width to improve bar spacing
)

# Dash app setup
app = Dash(__name__)

app.layout = html.Div([
    dcc.Graph(figure=fig)
])

if __name__ == '__main__':
    # app.run() replaces app.run_server(), which recent Dash releases no longer provide
    app.run(debug=True)
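Taking the maximum per product with `groupby(...).max()` returns only the maximum values, not which sample produced them. If you also want the other columns of that sample (for example for hover text), a sketch using `idxmax` could look like this; the helper name `max_row_per_group` is hypothetical.

```python
import pandas as pd


def max_row_per_group(df: pd.DataFrame, group_col: str, value_col: str) -> pd.DataFrame:
    """Return, for each group, the complete row holding the largest `value_col`.

    Unlike groupby(...).max(), which takes the maximum of each column
    independently, this keeps all values from the same sample together.
    """
    valid = df.dropna(subset=[value_col])
    # idxmax returns the index label of the max row within each group
    idx = valid.groupby(group_col)[value_col].idxmax()
    return valid.loc[idx].reset_index(drop=True)
```

For this dataset the call would be something like `max_row_per_group(data, 'product', 'BPA_percent_tdi_14_kg_epa')`, after the BPA column has been converted to numbers.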

Hi @marieanne, your app was very good. I like the product list, the dynamic callback, and how you show all chemicals for the selected product. What did you mean by the blinded name?

Reviewing their posted dataset, you’ll notice blanks in some product cells; these are rows where they ran multiple tests. Inspecting the data in .tsv format didn’t help, so I converted it to .csv, which was easier for me to inspect.
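A sketch of the TSV-to-CSV conversion with pandas. It also forward-fills blank product cells, which is an ASSUMPTION based on the observation above (a blank cell means another test of the product listed above it); check a few rows against plasticlist.org before relying on it. The helper name `tsv_to_csv` is hypothetical.

```python
import pandas as pd


def tsv_to_csv(tsv_source, csv_target, fill_cols=("product",)):
    """Read tab-separated data, forward-fill blank cells in `fill_cols`,
    write the result as CSV, and return it as a DataFrame.

    ASSUMPTION: a blank cell in a fill column means "same as the row above".
    `tsv_source` and `csv_target` may be file paths or file-like objects.
    """
    df = pd.read_csv(tsv_source, sep="\t")
    for col in fill_cols:
        # empty cells are already NaN; also treat whitespace-only cells as missing
        is_blank = df[col].astype(str).str.strip() == ""
        df[col] = df[col].mask(is_blank).ffill()
    df.to_csv(csv_target, index=False)
    return df
```

Usage: `tsv_to_csv("samples.tsv", "samples.csv")`. Note that pandas can read the TSV directly with `sep="\t"`, so the CSV file is only needed for inspecting the data in a spreadsheet.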

@marieanne, I see the blinded_name column now. Early in my TSV-to-CSV conversions I deleted (or didn’t use) this column, so it threw me off. Sorry for the confusion in my prior post, haha.

Great point, @ThomasD21M, and an informative bar chart. Thanks for sharing.


Hi @ThomasD21M, I think my app is boring. The blinded_name (= group, for example cheeseburger) groups all cheeseburger products. At first I did not include the products; later, when I added the disclaimer card, I added the product names too, as in “why not”, since they’re included in the data.
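The grouping described here, one blinded_name such as cheeseburger containing several products, can be listed with a pandas `groupby`. A minimal sketch; the helper name `products_by_group` is hypothetical, while the column names `blinded_name` and `product` come from this discussion.

```python
import pandas as pd


def products_by_group(df: pd.DataFrame, group_col: str = "blinded_name",
                      product_col: str = "product") -> dict[str, list[str]]:
    """Map each group (e.g. "cheeseburger") to the sorted, de-duplicated
    product names it contains. Rows with a missing group or product are skipped."""
    subset = df[[group_col, product_col]].dropna()
    return {
        group: sorted(products.unique())
        for group, products in subset.groupby(group_col)[product_col]
    }
```

A dictionary like this is a convenient source for a dropdown: the group names become the options, and the selected group's list fills the product list.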

I do Figure Friday on the weekend, on “non-working days”, and my app is more about getting familiar with the data. What makes this dataset more interesting is also what makes it scarier for me: I make a thinking error, something gets out in the open, and before you know it… along those lines. 🙂

And I forgot: your viz is very informative.


My expertise with week 2’s dataset is very low, to put it mildly, so my submission only offers a few minor improvements to the sample code provided for this exercise.

Improvements include:
• diagonal_visible=False: Plots on the diagonal have the same data on the x and y axes, so they show perfect correlation and linearity. They are useful as a sanity check but do not add value to the visualization. diagonal_visible=False removes them.
• showupperhalf=False: The upper-right plots mirror the lower-left plots with the x and y axes swapped. They are useful as sanity checks but do not add value to the visualization. showupperhalf=False removes them.
• Legend: after removing the diagonal and upper-half plots, the legend was far away from the scatterplots, so I used fig.update_layout(legend=...) to move it closer to the plots.
• Used the ‘shipped_in’ column to set the marker colors. I don’t see any useful pattern here, but I left it in as an example.
• Used the data source attribution as the subtitle.

Here is a screenshot:

Here is the code.

import polars as pl
import plotly.express as px

#-------------------------------------------------------------------------------
#    Read and clean the data
#-------------------------------------------------------------------------------
scatter_matrix_cols = [
    'DEHP_ng_serving',  'DBP_ng_serving','DEP_ng_serving', 'BBP_ng_serving']
df_scatter_matrix = (
    pl.scan_csv('samples.tsv', separator='\t')
    .select(pl.col(['shipped_in'] + scatter_matrix_cols))
    .with_columns(pl.col(scatter_matrix_cols).str.replace(r'<',''))
    .with_columns(pl.col(scatter_matrix_cols).str.replace(r'>',''))
    .with_columns(pl.col(scatter_matrix_cols).cast(pl.Float64))
    .filter(pl.col('DBP_ng_serving') < 8000)
    .filter(pl.col('DEHP_ng_serving') < 5000)  # the sample code filtered on 20000
    .collect()
)

#-------------------------------------------------------------------------------
#    Generate the scatter_matrix
#-------------------------------------------------------------------------------
fig = px.scatter_matrix(
    df_scatter_matrix,
    dimensions=scatter_matrix_cols,
    template='simple_white',
    height=1000, width=1200,
    title=(
        'Bay Area Foods with Plastic - px.scatter_matrix' + 
        "<br><sup>Data Source: PlasticList. 'Data on Plastic Chemicals in Bay Area Foods'." +  
        '  plasticlist.org. Accessed Jan 10, 2025.</sup>'
    ),
    color='shipped_in'
)
fig.update_layout(title=dict(font=dict(size=22)))

# remove plots from the grid diagonal, and from the upper half
fig.update_traces(
    diagonal_visible=False,
    showupperhalf=False,
    marker=dict(size=5),
)

# move the legend closer to the data
fig.update_layout(
    legend=dict(
        yanchor='middle',
        y=0.9,
        xanchor='left',
        x=0.4,
        font=dict(size=16)
    )
)
fig.write_html('scatter_matrix.html')
fig.show()


Great improvements to the charts, @Mike_Purtell, especially diagonal_visible=False.

Is it just me, or do the red markers (food samples shipped in glass vials) show a stronger correlation than the other shipping methods?
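One way to check this is to compute the correlation separately for each `shipped_in` group, together with the group sizes, since a small group can show a high correlation by chance. A sketch assuming the chemical columns are already parsed as floats in a pandas DataFrame (such as `dff` from the sample code); the helper name `correlation_by_group` is hypothetical.

```python
import pandas as pd


def correlation_by_group(df: pd.DataFrame, group_col: str, x: str, y: str,
                         method: str = "pearson") -> pd.DataFrame:
    """Correlation between columns x and y within each group, plus group size n."""
    rows = []
    for group, sub in df.dropna(subset=[x, y]).groupby(group_col):
        rows.append({
            group_col: group,
            "n": len(sub),
            # Series.corr returns NaN when a group has fewer than 2 points
            "corr": sub[x].corr(sub[y], method=method),
        })
    return pd.DataFrame(rows)
```

For example: `correlation_by_group(dff, 'shipped_in', 'DEHP_ng_serving', 'DBP_ng_serving')`. Because these measurements are skewed, `method="spearman"` (rank-based; pandas needs SciPy for it) may be a more robust comparison than Pearson.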

I tried to show the differences in color and also delved into the topic of bootstrapping. 🚀🧑‍💻
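The post does not say which bootstrap approach was used. As a generic illustration (not the author's method), here is a percentile bootstrap confidence interval for a correlation coefficient, which could help judge whether one shipping method's correlation is meaningfully stronger than another's. The function names, the 2000 resamples, and the 95% level are illustrative assumptions.

```python
import numpy as np


def bootstrap_ci(x, y, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(x, y).

    Pairs (x[i], y[i]) are resampled together, with replacement.
    Returns (estimate on the full data, lower bound, upper bound).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resampled row indices
        estimates[b] = stat(x[idx], y[idx])
    # nanquantile skips degenerate resamples where the statistic is undefined
    lower, upper = np.nanquantile(estimates, [alpha / 2, 1 - alpha / 2])
    return stat(x, y), lower, upper


def pearson(a, b):
    """Pearson correlation coefficient of two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]
```

If the intervals for two shipping methods overlap heavily, the visual difference in the scatter matrix is probably not strong evidence on its own.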
