Figure Friday 2025 - week 2

adamschroeder · January 9, 2025, 10:14pm

Reminder: there will NOT be a Figure Friday session this Friday.

How much plastic is in our products?

This week we’ll look at Data on Plastic Chemicals in Bay Area Foods.

To dive into each product with more detail, please go to PlasticList and click on the product of interest under the product column.

Things to consider:

can you improve the sample figure below (scatter matrix)?
would you like to tell a different data story using a different graph?
can you create a Dash app instead?

Sample figure:

Code for sample figure:

import plotly.express as px
import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/main/2025/week-2/samples.tsv",  sep='\t')
# convert strings with < and > symbols to floats
df['DEHP_ng_serving'] = df['DEHP_ng_serving'].str.replace('[<>]', '', regex=True).astype(float)
df['DBP_ng_serving'] = df['DBP_ng_serving'].str.replace('[<>]', '', regex=True).astype(float)
df['DEP_ng_serving'] = df['DEP_ng_serving'].str.replace('[<>]', '', regex=True).astype(float)
df['BBP_ng_serving'] = df['BBP_ng_serving'].str.replace('[<>]', '', regex=True).astype(float)

# remove a few outliers
dff = df[(df['DBP_ng_serving'] < 8000) & (df['DEHP_ng_serving'] < 20000)]
fig = px.scatter_matrix(dff, dimensions=["DEHP_ng_serving", "DBP_ng_serving", "DEP_ng_serving", "BBP_ng_serving"])
fig.show()

Participation Instructions:

Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.

If you prefer to collaborate with others on Discord, join the Plotly Discord channel.

Data Source:

Thank you to PlasticList for the data. Please cite the data as:

PlasticList. 'Data on Plastic Chemicals in Bay Area Foods'. plasticlist.org. Accessed Jan 10, 2025.

marieanne · January 12, 2025, 3:52pm

After reading a lot on the website plasticlist.org, I thought “this is be careful data”: if the people at plasticlist.org were very careful about drawing any conclusions, I’m certainly not equipped to do it.

If I were to choose a product group to dive into further, it would be “cheeseburgers” or “breastmilk”.

I created a dashboard to navigate through the data in a different way: grouped by “blinded name”, with the products in a list. There is not much explanation on the dashboard, to make certain everybody will visit plasticlist.org.
Q1 through Q4 means the score falls in percentile band Q1 to Q4.
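For readers unfamiliar with this kind of banding, here is a minimal, dependency-free sketch of one way to label scores Q1 to Q4. How the dashboard actually computes its bands is not shown in this thread, so the cut-point method, the `quartile_labels` helper, and the sample scores are all assumptions made for illustration.

```python
from bisect import bisect_left
from statistics import quantiles


def quartile_labels(values):
    """Label each value 'Q1'..'Q4' according to the quartile it falls in."""
    # Three cut points split the data into four bands (assumed method: inclusive)
    cuts = quantiles(values, n=4, method="inclusive")
    return [f"Q{bisect_left(cuts, v) + 1}" for v in values]


# Made-up scores, not values from the dataset
scores = [5, 12, 18, 25, 40, 55, 70, 95]
labels = quartile_labels(scores)
print(labels)  # ['Q1', 'Q1', 'Q2', 'Q2', 'Q3', 'Q3', 'Q4', 'Q4']
```

With pandas, `pd.qcut` with `q=4` and a list of labels is the usual tool for the same job.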

Link and code on py.cafe: PyCafe - Dash - A way to browse the data of plasticlist.org, no conclusions.

It looks like this and is not responsive:

ThomasD21M · January 13, 2025, 1:02pm

I spent most of my time familiarizing myself with the data. For me it was one of the more interesting and relevant datasets we have worked with. To be honest, I find it interesting that we have “acceptable/tolerable levels” of any of these chemicals. What is the “tolerable” number of times one would want to be punched in the face daily?

I filtered out results below the limit of quantification (LOQ), since values under the measurable threshold didn’t interest me and created too much noise in the data.
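A minimal sketch of one way to separate quantified results from below-threshold ones follows. It assumes that a `<` or `>` prefix in a raw cell marks a result outside the quantifiable range; that reading is an assumption and is not confirmed in this thread. The `parse_measurement` helper and the sample cells are hypothetical.

```python
def parse_measurement(raw):
    """Split a raw cell such as '<1200' into (value, censored).

    censored is True when the cell carried a '<' or '>' prefix, which this
    sketch ASSUMES marks a result outside the quantifiable range.
    """
    text = str(raw).strip()
    if not text or text.lower() == "nan":
        return None, False  # empty cell: nothing was reported
    censored = text[0] in "<>"
    value = float(text.lstrip("<>").strip())
    return value, censored


# Made-up example cells, not values from the dataset
cells = ["<1200", "3450.5", "", ">90000", "17"]
parsed = [parse_measurement(c) for c in cells]

# Keep only results that were actually quantified
quantified = [v for v, censored in parsed if v is not None and not censored]
print(quantified)  # [3450.5, 17.0]
```

Keeping the flag next to the number, rather than stripping the symbol and discarding it, leaves the choice of dropping or keeping those rows open for later.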

On a side note, seeing the Whole Foods category consistently record higher levels of these chemicals was an eye-opener. Charging a premium for marginally better quality, only to see these harmful chemicals at higher levels, is quite ironic.

import plotly.express as px
import pandas as pd
from dash import Dash, dcc, html

# Load data
data_path = r"Data_path"
data = pd.read_csv(data_path)

# Clean the BPA column: Remove unwanted characters and replace invalid values
data['BPA_percent_tdi_14_kg_epa'] = data['BPA_percent_tdi_14_kg_epa'].replace(r'^\s*$', None, regex=True)
data['BPA_percent_tdi_14_kg_epa'] = data['BPA_percent_tdi_14_kg_epa'].astype(str).str.replace('}', '', regex=False)
data['BPA_percent_tdi_14_kg_epa'] = pd.to_numeric(data['BPA_percent_tdi_14_kg_epa'], errors='coerce')

# Drop rows with NaN values
data.dropna(subset=['BPA_percent_tdi_14_kg_epa'], inplace=True)

# Normalize product names
data['product'] = data['product'].str.strip().str.lower()

# Filter out extremely small values (e.g., below 0.01)
data = data[data['BPA_percent_tdi_14_kg_epa'] >= 0.01]

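# Take the highest BPA value per product
# (select the column first so max() is not applied to unrelated text columns)
grouped_data = data.groupby('product', as_index=False)['BPA_percent_tdi_14_kg_epa'].max()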

# Convert the highest values to percentages
grouped_data['BPA_percent_tdi_14_kg_epa'] *= 100  # Convert to percentages

# Round percentages to 1 decimal place
grouped_data['BPA_percent_tdi_14_kg_epa'] = grouped_data['BPA_percent_tdi_14_kg_epa'].round(1)

# Sort by BPA percentage
grouped_data = grouped_data.sort_values(by='BPA_percent_tdi_14_kg_epa', ascending=False)

# Create the bar chart
fig = px.bar(
    grouped_data,
    y='BPA_percent_tdi_14_kg_epa',
    x='product',
    title="BPA as Percentage of TDI by Product (Max Value)"
)

# Explicitly set the text template to show values correctly formatted
fig.update_traces(
    texttemplate='%{y:.1f}%',  # Format the y-values as percentages with 1 decimal
    textposition="outside",
    textfont_size=12,
    cliponaxis=False
)

# Adjust layout for better spacing
fig.update_layout(
    yaxis=dict(range=[0, 100], showgrid=True),  # Adjust range to focus on visible bars
    xaxis=dict(tickangle=-45),  # Rotate x-axis labels for readability
    width=1200  # Increase width to improve bar spacing
)

# Dash app setup
app = Dash(__name__)
 

app.layout = html.Div([
    dcc.Graph(figure=fig)
])

if __name__ == '__main__':
    app.run(debug=True)  # app.run replaces the older app.run_server

ThomasD21M · January 13, 2025, 1:15pm

Hi @marieanne, your app was very good. I like the product list with its dynamic callback and how you show all chemicals for the selected product. What did you mean regarding the blinded name?

If you review their posted dataset with formatting, you’ll notice blanks in some of the product cells, but this is where they had run multiple tests. Inspecting the data as .tsv didn’t help, so I converted it to .csv, which gave me a better format for inspecting the data.

ThomasD21M · January 13, 2025, 3:08pm

@marieanne , I see the blinded_name column now. Early in my tsv to csv file conversions I deleted/didn’t use this so it threw me off, sorry for the confusion on my prior post haha.

adamschroeder · January 13, 2025, 3:10pm

Great point, @ThomasD21M , and an informative bar chart. Thanks for sharing.

marieanne · January 13, 2025, 3:34pm

Hi @ThomasD21M, I think my app is boring. The blinded_name (= group, for example cheeseburger) groups all cheeseburger products. At first I did not include the products; later, when I added the disclaimer card, I added the product names too, as in “why not”, since they’re included in the data.

I do Figure Friday on the weekend, on “non-working days”, and my app is more for getting familiar with the data. What makes this dataset more interesting is also what makes it scarier for me: I make a thinking error, something gets out in the open, and before you know it… along those lines.

And I forgot, your viz is very informative.

Mike_Purtell · January 13, 2025, 10:42pm

My expertise with week 2’s dataset is very low to put it mildly. My submission only offers a few minor improvements to the sample code offered for this exercise.

Improvements include:
• diagonal_visible=False: Plots on the diagonal line have the same data for x and y axis, and show perfect correlation and linearity. They are useful as sanity check but do not add value to the visualization. diagonal_visible=False removes them.
• showupperhalf=False: The upper right plots mirror the lower left plots, with x-axis and y-axis swapped. They are useful as sanity checks but do not add value to the visualization. showupperhalf=False removes them.
• Legend: after removing the diagonal and the upper-half plots, the legend was far away from the scatterplots. I used update_layout to move it closer to the plots.
• Used the ‘shipped_in’ column to set the marker colors. I don’t see any useful pattern here, but leave it in as an example.
• Used the datasource attribution as the suptitle.

Here is a screenshot:

Here is the code.

import polars as pl
import plotly.express as px

#-------------------------------------------------------------------------------
#    Read and clean the data
#-------------------------------------------------------------------------------
scatter_matrix_cols = [
    'DEHP_ng_serving',  'DBP_ng_serving','DEP_ng_serving', 'BBP_ng_serving']
df_scatter_matrix = (
    pl.scan_csv('samples.tsv', separator='\t')
    .select(pl.col(['shipped_in'] + scatter_matrix_cols))
    .with_columns(pl.col(scatter_matrix_cols).str.replace(r'<',''))
    .with_columns(pl.col(scatter_matrix_cols).str.replace(r'>',''))
    .with_columns(pl.col(scatter_matrix_cols).cast(pl.Float64))
    .filter(pl.col('DBP_ng_serving') < 8000)
    .filter(pl.col('DEHP_ng_serving') < 5000)  # adam filtered on 20K
    .collect()
)

#-------------------------------------------------------------------------------
#    Generate the scatter_matrix
#-------------------------------------------------------------------------------
fig = px.scatter_matrix(
    df_scatter_matrix,
    dimensions=scatter_matrix_cols,
    template='simple_white',
    height=1000, width=1200,
    title=(
        'Bay Area Foods with Plastic - px.scatter_matrix' + 
        "<br><sup>Data Source: PlasticList. 'Data on Plastic Chemicals in Bay Area Foods'." +  
        '  plasticlist.org. Accessed Jan 10, 2025.</sup>'
    ),
    color='shipped_in'
)
fig.update_layout(title=dict(font=dict(size=22)))

# remove plots from the grid diagonal, and from the upper half
fig.update_traces(
    diagonal_visible=False,
    showupperhalf=False,
    marker=dict(size=5),
)

# move the legend closer to the data
fig.update_layout(
    legend=dict(
        yanchor='middle',
        y=0.9,
        xanchor="left",
        x=0.4,
        font=dict(size=16)
    )
)
fig.write_html('scatter_matrix.html')
fig.show()

adamschroeder · January 14, 2025, 2:02pm

Great improvements to the charts, @Mike_Purtell , especially the diagonal_visible=False.

Is it only me or do the red markers (food samples shipped in glass vials) show a stronger correlation than the other shipping methods?

Ester · January 15, 2025, 3:57pm

I tried to show the differences in colors and also delved into the topic of Bootstrap.

feanor_92 · January 16, 2025, 1:18pm

Excited to share my project Data on Plastic Chemicals in Bay Area Foods for FigureFriday!
Key features of the dashboard:
Scatter Matrix: Visualizes correlations between chemical concentrations like DEHP, DBP, and others.
Tags Analysis: Highlights the top tags associated with food items.
Locations Map: Geolocates food sampling sites on an interactive map.
Feature Analysis: Enables distribution analysis for individual chemicals.

Check out the code here:
GitHub
Kaggle

adamschroeder · January 16, 2025, 2:06pm

Nice job working with Bootstrap, @Ester .
What is the reason you chose to display DEHP ng serving in both the color bar and the scatter_matrix axes?

adamschroeder · January 16, 2025, 2:10pm

Good job, @feanor_92 , using various graph types to explore the data. I like the histogram showing the top 10 tag Frequency. Thanks for participating in Figure Friday. Hope to see you next week as well.

Ester · January 16, 2025, 2:51pm

@adamschroeder At first I wanted to put the colorscale on the product, but it didn’t work. But with DEHP ng serving it worked right away. I think I messed it up.

ThomasD21M · January 16, 2025, 4:17pm

The map is a nice addition.

Shail-Shukla · January 20, 2025, 8:05am

Here’s my submission:

Shail-Shukla · January 20, 2025, 8:08am

Do check out the entire video to see the map in action:)

adamschroeder · January 21, 2025, 2:57pm

Thank you for posting here as well as LinkedIn, @Shail-Shukla .

Are you able to share the app code with us?

Shail-Shukla · January 22, 2025, 4:17pm

There u go:
