Figure Friday 2024 - week 43

adamschroeder · October 25, 2024, 2:33pm

The Open Repair Alliance report estimates that there are 4,000 community repair groups, operating in 31 countries. These groups of people come together to repair their devices while bringing together their communities.

Figure Friday week 43 repair data comes from activities at the Repair Café International. If you’d like to explore data sets from other Open Repair initiatives, you can find them on the downloads page.

Each row in the data set represents a repair attempt by a citizen, recording the problem, the device category, the outcome, and some other relevant information.

Things to consider:

can you improve the sample figure built?
would a different figure tell the data story better?
can you create a Dash app instead?

Sample figure (zoomed in slightly):

Code for sample figure:

import plotly.express as px
import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/main/2024/week-43/OpenRepair_Data_RepairCafeInt_202407.csv")

fig = px.histogram(df, color='repair_status', x='product_age', barmode='overlay')
fig.show()

Participation Instructions:

Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.

If you prefer to collaborate with others on Discord, join the Plotly Discord channel.

Thank you to the OpenRepair project for the data.

rstrub · October 25, 2024, 2:59pm

As much as we use plotly, which is indeed a lot, at the same time I can also say I really don’t spend much time with Plotly…if you see what I mean…

I wonder how many thousands of users you have like my organization.

https://fpi.gsfc.nasa.gov/#/

Tiga · October 25, 2024, 3:00pm

Thanks for sharing this challenge with me. I will definitely try this one by Wednesday.

See you at this Friday's session.

Mike_Purtell · October 29, 2024, 1:33pm

Hi @rstrub, I think I see what you mean. I spend considerable time preparing & cleaning data for plotly visualizations, at least as much time as I spend on the plotly visualizations themselves. If that is your point, it is a nice compliment to plotly, and one that I agree with.

Mike_Purtell · October 29, 2024, 1:42pm

These scatter plots for week 43 show the number of repair cases by product age. Similar to the histograms shown in the week 43 announcement, the scatter plots provide a clearer visualization when the distributions have similar median values. All of these distributions peak between year 3 and year 5.

I dropped data where repair_status is unknown (only 1 row out of 75k+)

The x-axis ranges are set to 40 years max. All of the data are retained and can be viewed by hitting the plotly autoscale button.

The first scatter plot has a linear y-axis, the second plot has a logarithmic y-axis.

This week I learned how to set either axis scale to logarithmic. I also tried replacing the count values with their base-10 logarithms and plotting those. This produces identical traces; however, showing log values as axis labels and hover info (3.0 for 1000, etc.) is much less intuitive. I learn something new on every Figure Friday project.

There is a local peak at year 10. Wondering what this means, maybe warranty or service contract expiration?

The log scale shows the 3 traces with consistent separation. This means that the ratio between traces is maintained even as the volume of repair counts decreases with product age.
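To illustrate the point about consistent separation, here is a minimal sketch using only the standard library. The counts are made up for the example (they are not taken from the repair data), and the names `trace_a` and `trace_b` are hypothetical. When one trace is a fixed fraction of another, the gap between them shrinks on a linear axis but stays the same on a log axis.

```python
import math

# Hypothetical counts per product age for two traces (not from the data set).
# trace_b is always 40% of trace_a, so the ratio between them is constant.
trace_a = [1000.0, 500.0, 250.0, 100.0]
trace_b = [0.4 * value for value in trace_a]

# Vertical gap between the traces on a linear axis: shrinks as counts fall.
linear_gaps = [a - b for a, b in zip(trace_a, trace_b)]

# Vertical gap on a base-10 log axis: log10(a) - log10(b) = log10(a / b),
# which depends only on the ratio and is therefore the same at every age.
log_gaps = [math.log10(a) - math.log10(b) for a, b in zip(trace_a, trace_b)]

print("linear gaps:", linear_gaps)
print("log gaps:   ", [round(gap, 4) for gap in log_gaps])
```

This is also why plotting pre-computed log10 values gives the same trace shapes as a log axis: the vertical positions are the same, only the tick labels differ.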

I used the pycountry library to map the 3-letter abbreviations of each country to the full country names. That said, these visualizations do not break out the data by country, but I leave this in the code for future use by me or anyone else.

Here are the screen shots and the code:

import polars as pl
import plotly.express as px
import pycountry

#------------------------------------------------------------------------------#
#  MAP COUNTRY ABBREVIATIONS TO FULL NAMES, USING PYCOUNTRY LIBRARY            #
#------------------------------------------------------------------------------#
df_countries = (
    pl.DataFrame(
        dict(
            zip(
                [c.name for c in pycountry.countries],
                [c.alpha_3 for c in pycountry.countries]
            )
        )
    )
    .transpose(include_header=True)
    .rename({'column': 'COUNTRY', 'column_0': 'CTRY_ABBR'})
)

#------------------------------------------------------------------------------#
#  READ DATA SET, TWEAK AND CLEAN FOR THIS EXERCISE                            #
#------------------------------------------------------------------------------#
df = (
    pl.read_csv('OpenRepair_Data_RepairCafeInt_202407.csv')
    .rename({'country': 'CTRY_ABBR'})
    .join(
        df_countries,
        on='CTRY_ABBR',
        how='left'
    )
    .with_columns(pl.col('product_age').cast(pl.UInt16))
    .with_columns(
        PRODUCT_AGE_COUNT = 
            pl.col('repair_status')
            .count()
            .over(['repair_status','product_age'])
            )
    .drop('problem')  # 66_071 unique problems out of 75_252 entries, not useful
    .drop('group_identifier')  # too inconsistent, not useful    
    .drop('product_category_id')  # redundant, used named product category   
    .drop('partner_product_category')      # inconsistent data,      
    .drop('id')      # unique record id for this analysis not needed
    .drop('data_provider')  # all values are Repair Café International
    # only 1 entry for unknown, drop it
    .filter(~pl.col('repair_status').is_in(['Unknown']))              
)
# shift country name and abbr to left side of dataframe, drop first col
left_cols = ['COUNTRY', 'CTRY_ABBR']
reordered_cols = left_cols + [c for c in df.columns[1:] if c not in left_cols]
df = df[reordered_cols]

#------------------------------------------------------------------------------#
#  PREPARE DATAFRAME FOR SCATTER PLOTS                                         #
#------------------------------------------------------------------------------#
df_scatter = (
    df
    .select(pl.col('repair_status','product_age', 'PRODUCT_AGE_COUNT'))
    .unique(['repair_status', 'product_age'])
    .pivot(
         on='repair_status',
         values='PRODUCT_AGE_COUNT',
    )
    .sort('product_age', descending=False)
)

#------------------------------------------------------------------------------#
#  SCATTER PLOT REPAIR COUNT BY PRODUCT AGE, LINEAR SCALE                      #
#------------------------------------------------------------------------------#
plot_cols = ['Fixed',  'End of life', 'Repairable'] 
x_max = 40
fig = px.scatter(
    data_frame= df_scatter,
    x = 'product_age',
    y = plot_cols,
    template='simple_white',
    width=800,
    height=500,
)
fig.update_layout(
        title='Linear Scale (Y) of Repair Counts by product age'.upper(),
        xaxis_title='product age [years]'.upper(),
        yaxis_title='linear scale - Repair Count'.upper(),  
        yaxis_range = [0.0, 1400.0],
        xaxis_range=[0, x_max],
        legend_title=None,
        hovermode='x unified', 

)
fig.update_traces(
    mode='lines+markers',
    hovertemplate=' '.join(['%{y}'])
)
fig.show()

#------------------------------------------------------------------------------#
#  SCATTER PLOT REPAIR COUNT BY PRODUCT AGE, LOG SCALE                         #
#------------------------------------------------------------------------------#
fig = px.scatter(
    df_scatter,
    'product_age',
    plot_cols,
    template='simple_white',
    width=800,
    height=500,
    log_y=True,
)
fig.update_layout(
        title='Log Scale (Y) of Repair Counts by product age'.upper(),
        xaxis_title='product age [years]'.upper(),
        yaxis_title='log scale Repair Count'.upper(),  
        yaxis_range = [0.0, 3.5],
        xaxis_range=[0, x_max],
        legend_title=None,
        hovermode='x unified', 
)
fig.update_traces(
    mode='lines+markers',
    hovertemplate=' '.join(['%{y}'])
)
fig.show()

natatsypora · October 29, 2024, 6:53pm

Hi, thanks for this challenge.
Creating graphs using the Plotly library is becoming more and more enjoyable for me as I discover new things every day.
For week 43 I chose barpolar.

View code and chart on PyCafe

adamschroeder · October 29, 2024, 7:57pm

Beautiful graph, @natatsypora. Thanks for sharing. I don’t remember the last time I saw a barpolar used this way. Nicely done.

Mike_Purtell · October 29, 2024, 8:25pm

After seeing this beautiful example, I am eager to try barpolar with my work data. This is so much nicer than a conventional bar chart. Great job on this @natatsypora

ThomasD21M · October 31, 2024, 5:09pm

I took a simple approach and really wanted to get a Dash app running: simple bar, line and pie charts slicing the data, based on LLM recommendations for what it thought would be interesting illustrations.

I struggled with the vacuum faults, as some of the problems were free-text descriptions and I had to import and try a couple of libraries to solve the keyword search in the problem column. It didn’t help that some were even in different languages lol.
example:

Common Faults in Specific Products (e.g., Vacuum Cleaners)

import dash
from dash import dcc, html
import plotly.express as px
import plotly.graph_objects as go
from dash.dependencies import Input, Output
import pandas as pd
from collections import Counter
import re

df = pd.read_csv("https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/main/2024/week-43/OpenRepair_Data_RepairCafeInt_202407.csv", low_memory=False)

# Define a basic set of common stopwords
stop_words = set([
    "i", "me", "my", "myself", "we", "our", "ours", "ourselves", "you", "your", "yours", "yourself", "yourselves",
    "he", "him", "his", "himself", "she", "her", "hers", "herself", "it", "its", "itself", "they", "them", "their",
    "theirs", "themselves", "what", "which", "who", "whom", "this", "that", "these", "those", "am", "is", "are", "was",
    "were", "be", "been", "being", "have", "has", "had", "having", "do", "does", "did", "doing", "a", "an", "the", "and",
    "but", "if", "or", "because", "as", "until", "while", "of", "at", "by", "for", "with", "about", "against", "between",
    "into", "through", "during", "before", "after", "above", "below", "to", "from", "up", "down", "in", "out", "on", "off",
    "over", "under", "again", "further", "then", "once", "here", "there", "when", "where", "why", "how", "all", "any",
    "both", "each", "few", "more", "most", "other", "some", "such", "no", "nor", "not", "only", "own", "same", "so", "than",
    "too", "very", "s", "t", "can", "will", "just", "don", "should", "now"
])

# Initialize the Dash app
app = dash.Dash(__name__)

# Layout
app.layout = html.Div([
    html.H1("Repair Event Data Analysis"),
    
    html.Div([
        html.H2("Top 5 Product Categories Seen at Events"),
        dcc.Graph(id='top-5-product-categories')
    ]),
    
    html.Div([
        html.H2("Barriers to Repair"),
        dcc.Graph(id='barriers-to-repair')
    ]),
    
    html.Div([
        html.H2("Average Age of Products at Repair Events"),
        dcc.Graph(id='average-age-products')
    ]),
    
    html.Div([
        html.H2("Repair Attempts Over Time"),
        dcc.Graph(id='repair-attempts-over-time')
    ]),
    
    html.Div([
        html.H2("Repair Success Rate Distribution"),
        dcc.Graph(id='repair-success-rate')
    ]),

    html.Div([
        html.H2("Common Faults in Specific Products (e.g., Vacuum Cleaners)"),
        dcc.Graph(id='common-faults-products')
    ])
])

@app.callback(
    Output('top-5-product-categories', 'figure'),
    Input('top-5-product-categories', 'id')
)
def update_top5_product_categories_chart(_):
    top_categories = df['product_category'].value_counts().nlargest(5)
    fig = px.pie(values=top_categories.values, names=top_categories.index,
                 title="Top 5 Product Categories Seen at Events")
    return fig

@app.callback(
    Output('barriers-to-repair', 'figure'),
    Input('barriers-to-repair', 'id')
)
def update_barriers_to_repair_chart(_):
    try:
        # Convert to DataFrame and ensure proper columns
        barriers = df['repair_barrier_if_end_of_life'].fillna("No Barrier").value_counts().reset_index()
        barriers.columns = ["Barrier", "Count"]
        
        # Create bar chart
        fig = px.bar(barriers, x="Barrier", y="Count",
                     title="Barriers to Repair", labels={"Barrier": "Barrier", "Count": "Count"})
    except Exception as e:
        # Create an empty figure with an error message if something goes wrong
        fig = go.Figure()
        fig.add_annotation(
            text=f"Error generating chart: {str(e)}",
            xref="paper", yref="paper",
            x=0.5, y=0.5, showarrow=False,
            font=dict(size=20)
        )
    
    return fig



@app.callback(
    Output('average-age-products', 'figure'),
    Input('average-age-products', 'id')
)
def update_average_age_products_chart(_):
    # Ensure 'product_age' is numeric
    df['product_age'] = pd.to_numeric(df['product_age'], errors='coerce')
    
    # Group by product category, calculate mean, reset index, and get top 10
    avg_age = df.groupby('product_category', as_index=False)['product_age'].mean().dropna().nlargest(10, 'product_age')
    
    # Check the structure of avg_age for debugging
    print(avg_age.head())  # This will output to the console for inspection

    # Explicitly cast 'product_category' to string and 'product_age' to float for Plotly
    avg_age['product_category'] = avg_age['product_category'].astype(str)
    avg_age['product_age'] = avg_age['product_age'].astype(float)
    
    # Create the bar chart
    fig = px.bar(avg_age, x='product_category', y='product_age',
                 title="Average Age of Products at Events", 
                 labels={"product_category": "Product Category", "product_age": "Average Age"})
    
    return fig



@app.callback(
    Output('repair-attempts-over-time', 'figure'),
    Input('repair-attempts-over-time', 'id')
)
def update_repair_attempts_over_time_chart(_):
    df['event_year'] = pd.to_datetime(df['event_date'], errors='coerce').dt.year
    repair_attempts = df['event_year'].value_counts().sort_index()
    fig = px.line(repair_attempts, x=repair_attempts.index, y=repair_attempts.values,
                  title="Repair Attempts Over Time", labels={"x": "Year", "y": "Number of Repair Attempts"})
    return fig

@app.callback(
    Output('repair-success-rate', 'figure'),
    Input('repair-success-rate', 'id')
)
def update_repair_success_rate_chart(_):
    success_rate = df['repair_status'].value_counts()
    fig = px.pie(success_rate, values=success_rate.values, names=success_rate.index,
                 hole=0.3, title="Repair Success Rate Distribution")
    return fig

@app.callback(
    Output('common-faults-products', 'figure'),
    Input('common-faults-products', 'id')
)
def update_common_faults_products_chart(_):
    vacuum_problems = df[df['product_category'] == 'Vacuum']['problem'].dropna()
    
    if vacuum_problems.empty:
        fig = go.Figure()
        fig.add_annotation(
            text="No data available for common faults in Vacuum Cleaners",
            xref="paper", yref="paper",
            x=0.5, y=0.5, showarrow=False,
            font=dict(size=20)
        )
    else:
        all_words = []
        for description in vacuum_problems:
            words = re.findall(r'\b\w+\b', description.lower())
            filtered_words = [word for word in words if word not in stop_words and len(word) > 1]
            all_words.extend(filtered_words)
        
        word_counts = Counter(all_words).most_common(10)
        keywords, counts = zip(*word_counts)
        fig = px.bar(x=keywords, y=counts, title="Common Keywords in Vacuum Cleaner Problems",
                     labels={"x": "Keyword", "y": "Count"})
    
    return fig

if __name__ == '__main__':
    app.run(debug=True)  # app.run replaces the older app.run_server

ThomasD21M · October 31, 2024, 5:11pm

Output of my vacuum problem query. Clearly the stop word portion of my script wasn’t working as intended, or wasn’t thorough enough.

ThomasD21M · October 31, 2024, 8:12pm

I’ve been experimenting with this further and wanted to see which countries have the most successful repair history, using a choropleth plot.

The criterion was the ratio of successful repairs (repair_status = Fixed) to total entries per country.

Sweden, Ireland and Switzerland are doing a great job!

import pandas as pd
import plotly.express as px
import pycountry  # Library to map country codes to full names

# Load data
df = pd.read_csv("https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/main/2024/week-43/OpenRepair_Data_RepairCafeInt_202407.csv", low_memory=False)

# Filter for valid entries in the 'country' and 'repair_status' columns
df = df.dropna(subset=['country', 'repair_status'])

# Calculate total repairs and fixed repairs by country
total_repairs = df.groupby('country').size()  # Total entries per country
fixed_repairs = df[df['repair_status'] == 'Fixed'].groupby('country').size()  # Only 'Fixed' entries per country

# Create a DataFrame with both total and fixed counts
repair_data = pd.DataFrame({
    'Total Repairs': total_repairs,
    'Fixed Repairs': fixed_repairs
}).fillna(0)  # Fill NaN with 0 for countries without 'Fixed' repairs

# Calculate the ratio of fixed repairs to total repairs
repair_data['Repair Success Ratio'] = repair_data['Fixed Repairs'] / repair_data['Total Repairs']

# Reset index to make 'country' a column for plotting
repair_data.reset_index(inplace=True)

# Map 3-letter country codes to full country names
def get_country_name(iso_code):
    try:
        return pycountry.countries.get(alpha_3=iso_code).name
    except AttributeError:
        return iso_code  # Return the code itself if the name is not found

# Apply the mapping function to add a 'country_name' column
repair_data['country_name'] = repair_data['country'].apply(get_country_name)

# Plot choropleth map using ISO-3 codes, with full country names in hover data
fig = px.choropleth(
    repair_data,
    locations="country",
    locationmode="ISO-3",  # Use ISO-3 country codes
    color="Repair Success Ratio",
    hover_name="country_name",  # Show full country name on hover
    hover_data={"country": True, "Total Repairs": True, "Fixed Repairs": True, "Repair Success Ratio": ':.2%'},
    color_continuous_scale="Blues",
    title="Repair Success Ratio by Country"
)

# Show plot
fig.show()

Tiga · October 31, 2024, 11:39pm

On my side, I displayed cards at the top showing: total repair attempts, % Fixed During Repair Event, % Repairable after the Event, and % End of Life products.

After that, I looked at the repair attempts logged over time using a bar plot, and finally used Dash’s DataTable (dash_table) to display each product category along with the number of End of Life products, the number Fixed, the number Repairable, the number of repair attempts, etc.

Here is my code :

from dash import Dash, dash_table, html, dcc
import dash_bootstrap_components as dbc
from dash_bootstrap_templates import load_figure_template
import pandas as pd 

import plotly.express as px


df = pd.read_csv("OpenRepair_Data_RepairCafeInt_202407.csv", 
                 parse_dates=["event_date"], 
                 low_memory=False, 

                )


category_repair_status =  (pd.concat([df["product_category"], pd.get_dummies(df["repair_status"], dtype="int")], axis= 1)
                           .groupby("product_category").sum()
                          )

category_repair_status = category_repair_status.assign(
    repair_attempts = category_repair_status.sum(axis=1),
    pct_of_total = (category_repair_status.sum(axis=1)/sum(category_repair_status.sum(axis=1)) * 100).round(2)
)

category_repair_status = category_repair_status.sort_values(by="pct_of_total", ascending=False)

category_repair_status = category_repair_status.reset_index()



app = Dash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP])

load_figure_template("bootstrap")

app.layout = dbc.Container([
    dbc.Row([
        dbc.Col(dbc.Card([
            dbc.CardBody([
                html.H4("Total repair attempts", className="card-title"),
                html.H3(f"{df['id'].count()}")
            ])
        ])),
         dbc.Col(dbc.Card([
            dbc.CardBody([
                html.H4("Fixed During Repair Event", className="card-title"),
                html.H3(f"{(category_repair_status['Fixed'].sum()/df['id'].count() * 100).round(2)} %")
            ])
        ])),
        
        
         dbc.Col(dbc.Card([
            dbc.CardBody([
                html.H4("Repairable after the Event", className="card-title"),
                html.H3(f"{(category_repair_status['Repairable'].sum()/df['id'].count() * 100).round(2)} %")
            ])
        ])),
        
         dbc.Col(dbc.Card([
            dbc.CardBody([
                html.H4("End of Life products", className="card-title"),
                html.H3(f"{(category_repair_status['End of life'].sum()/df['id'].count() * 100).round(2)} %")
            ])
        ])),
        
        
        
    ]),
    
    
    dbc.Row([
        dbc.Col([
            dcc.Graph(figure=px.bar(df.groupby(df["event_date"].dt.year)["id"].count().rename("count").reset_index(), 
                x="event_date", 
                y="count",
                ).update_layout(title={
                "text": "Repair attempts logged over time",
                "x" : .5,
                "y" : 0.85,
                "font" : {"size" : 25}
            })
         )
        ])
    ]),
    
    dbc.Container(
        style= {"margin-left": "30px", "margin-right":"30px"},
        children=[
        
        dash_table.DataTable(
            data=category_repair_status.to_dict('records'),
            columns=[{"name": i, "id": i} for i in category_repair_status.columns],
            filter_action = "native",
            sort_action = "native",
            export_format = "csv",
            style_header={ 'border': '1px solid black', 'textAlign': 'left',  
                          'backgroundColor': 'blue',
                          'color': 'white',
                          "fontSize": "25px"
                         },
            style_cell={ 'border': '1px solid grey', "backgroundColor" : "white", 
                        "fontSize" : "20px",
                        'textAlign': 'left'
                       },
            style_data_conditional = [
                {
                    'if': {
                        'filter_query': '{pct_of_total} >= 3.79',
                        'column_id': 'pct_of_total'
                    },
                    'color': 'blue',
                    'font-weight' : "bold",
                    'backgroundColor': 'lightblue'

                },

                   {
                    'if': {
                        'filter_query': '{pct_of_total} <= 0.66',
                        'column_id': 'pct_of_total'
                    },
                    'color': 'red',
                    'font-weight' : "bold",
                    'backgroundColor': 'lightgrey'

                },

                {
                    'if' : {
                        'column_id'  : "product_category"
                    },
                    'font-weight': 'bold'
                }
            ]
        )
    ]) 
]
)



if __name__ == '__main__':
    app.run(debug=True, port=1020)

adamschroeder · November 1, 2024, 12:19pm

Nice graphs, @ThomasD21M. It’s helpful to see the repair success rate. Why does the hover over Switzerland say 1 total repair? Does 1 mean 100% success in this case?

Hope you can join the Figure Friday session today.

adamschroeder · November 1, 2024, 12:24pm

Nice job, @Tiga. Your code wasn’t showing correctly; you just need to remember to put it all between 3 backticks, or use the Preformatted text button. I already fixed it. Hope you can join the Figure Friday session today.

Tiga · November 1, 2024, 1:13pm

Thank you for notifying me. I adjusted it using Preformatted text; I hope it’s showing correctly now.

ThomasD21M · November 1, 2024, 1:44pm

You’re absolutely right; including countries with only one or very few entries can skew the visual interpretation, as a single successful repair gives a misleading 100% success rate. To address this, I could, and should, have set a threshold for the minimum number of total repairs required to display a country’s repair success ratio.
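A minimal sketch of that thresholding idea, using only the standard library. The cutoff `MIN_REPAIRS = 50`, the helper `success_ratios`, and the sample rows are all assumptions made up for illustration; they are not from the posted code or the data set.

```python
from collections import Counter

MIN_REPAIRS = 50  # hypothetical cutoff; choose a value that suits the data


def success_ratios(records, min_repairs=MIN_REPAIRS):
    """Return {country: fixed / total} for countries with at least min_repairs rows."""
    totals = Counter(country for country, _ in records)
    fixed = Counter(country for country, status in records if status == "Fixed")
    return {
        country: fixed[country] / total
        for country, total in totals.items()
        if total >= min_repairs  # skip countries with too few entries
    }


# Made-up rows of (country code, repair_status), for illustration only:
# one lone entry for CHE and one hundred entries for NLD.
records = (
    [("CHE", "Fixed")]
    + [("NLD", "Fixed")] * 60
    + [("NLD", "End of life")] * 40
)

print(success_ratios(records))                 # CHE is excluded by the cutoff
print(success_ratios(records, min_repairs=1))  # CHE shows a misleading 1.0
```

With the pandas code above, the same effect could be had by filtering before plotting, for example `repair_data[repair_data['Total Repairs'] >= MIN_REPAIRS]`.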

ThomasD21M · November 1, 2024, 1:45pm

I like the export table! The feature to filter through data on Dash App and export that selection would be nice too.

natatsypora · November 3, 2024, 5:21am

Your simple approach is impressive!
I also tried to create a chart of the most frequently used words, but having the problems described in several different languages makes it difficult to clean up the data…
After using a collection of stopwords for multiple languages from Stopwords ISO I was able to create a chart, but I’m not sure it is informative enough.
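For anyone who wants to try the multi-language stopword idea, here is a rough sketch using only the standard library. The stopword sets below are tiny hand-picked samples and the problem descriptions are invented; a real list such as Stopwords ISO is far longer. The names `STOPWORDS`, `ALL_STOPWORDS` and `top_keywords` are hypothetical.

```python
import re
from collections import Counter

# Tiny hand-picked samples for illustration only; a real multi-language
# stopword collection would contain many more words per language.
STOPWORDS = {
    "en": {"the", "is", "not", "and", "it", "does"},
    "nl": {"de", "het", "is", "niet", "en", "een"},
    "de": {"der", "die", "das", "ist", "nicht", "und"},
}

# Merge every language into one set so a single lookup covers all of them.
ALL_STOPWORDS = set().union(*STOPWORDS.values())


def top_keywords(descriptions, n=3):
    """Count the most common words after removing stopwords in all languages."""
    words = []
    for text in descriptions:
        tokens = re.findall(r"\b\w+\b", text.lower())
        words.extend(t for t in tokens if t not in ALL_STOPWORDS and len(t) > 1)
    return Counter(words).most_common(n)


# Invented problem descriptions in English, Dutch and German.
problems = [
    "The motor is not working",
    "De motor werkt niet",
    "Der Motor ist kaputt und das Kabel ist defekt",
]

print(top_keywords(problems))
```

An English-only list would let words such as "niet" or "ist" through, which is one plausible reason a keyword chart on this data ends up dominated by filler words.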

code on PyCafe

natatsypora · November 3, 2024, 5:38am

Thank you for your review. I am very glad that you liked it.
For more than a month I have been following your posts with pleasure, admiring the cleanliness and informativeness of your code and visualizations!

adamschroeder · November 4, 2024, 4:42pm

Lovely figures, @natatsypora. Do you mind sharing the code for them?
