Figure Friday 2024 - week 45

adamschroeder · November 8, 2024, 2:02pm

Did you know that Canada ranks among the top 5 global producers of diamonds, gemstones, gold, indium (among a few others) and that the value of Canada’s mineral production reached $74.6 billion in 2022? (Minerals and the economy)

In this week’s data set, we’ll explore around 950 past and present productive mines in Canada from 1950 to 2022.

If you’d like to read more about the data, see figshare.

Things to consider:

can you improve the sample Gantt figure built?
would a different figure tell the data story better?
can you create a Dash app instead?

Sample Gantt figure:

Code for sample figure:

import plotly.express as px
import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/main/2024/week-45/mines-of-Canada-1950-2022.csv")
df = df[(df['province'] == 'Nunavut') & (df['close1'] != 'open')]
df['close1'] = pd.to_datetime(df['close1'].astype(str), format='%Y').dt.strftime('%Y-%m-%d')
df['open1'] = pd.to_datetime(df['open1'].astype(str), format='%Y').dt.strftime('%Y-%m-%d')

fig = px.timeline(data_frame=df,
                  x_start="open1",
                  x_end="close1",
                  y="namemine",
                  hover_name='commodityall',
                  title='Closed mines in the Nunavut province - Canada')
fig.show()

Participation Instructions:

Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.

If you prefer to collaborate with others on Discord, join the Plotly Discord channel.

Data Source:

Thank you to Clara Dallaire-Fortier for the data.

Alfredo49 · November 11, 2024, 4:17am

I’m back!

My contribution is a Dash app with the following features:

Scatter Map Chart: map displaying the coordinates location for mines across Canada
Gantt Chart: Visual timeline for mines open and close intervals
Commodity Filter: Filter mines based on commodity produced
Province Filter: Filter mines by province
Status Filter: Filter mines by status (open or closed)
Phase Filter: Filter projects by the number of times (phases) they have reopened

CODE

adamschroeder · November 12, 2024, 8:43pm

Very good app, @Alfredo49. Thanks for sharing.
Is there a reason you chose to limit the bottom graph to only gold mines, instead of coal and other resources?

Alfredo49 · November 12, 2024, 11:21pm

Thanks Adam,

The bottom chart is filtered by the commodity selected in the top dropdown, but it defaults to Gold when the selected option is All. I took this approach to limit the number of data points and avoid oversaturating the chart.
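
The default described above could be sketched as follows. This is not the app's actual code (which is only linked), just a minimal illustration: the function name `filter_for_gantt`, the `"All"` sentinel value and the sample rows are assumptions, while `commodityall` is the column name used in the sample code at the top of this thread. The column is assumed to list one or more commodities per mine, so the match is on a substring.

```python
import pandas as pd

DEFAULT_COMMODITY = "Gold"  # assumed fallback when the dropdown is set to "All"


def filter_for_gantt(df: pd.DataFrame, selected: str) -> pd.DataFrame:
    """Return the rows the Gantt chart should show for a dropdown value."""
    commodity = DEFAULT_COMMODITY if selected == "All" else selected
    # commodityall may name several commodities, so match on a plain substring
    mask = df["commodityall"].str.contains(commodity, na=False, regex=False)
    return df[mask]


# Hypothetical rows, only to show the behaviour
sample = pd.DataFrame(
    {
        "namemine": ["A", "B", "C"],
        "commodityall": ["Gold", "Coal", "Gold, Silver"],
    }
)
gold_rows = filter_for_gantt(sample, "All")   # falls back to Gold
coal_rows = filter_for_gantt(sample, "Coal")  # explicit selection
print(gold_rows["namemine"].tolist(), coal_rows["namemine"].tolist())
```

In a Dash callback, the dropdown value would be passed as `selected` and the filtered frame handed to `px.timeline`.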

Mike_Purtell · November 14, 2024, 1:50pm

This week’s challenge used a dataset of 950 Canadian mines to create a Gantt chart.

The selected mines …

… have produced coal as all or some of their output

… are no longer operating

… were in service for 25 years or longer

The mines are grouped by the province in which they are located. A considerable
amount of Polars dataframe manipulation was used for sorting and to insert
a group row above the mines from each province.

I use charts like this for project management and status reports. Typically,
unfinished tasks are color-coded to show their level of completion.

Saving the chart as an HTML file makes it easy to share with other team members. This
could be a useful component in a dashboard with schedules, yields, shipping levels, etc.

Appreciate any comments or suggestions. If you run this code and get stuck, please reach out to me.

Here is a screenshot and the source code.

from datetime import datetime
import polars as pl
import plotly.express as px

# constants
MIN_YEARS = 25  # gantt chart includes mines with MIN_YEARS or more of service
SOURCE_LOCAL = False  # if True, read the local csv; if False, download from the git repo
today = datetime.now().strftime('%Y_%m_%d')
local_csv = 'week_45_data.csv'

#------------------------------------------------------------------------------#
#     initialize dataframe df_source from local file or git repo
#------------------------------------------------------------------------------#
if SOURCE_LOCAL:
    # this path reads data previously saved to local drive
    df_source = (
        pl.scan_csv(local_csv, ignore_errors=True)
        .collect()
    )
else:
    # this path downloads the data from an external git repository
    web_csv = (  # file name split over 2 lines, PEP-8
        'https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/' +
        'main/2024/week-45/mines-of-Canada-1950-2022.csv'
    )
    df_source = (
        pl.read_csv(web_csv,ignore_errors=True)
        .filter(pl.col('close1').str.to_uppercase() != 'OPEN')
        .filter(pl.col('commodityall').str.contains('Coal'))
        .rename(
            {   # clean up selected column names
                'company1' : 'COMPANY',
                'namemine' : 'MINE',
                'town'     : 'TOWN',
                'province' : 'PROVINCE',
                'open1'    : 'YEAR_OPENED',
                'close1'   : 'YEAR_CLOSED'
            }
        )
        .with_columns(pl.col('YEAR_OPENED', 'YEAR_CLOSED').cast(pl.Int16))
    )
    # data has been read from git-repo, so save a local copy
    df_source.write_csv(local_csv)

#------------------------------------------------------------------------------#
#     add DATE_OPENED and DATE_CLOSED as Date columns, needed for timeline 
#------------------------------------------------------------------------------#
df = (
    df_source
    .with_columns(
        DATE_OPENED = pl.col('YEAR_OPENED')
                        .cast(pl.String)
                        .str.strptime(pl.Date, format='%Y')
    )
    .with_columns(
        DATE_CLOSED = pl.col('YEAR_CLOSED')
                        .cast(pl.String)
                        .str.strptime(pl.Date, format='%Y')
    )
    .select(pl.col('COMPANY', 'MINE', 'TOWN', 'PROVINCE',
               'DATE_OPENED', 'DATE_CLOSED', 'YEAR_OPENED', 'YEAR_CLOSED')
            )

    .sort(['PROVINCE', 'DATE_OPENED'])
)

#------------------------------------------------------------------------------#
#     Use province names as section titles, indexed with integer-like values,
#     1, 2, 3, etc. Section members are coal mines, with incremental index
#     values of 1.01, 1.02, etc. Each province group has a first row that will
#     be formatted as a section head
#------------------------------------------------------------------------------#
province_list = sorted(list(set(df['PROVINCE'])))
# Add header row above the data of each province
df_list = []
for i, province in enumerate(province_list):
    opened = ( # 1st row, opened is the earliest year opened
        df     # of all coal mines in this province
        .filter(pl.col('PROVINCE') == province)
        .select(pl.col('DATE_OPENED'))
        .min().to_series()[0]
    )
    closed = (  # closed is the last year closed of all coal mines
        df      # in this province
        .filter(pl.col('PROVINCE') == province)
        .select(pl.col('DATE_CLOSED'))
        .max().to_series()[0]
    )
    df_first_row = (   # here is the data frame for 1st row of province group
        pl.DataFrame(
            {
                'COMPANY'      : '<b>' + province.upper() + '</b>',
                'MINE'         : '',
                'TOWN'         : '',
                'PROVINCE'     : province,
                'DATE_OPENED'  : opened,
                'DATE_CLOSED'  : closed,
                'YEAR_OPENED'  : opened.year,
                'YEAR_CLOSED'  : closed.year,
            }
        )
        .with_columns(pl.col('YEAR_OPENED', 'YEAR_CLOSED').cast(pl.Int16))
    )
    df_province = ( # finish province group with concat of 1st row and all others
        pl.concat(
            [
                df_first_row,
                df
                .filter(pl.col('PROVINCE') == province)
                .sort('DATE_OPENED', descending=False)
            ]
        )
        # temporary columns GROUP, GROUP_COUNT used for calculating item #
        .with_columns(GROUP = pl.lit(i+1))
        .with_columns(GROUP_COUNT = pl.col('GROUP').cum_count().over('GROUP') - 1)
        .with_columns(ITEM = (   # ITEM serves as a row index
            pl.col('GROUP') + pl.col('GROUP_COUNT')/100.0).cast(pl.Float32()))
        .with_columns(
            ITEM_COMPANY = (
                pl.lit('  ')  +  
                pl.col('ITEM').cast(pl.Utf8).str.pad_end(4, '0')
                + pl.lit(': ') 
                + pl.col('COMPANY')
            ),
        )
        .with_columns(
            YEAR_OPENED = (pl.col('DATE_OPENED').dt.year().cast(pl.Int32)),
            YEAR_CLOSED = (pl.col('DATE_CLOSED').dt.year().cast(pl.Int32))                                      
        )
        .with_columns(
            DURATION_YEARS = (pl.col('YEAR_CLOSED') - pl.col('YEAR_OPENED'))
        )
        .with_columns(
            MINE = pl.when(pl.col('MINE').is_null())
                     .then(pl.lit('None'))
                     .otherwise('MINE')
        )
        .with_columns(
            TOWN = pl.when(pl.col('TOWN').is_null())
                     .then(pl.lit('No Name Town'))
                     .otherwise('TOWN')
        )
        .filter(pl.col('DURATION_YEARS') >= MIN_YEARS)
    )
    # provinces need 2 or more rows, as first row is only a header
    if len(df_province) > 1:
        df_list.append(df_province)

df = pl.concat(df_list)  # this is the final step of data frame creation

#------------------------------------------------------------------------------#
#     plotly timeline
#------------------------------------------------------------------------------#
my_title = 'Shuttered Canadian Coal Mines<br>'
my_title += f'<sup>Closed mines that operated for {MIN_YEARS}+ years'
fig = px.timeline(   # DATE_OPENED AND DATE_CLOSED are type Date
    df,
    x_start='DATE_OPENED',
    x_end='DATE_CLOSED',
    y = 'ITEM_COMPANY',   # index has been prepended to COMPANY for sorting
    title = my_title,
    height = 1400,
    width = 1000,
    color='GROUP_COUNT',
    custom_data=['COMPANY', 'TOWN', 'PROVINCE',  'MINE', 
                 'YEAR_OPENED', 'YEAR_CLOSED', 'DURATION_YEARS']
)

fig.update_yaxes(categoryorder='category descending', automargin=True)
fig.update_layout(
    title=dict(font=dict(size=24), automargin=False, yref='paper'))
fig.update_layout(
    yaxis = dict( tickfont = dict(size=16), tickmode = 'linear', dtick=0.01))
fig.update_layout(xaxis = dict( tickfont = dict(size=16)))

fig.update_layout(
    xaxis={'side': 'top'}, 
    yaxis={'side': 'right'},
    template = 'presentation',
    yaxis_title = '',
)

#------------------------------------------------------------------------------#
#     Use list comprehension to find integer-like ITEM#s to use as group heads
#------------------------------------------------------------------------------#
int_items = [
    i for i, x in enumerate(df['ITEM'].sort(descending=True).to_list()) 
    if x == round(x, 0)
]
for item_num in int_items:  # put thick horiz line on province group head
    fig.add_hline(
        y=item_num, 
        line_width=10, 
        line_color='black', 
        layer='below'
        )

#------------------------------------------------------------------------------#
#     Add vertical line on today's date. Useful when using timeline for project
#     schedule, not useful in this case where date resolution is by year. x is
#     number of milliseconds since epoch, used as a workaround
#------------------------------------------------------------------------------#
fig.add_vline(
    x=datetime.strptime(today, "%Y_%m_%d").timestamp() * 1000,
    line_width=2, 
    line_color="green", 
    line_dash="dash", 
    annotation_text=str(datetime.now().strftime('%b %d')),
    annotation_position='bottom',
    annotation_font_size=20
)
#------------------------------------------------------------------------------#
#     customize hover template, uses columns from px.timeline, custom_data
#------------------------------------------------------------------------------#
fig.update_traces(
    hovertemplate="<br>".join([
        '<b>%{customdata[0]}</b>',
        '%{customdata[1]}, %{customdata[2]}',
        'Mine Name: %{customdata[3]}',
        '%{customdata[4]} to %{customdata[5]}',
        '(%{customdata[6]:.0f} Years)',
        '<extra></extra>'
    ])
)

fig.update_layout(
    hoverlabel=dict(font=dict(family='sans-serif', size=16)),
    showlegend=False,
    title_x=0
    )

#------------------------------------------------------------------------------#
#     fix y labels, for example '  1.03: Company XYZ' becomes ' Company XYZ'
#------------------------------------------------------------------------------#
y_ticks = df['ITEM_COMPANY']
fig.update_yaxes(
    tickmode='array',
    tickvals=y_ticks,
    ticktext=[y[7:] for y in y_ticks]  # strips away first 7 characters
)

fig.show()
fig.write_html('Shuttered_Coal_Mines.html')

li.nguyen · November 14, 2024, 3:17pm

Hello @Alfredo49 - great to see you back! I was away for a while too, exploring Japan!

I really appreciate the simplicity of your app – it has a very sleek design! While testing it out, I observed that the first filter affects both charts, but the lower filter only impacts the Gantt chart.

To enhance the user experience of your dashboard, you might consider grouping related content more closely and making the different sections visually distinct. Since the last three filters only affect the second chart, placing them nearer to it could help. Right now, these filters are placed similarly close to both charts, which might cause some confusion as they don’t apply to both.

Here’s an example of how you could rearrange the layout to group related content together. This is just one way you could do it, but it’s the first thing that came to mind. Additionally, you could use background colors or border colors to differentiate the sections further.

I also recommend reading this article on dashboard best practices, which also mentions the importance of grouping related metrics and adding whitespace: Effective dashboard design | A step-by-step guide | Geckoboard

Keep up the great work!

li.nguyen · November 14, 2024, 3:30pm

Hey @Mike_Purtell,

Your consistent participation in FigureFriday is truly admirable!

As always, your hover tooltips are beautiful: clear and concise! The chart is definitely easy to interpret. At first glance, I would suggest moving the y-axis labels to the left, primarily because we usually read from left to right.

Additionally, you could move the bars closer to the left and add a 0 y-axis line to make it visually clearer that the bars start at the same baseline. For example, I wasn’t sure if the bars for Alberta and British Columbia start at the same baseline, as there is quite a bit of distance between them. The 0 y-axis line could really help with this!

I have a couple of clarifying questions:

What is the purpose of the color scheme? It appears the same color is applied to different mines, which can be a bit confusing. If the colors don’t have a specific meaning, it might be better to leave them out.
What is the green dotted line for? Is it just for separating the axis labels? If so, you might consider removing it to declutter the chart even further.

As always, wonderful work though!

li.nguyen · November 14, 2024, 3:32pm

Hello everyone!

Apologies for my extended absence—I was traveling around Japan and decided to take a technology detox. I’ll be more active in the FigureFriday challenges again!

It’s wonderful to see all your contributions!

adamschroeder · November 14, 2024, 4:46pm

Welcome back @li.nguyen. I hope you had a great time in Japan.

natatsypora · November 14, 2024, 5:40pm

Hello everyone!

The Gantt chart is a very interesting and self-sufficient chart. As an alternative, I can only offer a scatter plot.

Code

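import plotly.graph_objects as go

# df_old is a pandas DataFrame prepared earlier (not shown in this post)
fig2 = go.Figure()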
for row in df_old.sort_values(by='total_years').itertuples():
    customdata = [row.province, row.commodityall, row.total_years]
    fig2.add_traces(go.Scatter(
        y=[row.namemine, row.namemine], x=[row.open1, row.close1],
        mode='markers+lines',  name='', line_width=5, opacity=0.8,
        marker=dict(color='darkgrey', size=15),        
        customdata=(customdata, customdata),
        hovertemplate='<br>'.join(['%{x}', 'Mine Name: %{y}',
                                   'Province: %{customdata[0]}',
                                   'Commodity: %{customdata[1]}', 
                                   'Total years in operation: %{customdata[2]}'])
        ))
# Update color for open mines    
for t in fig2.data:
    if t.x[1].year == 2024:        
        t.marker.color = 'forestgreen'  

fig2.update_layout(
    title='Oldest Mines Timeline - Up to 120 Years in Operation', title_x=0.2,
    width=1000, template='plotly_white', 
    showlegend=False, font_size=14,
    margin=dict(l=100, t=70, r=10, b=10),
    yaxis_title=None, yaxis_griddash='dot', yaxis_gridwidth=2,
    xaxis_tickformat='%Y')  

# Add text with 'total years' in operation
fig2.add_scatter(
    y=df_old['namemine'], x=df_old['close1'],
    hoverinfo='skip', showlegend=False,
    mode='text', text=df_old['total_years'], 
    textposition='middle right', texttemplate='&#x2003;%{text}')

# Add annotation with a green circle near the title 
fig2.add_shape(type="circle",
               x0=0.75, x1=0.77, y0=1.08, y1=1.11, 
               xref="paper", yref="paper",
               label_text=f'&#x2003;Open', label_textposition='middle left',
               line=dict(color="green", width=2), fillcolor="green")

# Add annotation with a grey circle near the title 
fig2.add_shape(type="circle",
               x0=0.88, x1=0.90, y0=1.08, y1=1.11, 
               xref='paper', yref='paper',
               label_text=f'&#x2003;Close', label_textposition='middle left',
               line=dict(color="lightgrey", width=2), fillcolor="lightgrey",)
 
fig2.show(config={'displayModeBar': False})

adamschroeder · November 14, 2024, 6:00pm

@natatsypora It’s pretty cool that you were able to replicate the Gantt chart with scatter and add_traces. Your chart looks really similar to the dumbbell plots on the Plotly Dumbbell page.

natatsypora · November 14, 2024, 6:27pm

Thank you!
The scatter plot is my favorite kind of chart.
A Gantt chart (px.timeline) requires fewer lines of code; everything necessary is already implemented “under the hood”.

Click to view differences

fig = px.timeline(data_frame=df_old, opacity=0.7,
                  x_start='open1',                   
                  x_end='close1',
                  y='namemine', 
                  color='is_open',
                  color_discrete_map={False:'lightgrey', True:'forestgreen'},                    
                  hover_name='company1', hover_data=['total_years', 'province', 'commodityall'],
                  title='Oldest Mines Timeline - Up to 120 Years in Operation')

fig.add_scatter(y=df_old['namemine'], x=df_old['close1'],
                hoverinfo='skip', showlegend=False,
                mode='text', text=df_old['total_years'], 
                textposition='middle left', texttemplate='%{text}')

fig.update_layout(width=1000, template='plotly_white', font_size=14,
                  legend=dict(orientation='h', title='Is Open', x=0.55, y=1.15, itemclick=False),
                  margin=dict(l=100, t=70, r=10, b=10),
                  yaxis_categoryorder='max ascending', yaxis_title=None, xaxis_tickformat='%Y') 
fig.show()

Alfredo49 · November 15, 2024, 1:04am

Hi @li.nguyen,

Thank you for the feedback! Your design tips are always spot on.
I will implement them on my next dashboard.

btw: it is awesome to have you back on fig fridays

ThomasD21M · November 15, 2024, 2:16pm

Inspired by @Alfredo49, I wanted to animate the density using Plotly Express while also showing mine activity by year, with some additional data displayed as the years progress.

Next step: making the map smaller and adding an animated bar chart to its left that shows the companies with the most active mines or total tonnage each year, letting them “battle it out” for the top position.
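
One way to prepare the data for such a bar chart, sketched under assumptions: count, for every year, how many mines each company had in operation. The column names `company1`, `open1` and `close1` come from the code earlier in this thread. Treating the text `open` in `close1` as "still operating through 2022", the helper name `active_mines_per_year` and the sample rows (including the company names) are assumptions made for illustration.

```python
import pandas as pd

LAST_YEAR = 2022  # assumption: last year covered by the data set


def active_mines_per_year(df: pd.DataFrame) -> pd.DataFrame:
    """Count active mines per company and year, in long format."""
    out = df[["company1", "open1", "close1"]].copy()
    # Mines still operating are assumed to carry the text 'open' in close1;
    # treat them as active through LAST_YEAR.
    close_text = (
        out["close1"].astype(str).str.lower().replace("open", str(LAST_YEAR))
    )
    out["close_year"] = pd.to_numeric(close_text, errors="coerce")
    out["open_year"] = pd.to_numeric(out["open1"], errors="coerce")
    # Keep valid intervals only (this also drops rows with missing years)
    out = out[out["close_year"] >= out["open_year"]].copy()
    # One row per mine and year of operation
    out["year"] = [
        list(range(int(start), int(end) + 1))
        for start, end in zip(out["open_year"], out["close_year"])
    ]
    out = out.explode("year")
    out["year"] = out["year"].astype(int)
    return (
        out.groupby(["year", "company1"])
        .size()
        .reset_index(name="active_mines")
    )


# Hypothetical rows, only to show the shape of the result
sample = pd.DataFrame(
    {
        "company1": ["Acme", "Acme", "Borealis"],
        "open1": [2019, 2020, 2021],
        "close1": ["2020", "open", "2021"],
    }
)
counts = active_mines_per_year(sample)
print(counts)
```

The long-format result could then be passed to `px.bar` with `animation_frame="year"` to animate the ranking year by year.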

ThomasD21M · November 15, 2024, 2:27pm

I used Replit for this exercise (I was curious about its creativity with AI), together with my OpenAI API key plugin. It’s pretty impressive, although some debugging and additional corrections were needed after the first pass. I had a lot of fun with this one.

Trying out this invite link to this Replit project:
Canadian Mines Animation

li.nguyen · November 15, 2024, 3:01pm

This is a beautiful example, @natatsypora!

I prefer your dumbbell version even over the Gantt chart version because it provides more space between the observations, making it look much tidier. Your chart is wonderfully decluttered with a very clear color choice!

One suggestion I have is to create a more user-friendly legend to make it easier for non-technical people to understand. True/False values can be a bit confusing, so you might consider renaming True to Open and False to Closed. This would allow you to remove the legend title entirely, as the values would be self-explanatory.
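
A small sketch of that renaming, assuming the boolean column is called `is_open` as in the px.timeline snippet above. The new `status` column, the helper name `add_status_label` and the sample rows are illustrative assumptions.

```python
import pandas as pd

# Colors taken from the snippet above, keyed by the readable labels
STATUS_COLORS = {"Open": "forestgreen", "Closed": "lightgrey"}


def add_status_label(df: pd.DataFrame) -> pd.DataFrame:
    """Add a readable 'status' column derived from the boolean 'is_open'."""
    out = df.copy()
    out["status"] = out["is_open"].map({True: "Open", False: "Closed"})
    return out


# Hypothetical rows, only to show the mapping
sample = pd.DataFrame({"namemine": ["A", "B"], "is_open": [True, False]})
labelled = add_status_label(sample)
print(labelled["status"].tolist())
```

In the px.timeline call, `color='status'` and `color_discrete_map=STATUS_COLORS` would then replace the True/False mapping, and `fig.update_layout(legend_title_text='')` would remove the legend title.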

li.nguyen · November 15, 2024, 3:19pm

Wow, this is incredibly impressive! Did you have any prior experience with building Streamlit apps? Out of curiosity - how much of it worked seamlessly out of the box, and how much did you have to tweak or correct?

Very clean design! I love how you provide clear instructions and action items for the user—this makes navigating the dashboard very straight-forward.

I am wondering what it might look like if you chose a dark chart theme to match the overall dark theme of the dashboard. This might help emphasize the data points more, even though I also like how the chart currently stands out thanks to its white background.

Mike_Purtell · November 15, 2024, 4:07pm

Hello @li.nguyen,

Thank you for your kind and insightful comments. Regarding the color scheme, I often use px.timeline for project scheduling & status reports with 5 well-defined colors to show completion status (0% to 100% in 25% increments). In that context the colors are helpful and make sense.

For this visualization I didn’t put any thought into the colors. The sequence is the same for each province, where mines are listed in the order they opened, so the first mine to open in Alberta has the same color as the first mine to open in British Columbia. But this is a sloppy approach. There is no need to distinguish when the mines opened by color, as this is already evident from the timeline. A better way may have been to use the same color for all mines, or to assign colors by province.

The green line on the right coincides with the date when the python code was run. This is not useful here, where all of the timelines are past history, but can be very helpful for projects or event tracking that includes past and future dates.

For the y-axis labels, it would make sense to move them from the right side to the left side in this context. However, if the labels are long and wordy, the right side may be better.

Best regards,

Mike
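
The two changes discussed in this reply (one color per province, y-axis labels on the left) might look like the sketch below. It is an illustration under assumptions, not a revision of the posted code: `province_colors`, `province_timeline` and `PALETTE` are hypothetical names, the column names follow the renamed columns in the source code above, and the input is assumed to be a pandas DataFrame (a Polars frame could be converted with `df.to_pandas()`).

```python
import pandas as pd

# Any list of colors works; these are arbitrary hex values chosen for the sketch
PALETTE = ["#636EFA", "#EF553B", "#00CC96", "#AB63FA", "#FFA15A"]


def province_colors(provinces) -> dict:
    """Map each distinct province to one color, cycling through PALETTE."""
    unique = sorted(set(provinces))
    return {name: PALETTE[i % len(PALETTE)] for i, name in enumerate(unique)}


def province_timeline(df: pd.DataFrame):
    """Gantt chart with one color per province and y labels on the left."""
    import plotly.express as px  # imported here so the helper above has no plotly dependency

    fig = px.timeline(
        df,
        x_start="DATE_OPENED",
        x_end="DATE_CLOSED",
        y="COMPANY",
        color="PROVINCE",
        color_discrete_map=province_colors(df["PROVINCE"]),
    )
    fig.update_yaxes(side="left", title_text="")
    return fig
```

Calling `province_timeline(df.to_pandas()).show()` on the final data frame would render the chart; the today marker from `fig.add_vline` is simply left out here because every bar lies in the past.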

natatsypora · November 15, 2024, 4:17pm

Thank you for your feedback and very valuable advice!
Space between observations is easy to change with fig.update_traces(width=0.5).
You are absolutely right about True/False.
The new version of the graphic looks a lot better.

adamschroeder · November 15, 2024, 5:06pm

Zoom link for figure friday session is now live:
