Figure Friday 2025 - week 20

Join the Figure Friday session on May 23, at noon Eastern Time, to showcase your creation and receive feedback from the community.

How often do US dams get inspected? Where are they located and who is responsible for them?

Answer these and many other questions by using Plotly and Dash on the National Inventory of Dams. The complete dataset can be found on the National Inventory of Dams website.

Things to consider:

  • what can you improve in the app or sample figure below (line chart)?
  • would you like to tell a different data story using a different graph?
  • can you create a different Dash app?

Sample figure:

Code for sample figure:
import plotly.express as px
import pandas as pd

# download CSV sheet - https://drive.google.com/file/d/15XcdEYqwLTXSMurBDr9HuXyV6uQaYBPw/view?usp=sharing
df = pd.read_csv("nation-dams.csv")

# Create year column of last inspection date
df['Last Inspection Date Year'] = pd.to_datetime(df['Last Inspection Date']).dt.year
yearly_groups = df.groupby(df['Last Inspection Date Year'])

# Get count by year
yearly_counts = yearly_groups.size().reset_index(name='amount')

fig = px.line(yearly_counts, x='Last Inspection Date Year', y='amount', markers=True)
fig.show()

Participation Instructions:

  • Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
  • Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
  • Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.

:point_right: If you prefer to collaborate with others on Discord, join the Plotly Discord channel.

Data Source:

Thank you to the National Inventory of Dams and to Data Is Plural for the data.

I created a dashboard with a scatter_map, a bar chart, a Dash Bootstrap card and a Dash AG Grid table.

The focus is on storage in the contained body of water. Note that many bodies of water have multiple dams that show up on the bar chart with equal values. This is expected because every dam on the same body of water contains the same volume of water. Examples are Lake George in New York (Ticonderoga), with 2 dams, and the Saint Lawrence River in New York, with 11 dams along this border river with Canada.

Still a bit rough around the edges, I will update a few things before Friday if time permits.

Here is a screenshot for California.

Here is the code:

import polars as pl
import plotly.express as px
from dash import Dash, dcc, html, Input, Output
import dash_bootstrap_components as dbc
import dash_ag_grid as dag

#----- GLOBAL DATA STRUCTURES --------------------------------------------------
dam_info = ['DAM','LAT','LONG','STATE','COUNTY','CITY','WATERWAY','YEAR_COMP',]

dam_stats = [
'NID_CAP_ACR_FT', 'MAX_STG_ACR_FT', 'NORM_STG_ACR_FT',
'DRAINAGE_SQ_MILES', 'SURF_AREA_SQM', 'MAX_DISCHRG_CUB_FT_SEC'
]
dam_table_cols = [
    'DAM','COUNTY','CITY','WATERWAY','YEAR_COMP',
    'NID_CAP_ACR_FT', 'MAX_STG_ACR_FT', 'NORM_STG_ACR_FT',
    'DRAINAGE_SQ_MILES', 'SURF_AREA_SQM', 'MAX_DISCHRG_CUB_FT_SEC',
    'LAT','LONG',
]

style_space = {
    'border': 'none',
    'height': '4px', 
    'background': 'linear-gradient(to right, #007bff, #ff7b00)', 
    'margin': '10px',
    'fontSize': 32
}

grid = dag.AgGrid(
    rowData=[],
    columnDefs=[
        {"field": i, 'filter': True, 'sortable': True} for i in dam_table_cols
    ],
    dashGridOptions={"pagination": True},
    id='dam_table'
)

#----- READ & CLEAN DATASET ----------------------------------------------------
df = (
    pl.scan_csv(
        'nation.csv',
        ignore_errors=True, 
        skip_rows=1
    )
    .select(
        DAM = pl.col('Dam Name'),
        LAT = pl.col('Latitude'),
        LONG = pl.col('Longitude'),
        STATE = pl.col('State').str.to_titlecase(),
        COUNTY = pl.col('County').str.to_titlecase(),
        CITY = pl.col('City').str.to_titlecase(),
        WATERWAY = pl.col('River or Stream Name').str.to_titlecase(),
        YEAR_COMP = pl.col('Year Completed'),
        DECADE_COMP = pl.col('Year Completed Category'),

        # storage statistics for group_by aggregations
        NID_CAP_ACR_FT = pl.col('NID Storage (Acre-Ft)'),
        MAX_STG_ACR_FT = pl.col('Max Storage (Acre-Ft)'),
        NORM_STG_ACR_FT = pl.col('Normal Storage (Acre-Ft)'),
        DRAINAGE_SQ_MILES = pl.col('Drainage Area (Sq Miles)'),
        SURF_AREA_SQM = pl.col('Surface Area (Acres)'),
        MAX_DISCHRG_CUB_FT_SEC = pl.col('Max Discharge (Cubic Ft/Second)'),

    )
    .filter(pl.col('DAM').is_not_null())
    .filter(pl.col('MAX_STG_ACR_FT').is_not_null())
    .with_columns(DAM = pl.col('DAM'))
    .collect()
)
state_list = sorted(df['STATE'].unique().to_list())

#----- CALLBACK FUNCTIONS ------------------------------------------------------
def get_state_stat(df, param, col):
    return(
        df
        .filter(pl.col('STATISTIC') == param)
        [col]
        [0]
    )

def get_scatter_map(state):
    df_state = (
        df
        .filter(pl.col('STATE') == state)
        .select('STATE', 'LONG', 'LAT', 'DECADE_COMP','MAX_STG_ACR_FT')
        .sort(['DECADE_COMP', 'MAX_STG_ACR_FT'])
    )
    state_zoom = 4  # default. following code changes zoom for listed states
    if state in ['Alaska']:
        state_zoom = 2
    elif state in ['Texas']:
        state_zoom = 3
    elif state in ['Connecticut', 'Louisiana', 'Massachusetts', 'New Jersey', ]:
        state_zoom = 5
    elif state in ['Delaware', 'Rhode Island','Puerto Rico']:
        state_zoom = 7
    elif state in ['Guam']:
        state_zoom = 8
    scatter_map = px.scatter_map(  # map libre
        df_state,
        lat='LAT',
        lon='LONG',
        zoom=state_zoom,
        color='DECADE_COMP',
    )
    scatter_map.update_layout(legend_title = '<b>Completed</b>')
    return(scatter_map)

def get_top_10_bar(state):
    df_state = (
        df
        .filter(pl.col('STATE') == state)
        .select(['DAM', 'MAX_STG_ACR_FT', 'MAX_DISCHRG_CUB_FT_SEC'])
        .sort('MAX_STG_ACR_FT', descending=True)
        .with_columns(  # split the dam name into words, keep only the first 7
            DAM_SHORT = pl.col('DAM').str.split(' ').list.slice(0, 7).list.join(' ')
        )
    )
    state_dam_count = df_state.height
    df_state = df_state.head(min(state_dam_count,15))
    fig = px.bar(
        df_state, 
        y='DAM_SHORT', 
        x='MAX_STG_ACR_FT',
        template='simple_white',
        title = f"DAMS ON {state.upper()}'S LARGEST WATERWAYS"
    )
    fig.update_layout(
        yaxis=dict(
            autorange='reversed',
            title='',
        ),
        xaxis=dict(
            title='MAXIMUM STORAGE [ACRE-FEET]'
        ),
    )
    return(fig)

def get_state_card_text(state):
    df_state = (
        df
        .filter(pl.col('STATE') == state)
        .select(['STATE'] + dam_stats)
    )
    dam_count=df_state.shape[0]
    df_state_stats = (
        df_state.group_by('STATE').agg(pl.col(dam_stats).sum())
        .transpose(
            include_header=True,
            header_name='STATISTIC',
        )
        .rename({'column_0': 'TOTAL'})
        .filter(pl.col('STATISTIC') != 'STATE')
        # the transposed values are strings and cannot be cast directly to Int; cast to Float first, then to Int
        .with_columns(pl.col('TOTAL').cast(pl.Float64).cast(pl.Int64))
        .with_columns(
            AVERAGE = (
                pl.col('TOTAL')/dam_count)
                .cast(pl.Int64)
            )
    )
     
    tot_max_acre_feet = get_state_stat(df_state_stats, 'MAX_STG_ACR_FT', 'TOTAL')
    avg_max_acre_feet = get_state_stat(df_state_stats, 'MAX_STG_ACR_FT', 'AVERAGE')
    title_text = (
       f"{state}'s {dam_count:,} dams " + 
       f'contain a total of {tot_max_acre_feet:,} acre-feet of water. ' + 
       f'Average containment per dam is {avg_max_acre_feet:,} acre-feet'
    ) 
    return title_text

def get_dam_table(state):
    df_state = (
        df
        .filter(pl.col('STATE') == state)
        .select(dam_table_cols)
        .sort('MAX_STG_ACR_FT', descending=True)
    )
    return df_state.to_dicts()

#----- DASH APPLICATION STRUCTURE-----------------------------------------------
app = Dash(external_stylesheets=[dbc.themes.LITERA])
app.layout =  dbc.Container(
    [
        html.Hr(style=style_space),
        html.H2(
            'USA DAM INFO', 
            style={'text-align': 'center', 'font-size': '32px'}
        ),
        html.H3('Mark Twain once said "There are lies, damned lies and statistics". These are dam statistics',
            style={'text-align': 'center', 'font-size': '16px', 'font-weight': 'normal'}
        ),
        html.H3('Data Source: National Inventory of Dams Website: https://nid.sec.usace.army.mil/#/ ', 
            style={'text-align': 'center', 'font-size': '16px', 'font-weight': 'normal'}
        ),
        html.Hr(style=style_space),

            dbc.Row(
                [
                    dbc.Col(html.Div('Select US State or Territory'), width=3),
                ]
            ),
        dbc.Row([       
            dbc.Col(
                [
                    dcc.Dropdown(
                        state_list,
                        state_list[0],
                        id='state_select', 
                        multi= False
                    ),
                ],
                width=2
            ),
        ]),

        html.Div(id='dd-output-container'),
        html.Div(id='dd-choropleth-container'),
        dbc.Row(
            [
                dbc.Col(
                    dcc.Graph(id='scatter_map'),
                    width=7
                ),
                dbc.Col(
                    dbc.Card(
                        [
                            dbc.CardBody(
                                [
                                    html.H4(
                                        'STATE STATS', 
                                        className='card-title',
                                        id='id-state-desc-title'
                                    ),
                                    html.P(
                                        "Some quick example text to build on the card title and "
                                        "make up the bulk of the card's content.",
                                        className="card-text",
                                        id='id-state-desc-text'
                                    ),
                                ],                   
                            ),
                        ]
                    )
                ),
            ]
        ),
        dbc.Row(
            [
                dbc.Col(
                    dcc.Graph(id='top_10_bar'),
                    width=7
                ),
                dbc.Col([grid],width=5),
            ]
        )
    ]
)

@app.callback(
    Output('scatter_map', 'figure'),
    Output('top_10_bar', 'figure'),
    Output('id-state-desc-title','children'),
    Output('id-state-desc-text','children'),
    Output('dam_table', 'rowData'),
    Input('state_select', 'value'),
)
def update_dashboard(selected_state):
    return (
        get_scatter_map(selected_state),
        get_top_10_bar(selected_state),
        selected_state.upper(),
        get_state_card_text(selected_state),
        get_dam_table(selected_state),
    )
if __name__ == '__main__':
    app.run(debug=True)

Love the Mark Twain reference, Mike.

Looks like many of the dams in California were built in the 1950s and 1960s.

I also see that the Shasta Dam has over 4 million acre-feet of water storage. But what is an acre-foot exactly? Can we convert or compare that to something more relatable, like gallons of water or Olympic pools?
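
For a rough sense of scale: an acre-foot is the volume of water that covers one acre to a depth of one foot, which is 43,560 cubic feet. The sketch below converts acre-feet to US gallons and to Olympic-size pools. The pool volume of 2,500 cubic meters (50 m x 25 m x 2 m) is an assumption, since real pools vary in depth, and the helper names are made up for illustration.

```python
# Unit conversions for acre-feet (illustrative helper names, not from the thread)
CUBIC_FEET_PER_ACRE_FOOT = 43_560        # 1 acre = 43,560 sq ft, times 1 ft of depth
GALLONS_PER_CUBIC_FOOT = 7.48052         # US liquid gallons
CUBIC_METERS_PER_CUBIC_FOOT = 0.0283168
OLYMPIC_POOL_CUBIC_METERS = 2_500        # assumption: 50 m x 25 m x 2 m deep


def acre_feet_to_gallons(acre_feet):
    """Convert acre-feet to US liquid gallons."""
    return acre_feet * CUBIC_FEET_PER_ACRE_FOOT * GALLONS_PER_CUBIC_FOOT


def acre_feet_to_olympic_pools(acre_feet):
    """Convert acre-feet to a count of (assumed 2,500 cubic meter) Olympic pools."""
    cubic_meters = acre_feet * CUBIC_FEET_PER_ACRE_FOOT * CUBIC_METERS_PER_CUBIC_FOOT
    return cubic_meters / OLYMPIC_POOL_CUBIC_METERS


storage = 4_000_000  # roughly the "over 4 million acre-feet" mentioned for Shasta Dam
print(f"{acre_feet_to_gallons(storage):,.0f} US gallons")
print(f"{acre_feet_to_olympic_pools(storage):,.0f} Olympic pools")
```

One acre-foot works out to about 325,851 US gallons, or roughly half an Olympic pool under the depth assumption above.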

My first plotly chart, done here at the sprints at PyCon US. It turns out to be very easy to use with a polars dataframe.

import polars as pl
import plotly.express as px

# load the csv
df = pl.read_csv("data/nation-dams.csv", ignore_errors=True)

# mangle the data
chart_df = df.select(pl.col("Primary Purpose"), pl.col("NID ID")).group_by("Primary Purpose").agg(pl.count("NID ID"))

#create the bar chart
fig = px.bar(
    chart_df,
    x="Primary Purpose",
    y="NID ID",
    title="Primary Purpose of Dams in the US",
    labels={"NID ID": "Number of Dams"},
    width=800,
    height=400,
)

fig.show()  

Welcome to the Plotly community, @Bas_Bloemsaat :waving_hand: .

Awesome first Plotly graph. @Mike_Purtell we have another community member using Polars :slight_smile:

Alright everyone in the Figure Friday Week 20 community! :waving_hand:

This week, I jumped in early with my approach:

I want to show you this application, an interactive dashboard I put together so we can explore and compare dams in the United States that are a bit different from the rest – kind of like unique figures in a collection! To find them, I used a machine learning model called Isolation Forest. Imagine this model grabs the data and randomly separates it. The dams that are rare or distinct get isolated faster, like they’re easier to “catch” than the normal ones! So, the app shows them to us on an interactive map where we can filter by state and pick each dam to see its details.

With this tool, we can:

  • See where these “singular” dams are located on the map. They’re colored based on their potential hazard level and their size depends on how “rare” they are!
  • Filter the dams by state using a dropdown menu.
  • Click on a dam on the map to learn more about it: inspection dates, what it’s used for, its classification, etc.
  • Compare a specific dam with the average of similar dams (those with the same hazard level), using a radar chart!
  • Download the data we’ve filtered to analyze it more in-depth if we want.

What’s the point of finding these distinct dams? Well, they can give us interesting clues for further investigation. Maybe they’re errors in the data, or perhaps dams with very special operating conditions, or simply unusual characteristics that deserve our attention, even if they don’t mean they’re dangerous!

The main idea is that together we can identify and compare these dams that aren’t “typical.” Heads up! Being “singular” doesn’t always mean there’s a problem, but that they’re statistically different, and this can be useful for learning and, who knows, maybe finding interesting things that deserve a closer look!
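
The step that produced `represas_con_anomalias.csv` (with its `Is_Anomaly` and `Anomaly_Score` columns) is not part of the code in this post, and it was presumably done with a library implementation such as scikit-learn's `IsolationForest`; that is an assumption. To illustrate the "rare points get isolated faster" idea, here is a minimal from-scratch sketch in plain Python. The data is synthetic (made-up height and storage pairs), and all function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path_length(n):
    """Expected path length of an unsuccessful search in a binary tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, rng, depth, max_depth):
    """Split rows on a random feature at a random value until points are isolated."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    dim = rng.randrange(len(rows[0]))
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([r for r in rows if r[dim] < split], rng, depth + 1, max_depth),
        "right": build_tree([r for r in rows if r[dim] >= split], rng, depth + 1, max_depth),
    }


def path_length(point, node, depth=0):
    """Count the splits needed to reach the leaf that holds the point."""
    if "dim" not in node:
        return depth + avg_path_length(node["size"])
    side = "left" if point[node["dim"]] < node["split"] else "right"
    return path_length(point, node[side], depth + 1)


def anomaly_scores(rows, n_trees=100, sample_size=64, seed=0):
    """Return one score per row, between 0 and 1; higher means easier to isolate."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(rows, sample_size), rng, 0, max_depth)
        for _ in range(n_trees)
    ]
    norm = avg_path_length(sample_size)
    return [
        2 ** (-sum(path_length(p, t) for t in trees) / (n_trees * norm))
        for p in rows
    ]


# Synthetic (height, storage) pairs: 200 ordinary dams plus one extreme one.
data_rng = random.Random(42)
rows = [(data_rng.gauss(30, 5), data_rng.gauss(100, 20)) for _ in range(200)]
rows.append((300.0, 5000.0))

scores = anomaly_scores(rows)
most_unusual = max(range(len(rows)), key=scores.__getitem__)
print("Most unusual row:", most_unusual, rows[most_unusual])
```

Each tree is grown on a random subsample, and the score is 2 raised to the negative mean path length divided by the expected path length for that subsample size. Points that sit far from the rest need fewer splits and so end up with scores closer to 1.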

I hope you like it and find it useful for continuing to learn together! :blush:


the code:

import dash
from dash import dcc, html, Input, Output, State, callback_context
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np
import dash_bootstrap_components as dbc

# Load data
df = pd.read_csv("represas_con_anomalias.csv")  # Replace with the correct path

# Filter only dams with anomalies
df_anomalias = df[df['Is_Anomaly'] == 1]

# Get unique states for dropdown
estados_unicos = sorted(df_anomalias['State'].unique())

# Define metrics to compare in radar chart
metricas = [
    'Dam Height (Ft)',
    'NID Height (Ft)',
    'Dam Length (Ft)',
    'NID Storage (Acre-Ft)',
    'Normal Storage (Acre-Ft)',
    'Surface Area (Acres)',
    'Drainage Area (Sq Miles)',
]
hazard_colors = {
    'High': '#440154',      # Dark purple
    'Significant': '#FDE725', # Bright yellow (to highlight a considerable risk)
    'Low': '#21918C',       # Teal
    'Undetermined': '#90D7EC'  # Light blue (to indicate uncertainty)
}

# Initialize Dash application with Bootstrap
app = dash.Dash(
    __name__, 
    suppress_callback_exceptions=True,
    external_stylesheets=[dbc.themes.SANDSTONE]  # Modern Bootstrap theme
)

app.title = "US Dams Anomalies Dashboard"

# Dashboard layout using Bootstrap
app.layout = dbc.Container([
    dbc.Row([
        dbc.Col([
            html.H1("🏞️ Distinct US Dams: A Comparative View", 
                    className="text-center my-4 text-primary")
        ]),
    ]),
    
    # Card with general dataset information
    dbc.Row([
        dbc.Col([
            dbc.Card([
                dbc.CardBody([
                    html.H5(f"Total of Distinct US Dams: {len(df_anomalias)}", className="card-title"),
                    html.P("Shown here are US dams flagged for singularities using the Isolation Forest machine learning model, designed to isolate atypical data points for analysis, not safety assessment.", 
                           className="card-text"),
                ])
            ], className="mb-4 shadow border border-primary")
        ])
    ]),
   # State filter dropdown and info message as floating elements
    dbc.Row([
        # State filter dropdown as a stylish button
        dbc.Col([
            dbc.DropdownMenu(label="Filter by State: All States",
                             id="state-dropdown-button",
                             className="shadow-lg",
                             children=[
                                 dbc.DropdownMenuItem("All States", id="all-states", active=True),
                                 dbc.DropdownMenuItem(divider=True),
                                 *[dbc.DropdownMenuItem(state, id=f"state-{state}") for state in estados_unicos],
                             ],color="primary",
                             toggle_style={"font-weight": "bold", "border-radius": "10px", 
                                           "padding": "12px 20px",
                                           "box-shadow": "0 4px 6px rgba(50, 50, 93, 0.11), 0 1px 3px rgba(0, 0, 0, 0.08)"},
                             toggle_class_name="d-flex align-items-center"),
            # Hidden dropdown that stores the actual selected value
            dcc.Dropdown(id='state-dropdown',
                         options=[{'label': 'All States', 'value': 'all'},
                                  *[{'label': state, 'value': state} for state in estados_unicos]],
                         value='all',style={'display': 'none'}),], width=4, className="mb-3 d-flex align-items-center"),
        
        # Information message with improved styling
        dbc.Col([
            html.Div(
                html.P(
                    [html.I(className="fas fa-info-circle me-2"), 
                     "Click on any dam on the map to view detailed comparative analysis."],
                    className="text-muted fst-italic mb-0 px-3 py-2 rounded-pill bg-light shadow-sm"),
                className="d-flex justify-content-center align-items-center h-100"
            )
        ], width=8, className="mb-3"),
    ], className="mb-4"),
    
    # Two charts in the same row
    dbc.Row([
        # Left column for the map
        dbc.Col([
            dbc.Card([
                dbc.CardHeader(html.H4("🌐 Location of Singular Dams", className="text-center")),
                dbc.CardBody([
                    dcc.Graph(id='mapa-represas', style={'height': '70vh'}
                             ),
                ])
            ], className="shadow border border-primary"),
            html.Hr(), 
            # Add a download button for the filtered data
            html.Button("Download Filtered Data", id="btn-download"),
            dcc.Download(id="download-dataframe-csv"),
                    ], width=7),
        
        # Right column for information and radar chart
        dbc.Col([
            dbc.Card([
                dbc.CardHeader(html.H4("↔️  Comparing Distinct Features", className="text-center")),
                dbc.CardBody([
                    html.Div(id='info-represa-seleccionada', className="mb-3"),
                    dcc.Graph(id='radar-chart', style={'height': '50vh'}
                             ),
                ])
            ], className="shadow border border-primary")
        ], width=5),
    ], className="mb-4"),
    
    # Footer with additional information
    dbc.Row([
        dbc.Col([
            html.Div([
                    html.P([html.I(className="fas fa-info-circle me-2"), 
                        "US National Inventory of Dams Dashboard © 2025 source: Data Is Plural"],
                           className="text-muted fst-italic mb-0 px-3 py-2 rounded-pill bg-light shadow-sm")
                
            ],className="d-flex justify-content-center align-items-center h-100")
        ], className="mb-3")
    ]),
      
        # Stores to save state and dam selection
        dcc.Store(id='represa-seleccionada'),
        dcc.Store(id='estado-seleccionado', data='all'),
    
], fluid=True, className="px-4 py-3 bg-light")

# Callback to update the state-dropdown based on the button clicks
@app.callback(
    [Output('state-dropdown', 'value'),
     Output('estado-seleccionado', 'data'),
     Output('state-dropdown-button', 'label')],
    [Input('all-states', 'n_clicks')] +
    [Input(f'state-{state}', 'n_clicks') for state in estados_unicos],
    [State('estado-seleccionado', 'data')]
)
def actualizar_estado(*args):
    # Determine which item was clicked
    ctx = dash.callback_context
    
    if not ctx.triggered:
        # No clicks yet, return default
        return 'all', 'all', "Filter by State: All States"
    
    # Get the ID of the component that triggered the callback
    button_id = ctx.triggered[0]['prop_id'].split('.')[0]
    
    if button_id == 'all-states':
        return 'all', 'all', "Filter by State: All States"
    
    # Remove the 'state-' prefix to get the state name
    for state in estados_unicos:
        if button_id == f'state-{state}':
            return state, state, f"Filter by State: {state}"
    
    # If we get here, no specific button was matched
    return dash.no_update, dash.no_update, dash.no_update

# Callback to update the map based on selected state
@app.callback(
    Output('mapa-represas', 'figure'),
    [Input('estado-seleccionado', 'data'),
     Input('represa-seleccionada', 'data')]
)
def actualizar_mapa(estado_seleccionado, represa_seleccionada):
    # Filter by state if a specific state is selected
    if estado_seleccionado != 'all':
        datos_mapa = df_anomalias[df_anomalias['State'] == estado_seleccionado]
    else:
        datos_mapa = df_anomalias
    
    # Create scatter map
    fig = px.scatter_map(
        datos_mapa,
        lat='Latitude',
        lon='Longitude',
        hover_name='Dam Name',
        custom_data=['NID ID','County','Owner Names', 'Primary Owner Type', 'Year Completed'],
        color='Hazard Potential Classification',
        color_discrete_map=hazard_colors,
        size='Anomaly_Score',
        size_max=15,
        zoom=3,
        map_style='light',
        opacity=0.8,
    )
    fig.update_traces(hovertemplate="<b>%{hovertext}</b><br>" +
                                "NID ID: %{customdata[0]}<br>" +
                                "County: %{customdata[1]}<br>" +
                                "Owner: %{customdata[2]}<br>" +
                                "Owner Type: %{customdata[3]}<br>" +
                                "Completed: %{customdata[4]}<extra></extra>")
    
    # Highlight selected dam if there is one
    if represa_seleccionada:
        represa = df_anomalias[df_anomalias['NID ID'] == represa_seleccionada]
        if not represa.empty and (estado_seleccionado == 'all' or represa['State'].values[0] == estado_seleccionado):
            fig.add_trace(go.Scattermap(
                lat=[represa['Latitude'].values[0]],
                lon=[represa['Longitude'].values[0]],
                mode='markers',
                marker=dict(size=35, color='red', opacity=1),
                hoverinfo='none',
                showlegend=False
            ))
    
    # Adjust the center of the map based on the data
    if len(datos_mapa) > 0:
        lat_centro = datos_mapa['Latitude'].mean()
        lon_centro = datos_mapa['Longitude'].mean()
        zoom_level = 3 if estado_seleccionado == 'all' else 5
    else:
        lat_centro = 39  # Default center of US
        lon_centro = -98
        zoom_level = 3
    
    fig.update_layout(
        margin=dict(l=0, r=0, t=0, b=0),
        map=dict(center=dict(lat=lat_centro, lon=lon_centro), zoom=zoom_level),  # scatter_map traces use layout.map, not layout.mapbox
        clickmode='event+select',
        legend=dict(orientation="h",
                   yanchor="bottom",y=-0.20,xanchor="center",x=0.5)
    )
    
    return fig

# Callback to update selected dam when clicked on map
@app.callback(
    Output('represa-seleccionada', 'data'),
    [Input('mapa-represas', 'clickData')],
    [State('represa-seleccionada', 'data')]
)
def actualizar_represa_seleccionada(click_data, represa_actual):
    if click_data is None:
        return represa_actual
    
    # Get the NID ID of the clicked dam
    nid_id = click_data['points'][0]['customdata'][0]
    return nid_id

# Callback to display selected dam information
@app.callback(
    Output('info-represa-seleccionada', 'children'),
    [Input('represa-seleccionada', 'data')]
)
def mostrar_info_represa(represa_seleccionada):
    if not represa_seleccionada:
        return dbc.Alert(
            "Select a dam on the map to view its comparative analysis",
            color="info",
            className="text-center"
        )
    
    represa = df[df['NID ID'] == represa_seleccionada]
    if represa.empty:
        return dbc.Alert(
            "No information found for the selected dam",
            color="warning",
            className="text-center"
        )
    
    # Create a table with the dam's basic information using Bootstrap
    info = [
        html.H4(f"{represa['Dam Name'].values[0]}", className="text-center text-primary mb-3"),
        dbc.Table([
            html.Tbody([
                html.Tr([
                    html.Td("Last Inspection Date:", className="font-weight-bold"),
                    html.Td(represa['Last Inspection Date'].values[0])
                ]),
                html.Tr([
                    html.Td("Inspection Frequency:", className="font-weight-bold"),
                    html.Td(represa['Inspection Frequency'].values[0])
                ]),
                html.Tr([
                    html.Td("Primary Purpose:", className="font-weight-bold"),
                    html.Td(represa['Primary Purpose'].values[0])
                ]),
                html.Tr([
                    html.Td("Hazard Classification:", className="font-weight-bold"),
                    html.Td(represa['Hazard Potential Classification'].values[0])
                ]),
                html.Tr([
                    html.Td("Condition Assessment:", className="font-weight-bold"),
                    html.Td(represa['Condition Assessment'].values[0] if not pd.isna(represa['Condition Assessment'].values[0]) else "Not available")
                ]),
                html.Tr([
                    html.Td("Anomaly Score:", className="font-weight-bold"),
                    html.Td(
                        html.Span(
                            f"{represa['Anomaly_Score'].values[0]:.4f}",
                            className="badge bg-danger text-white p-2" if represa['Anomaly_Score'].values[0] > 0.7 
                            else "badge bg-warning text-dark p-2" if represa['Anomaly_Score'].values[0] > 0.4 
                            else "badge bg-success text-white p-2"
                        )
                    )
                ]),
            ])
        ], bordered=True, hover=True, size="sm", className="mb-0")
    ]
    
    return html.Div(info)

#Callback to update radar chart
@app.callback(
    Output('radar-chart', 'figure'),
    [Input('represa-seleccionada', 'data')]
)
def actualizar_radar_chart(represa_seleccionada):
    if not represa_seleccionada:
        # If no dam is selected, show empty chart
        fig = go.Figure()
        fig.update_layout(
            title="Select a dam to view comparison metrics",
            xaxis=dict(visible=False),
            yaxis=dict(visible=False),
            plot_bgcolor='rgba(0,0,0,0)',
            paper_bgcolor='rgba(0,0,0,0)',
            font=dict(color="#2C3E50")
        )
        return fig
    
    represa = df[df['NID ID'] == represa_seleccionada]
    if represa.empty:
        return go.Figure()
    
    # Get hazard category of selected dam
    categoria_peligro = represa['Hazard Potential Classification'].values[0]
    
    # Calculate average metrics for all dams in the same category
    df_categoria = df[df['Hazard Potential Classification'] == categoria_peligro]
    
    # Prepare data for radar chart
    datos_radar = []
    
    # Normalize values for each metric
    valores_represa = []
    valores_promedio = []
    etiquetas_metricas = []
    
    for metrica in metricas:
        # Create shorter label for the chart
        etiqueta_corta = metrica.replace(' (Ft)', '').replace(' (Acre-Ft)', '').replace(' (Acres)', '').replace(' (Sq Miles)', '')
        etiquetas_metricas.append(etiqueta_corta)
        
        # Value of selected dam (with NaN handling)
        valor_represa = represa[metrica].values[0] if not pd.isna(represa[metrica].values[0]) else 0
        
        # Average value of the category (with NaN handling)
        valor_promedio = df_categoria[metrica].mean() if not pd.isna(df_categoria[metrica].mean()) else 0
        
        # Add values to lists
        valores_represa.append(valor_represa)
        valores_promedio.append(valor_promedio)
    
    # Normalize values to be on a comparable scale
    max_valores = [max(a, b) for a, b in zip(valores_represa, valores_promedio)]
    max_valores = [val if val > 0 else 1 for val in max_valores]  # Avoid division by zero
    
    valores_represa_norm = [val / max_val for val, max_val in zip(valores_represa, max_valores)]
    valores_promedio_norm = [val / max_val for val, max_val in zip(valores_promedio, max_valores)]
    
    # Create radar chart with more attractive colors
    fig = go.Figure()
    
    fig.add_trace(go.Scatterpolar(
        r=valores_represa_norm,
        theta=etiquetas_metricas,
        fill='toself',
        name=f"{represa['Dam Name'].values[0]}",
        line=dict(color='#3498DB'),
        fillcolor='rgba(52, 152, 219, 0.3)'
    ))
    
    fig.add_trace(go.Scatterpolar(
        r=valores_promedio_norm,
        theta=etiquetas_metricas,
        fill='toself',
        name=f"Average - {categoria_peligro}",
        line=dict(color='#E74C3C'),
        fillcolor='rgba(231, 76, 60, 0.3)'
    ))
    
    fig.update_layout(
        polar=dict(
            radialaxis=dict(
                visible=True,
                range=[0, 1],
                showticklabels=False,
                showline=False,
                ticks='',
                gridcolor='#95a5a6'
            ),
            angularaxis=dict(
                gridcolor='#95a5a6'
            ),
            bgcolor='rgba(255, 255, 255, 0.9)'
        ),
        font=dict(color="#2C3E50"),
        paper_bgcolor='rgba(0,0,0,0)',
        plot_bgcolor='rgba(0,0,0,0)',
        showlegend=True,
        legend=dict(
            orientation="h",
            yanchor="bottom",
            y=-0.2,
            xanchor="center",
            x=0.5
        )
    )
    
    return fig

# Callback for the download button
@app.callback(
    Output("download-dataframe-csv", "data"),
    Input("btn-download", "n_clicks"),
    State("estado-seleccionado", "data"),
    prevent_initial_call=True,
)
def func(n_clicks, estado_seleccionado):
    if estado_seleccionado != 'all':
        df_filtrado = df_anomalias[df_anomalias['State'] == estado_seleccionado]
    else:
        df_filtrado = df_anomalias
    return dcc.send_data_frame(df_filtrado.to_csv, "represas_filtradas.csv")


if __name__ == '__main__':
    app.run(debug=True, jupyter_mode='external')

the app:

Somehow I was triggered by the Hazard Potential for all the wrong reasons. I thought it was related to the condition of the dam, but no: it classifies the impact if something were to go wrong with the dam.

EAP stands for Emergency Action Plan. When a dam has the hazard classification “High”, an EAP is required; for “Significant” it is strongly recommended; for “Low” it is not needed; and for “Undetermined”, nobody knows.

Regulations can differ by state, but if the dam owner is federal, state, etc., they have likely done what they are required to do. I was interested in the numbers and percentages of EAPs, and in owner and purpose.
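The EAP share per hazard class can be computed along these lines. This is only a minimal sketch on made-up rows: `Hazard Potential Classification` is the column used in the code elsewhere in this thread, while the `EAP Prepared` column name and its Yes/No values are assumptions to check against the actual CSV.

```python
import pandas as pd

# Toy rows standing in for the NID export; "EAP Prepared" is an assumed column name.
dams = pd.DataFrame({
    "Hazard Potential Classification": ["High", "High", "Significant", "Low", "Low", "High"],
    "Primary Owner Type": ["Private", "State", "Private", "Private", "Federal", "Private"],
    "EAP Prepared": ["Yes", "Yes", "No", "No", "No", "No"],
})

# Flag dams that have a plan; the mean of a boolean column is the share with an EAP.
dams["has_eap"] = dams["EAP Prepared"].eq("Yes")
eap_share = (
    dams.groupby("Hazard Potential Classification")["has_eap"]
    .agg(dams="size", with_eap="sum", share="mean")
    .reset_index()
)
print(eap_share)
```

Grouping by `Primary Owner Type` as well would give the same breakdown per owner.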

Edit 20/05:

  • known bug: numbers are not completely visible
  • the red shapes on the overview are not calculated, so they no longer make sense if you disable a trace
  • hover etc. skipped
  • no info button; an explanation of owners, hazard classes and EAPs is in a document on py.cafe (uitleg.odt), and that part came from a chat with AI

Nice to haves:

  • if you drill down on hazard class High and, in the drilldown, click owner Private with no EAP, show a map with the locations

7 Likes

Hey all,

First time working with Dash and choropleth maps. I made a few quick graphs to show dam counts and hazard percentages by state. I would love it if someone could fix the bug so that hovering over a state in my hazard graph displays a percentage instead of a float. Appreciate you all.

Code:

from dash import Dash, html, dash_table, dcc, callback, Output, Input
import pandas as pd
import plotly.express as px
# import dash_ag_grid as dag

# Reading in the dataset
df = pd.read_csv(r"C:\Users\ETROJH2\OneDrive - M&T Bank\Credit Misc\Plotly\nation.csv", header = 1)

states = {
    'Alabama': 'AL', 'Alaska': 'AK', 'Arizona': 'AZ', 'Arkansas': 'AR',
    'California': 'CA', 'Colorado': 'CO', 'Connecticut': 'CT',
    'Delaware': 'DE', 'Florida': 'FL', 'Georgia': 'GA', 'Hawaii': 'HI',
    'Idaho': 'ID', 'Illinois': 'IL', 'Indiana': 'IN', 'Iowa': 'IA',
    'Kansas': 'KS', 'Kentucky': 'KY', 'Louisiana': 'LA', 'Maine': 'ME',
    'Maryland': 'MD', 'Massachusetts': 'MA', 'Michigan': 'MI',
    'Minnesota': 'MN', 'Mississippi': 'MS', 'Missouri': 'MO',
    'Montana': 'MT', 'Nebraska': 'NE', 'Nevada': 'NV',
    'New Hampshire': 'NH', 'New Jersey': 'NJ', 'New Mexico': 'NM',
    'New York': 'NY', 'North Carolina': 'NC', 'North Dakota': 'ND',
    'Ohio': 'OH', 'Oklahoma': 'OK', 'Oregon': 'OR',
    'Pennsylvania': 'PA', 'Rhode Island': 'RI', 'South Carolina': 'SC',
    'South Dakota': 'SD', 'Tennessee': 'TN', 'Texas': 'TX', 'Utah': 'UT',
    'Vermont': 'VT', 'Virginia': 'VA', 'Washington': 'WA',
    'West Virginia': 'WV', 'Wisconsin': 'WI', 'Wyoming': 'WY'
}

def state_to_abbreviation(state_name):
    """Converts a state name to its abbreviation.

    Args:
        state_name: The full name of the state (case-insensitive).

    Returns:
        The two-letter abbreviation of the state, or None if not found.
    """
    return states.get(state_name.title())

# Creating a column that flags dams that are high or significant hazard - marking all as 0
df['Any Hazard'] = 0
df['High Hazard'] = 0
df['Significant Hazard'] = 0

# Locating high and significant hazards
hazard_or_significant = df.loc[(df['Hazard Potential Classification'] == "High") |
               (df['Hazard Potential Classification'] == "Significant")].index
hazard = df.loc[(df['Hazard Potential Classification'] == "High")].index
significant = df.loc[(df['Hazard Potential Classification'] == "Significant")].index

# Marking high or significant hazards
df.loc[hazard_or_significant, "Any Hazard"] = 1
df.loc[hazard, "High Hazard"] = 1
df.loc[significant, "Significant Hazard"] = 1

# Creating an aggregate dataset that contains counts for each state
state_counts = df.groupby("State").agg(count=("State", "count"),
                                      hazard = ("Any Hazard", "sum"),
                                      high_hazard = ("High Hazard", "sum"),
                                      sig_hazard = ("Significant Hazard", "sum")).reset_index()
# Running abbreviation function for each state
state_counts['state_ab'] = state_counts['State'].apply(state_to_abbreviation)

# Creating a percentage for each type of hazard
state_counts['any_hazard_percentage'] = state_counts['hazard'] / state_counts['count']
state_counts['high_hazard_percentage'] = state_counts['high_hazard'] / state_counts['count']
state_counts['sig_hazard_percentage'] = state_counts['sig_hazard'] / state_counts['count']


app = Dash()

# Creating the Dash Layout
app.layout = html.Div([
    html.Div(children="Dam Graph Analysis"),
    dcc.RadioItems(options=["any_hazard_percentage", "high_hazard_percentage", "sig_hazard_percentage"],
                   value="any_hazard_percentage",
                   id="hazard_options"),
    dcc.Graph(id="state_graph"),
    dcc.Graph(id="hazard_graph")
])

@callback(
    Output(component_id="state_graph", component_property="figure"),
    Output(component_id="hazard_graph", component_property="figure"),
    Input(component_id="hazard_options", component_property="value")
)
def update_graph(hazard_chosen):
    # Making a figure for the count of the number of dams in each state
    fig_counts = px.choropleth(state_counts,
                               locations="state_ab",
                               locationmode="USA-states",
                               color="count",
                               scope="usa",
                               color_continuous_scale="Viridis")

    fig_counts.update_layout(title="Dam Counts by US State")

    # Making a figure for the hazard percentages by state
    fig_percentages = px.choropleth(state_counts,
                                    locations="state_ab",
                                    locationmode="USA-states",
                                    color=hazard_chosen,
                                    scope="usa",
                                    color_continuous_scale="Viridis")

    fig_percentages.update_layout(title="Hazard Percentages by State")

    return fig_counts, fig_percentages


if __name__ == '__main__':
    app.run(debug=True)

5 Likes

Welcome to Figure Friday @Bas_Bloemsaat, I am so happy to have another polars user in this forum. I used pandas for about 5 years before making the switch, and I am so happy I did. I don’t miss the syntax or the crazy indexing of pandas. How about you?

Hello @marieanne , your work is compelling as always, especially the emergency action plan charts.

I wonder what the file size limit is on pycafe. You might be able to get under the maximum file size by prefiltering unwanted rows and columns, downcasting numeric columns to fewer bytes (Float64 to Float16), and using categorical types for string columns. I am still looking forward to the day when I can deploy my polars-based code to pycafe.
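A rough pandas sketch of those three size reductions, on synthetic data. The column names other than `State` are invented for illustration, and it downcasts to float32 rather than Float16 because `pd.to_numeric(..., downcast="float")` does not go below float32.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the dams table; columns other than "State" are made up.
rng = np.random.default_rng(0)
n = 10_000
raw = pd.DataFrame({
    "State": rng.choice(["California", "Texas", "New York"], size=n),
    "Dam Height (Ft)": rng.uniform(5, 700, size=n),
    "Unused Notes": ["n/a"] * n,
})

before = raw.memory_usage(deep=True).sum()

slim = raw.drop(columns=["Unused Notes"])           # prefilter unwanted columns
slim["State"] = slim["State"].astype("category")    # repeated strings -> small integer codes
slim["Dam Height (Ft)"] = pd.to_numeric(slim["Dam Height (Ft)"], downcast="float")  # float64 -> float32

after = slim.memory_usage(deep=True).sum()
print(f"{before:,} bytes -> {after:,} bytes")
```

Whether the smaller in-memory frame also gets the uploaded file under the limit depends on the file format; writing the slimmed frame to Parquet instead of CSV is one option to try.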

1 Like

Hi @Avacsiglo21, great dashboard. I liked how clicking on a scatter point brings up the radar chart using data for that specific point. Brilliant. Also, the filtered-data download looks very useful. I did not know that was even possible.

1 Like

Hi @adamschroeder,

The population of California was 10.5M in 1950, 15M in 1960, and is now almost 40M. Many took notice in 1962 when California’s population exceeded New York’s; it has been the most populous state ever since. That same year in my local area, construction of the San Luis Reservoir began with a groundbreaking ceremony led by President Kennedy.

With our semi-arid climate, near-zero rainfall from May to September, and a growing population, it is easy to see why there was a reservoir construction boom in that era. Our water supplies are closely watched by tracking these reservoir levels and the snowpack in the mountains.

The acre-foot is the most common unit for measuring large volumes of stored water. It takes about two acre-feet to fill one Olympic swimming pool.
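As a rough check of the acre-foot to pool comparison, here is the arithmetic, assuming a 50 m × 25 m pool that is 2 m deep (an assumption; competition pools are often deeper):

```python
ACRE_SQFT = 43_560        # square feet in one acre
FT_TO_M = 0.3048          # metres per foot (exact by definition)

# One acre covered one foot deep, converted from cubic feet to cubic metres.
acre_foot_m3 = ACRE_SQFT * FT_TO_M ** 3
pool_m3 = 50 * 25 * 2     # assumed pool dimensions in metres

print(round(acre_foot_m3, 1))              # 1233.5
print(round(pool_m3 / acre_foot_m3, 2))    # 2.03 acre-feet per pool
```

With these assumptions one acre-foot is roughly 1,233 m³, so a single pool holds about two acre-feet.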

2 Likes

I cleaned it to 5 columns, it’s now on py.cafe.

3 Likes

Hi @jhall3 , welcome to Figure Friday.

I can help you arrange the hover info to show percentages instead of float values on the hazard graph. The solution is to use a custom hover template and may require adding a custom_data parameter to px.choropleth. I noticed that fig_counts and fig_percentages are created with almost identical code, where the only difference is in the color parameter. Is that your intention?

1 Like

Hi jhall, to complement Mike’s point, take a look at the code I shared. I did what Mike described and used the custom_data parameter.

Hope it helps you

1 Like

Hi Marianne, I initially had the same impression about the Hazard Potential, but I realized it was incorrect. I then used it as a categorical feature for the Isolation Forest model.

Hello everybody! This is my first time using Dash and I wanted to participate.
For my example, I wanted to create a simple dashboard with options to filter by a specific state and hide any unneeded columns from the table, and also to export the filtered data to a usable file.
Below are my dashboard and my code.
I would appreciate any feedback to improve my coding!

from dash import Dash, html, dcc, callback, Output, Input, State
import pandas as pd
import plotly.express as px
import dash_ag_grid as dag
import io
# xlsxwriter must be installed: pandas uses it as the ExcelWriter engine in the export callback


# Load data
ColumnsToUse = ['City', 'County', 'State', 'Primary Owner Type', 'Dam Name', 'Owner Names', 'Primary Purpose']
df = pd.read_csv(r"YourDataPath.csv", header=1, usecols=ColumnsToUse)

# Unique states list including "All"
states_unq = ['All'] + sorted(df['State'].dropna().unique().tolist())

# App
app = Dash()
app.title = "Dams Dashboard"

app.layout = html.Div([
    html.H1(id='Main_Title', children='My First App with Data, Graph, and Controls'),

    dcc.Input(id='UserTitle', value='My First Dash!', type='text'),

    html.Div([
        html.Label("Select State:"),
        dcc.RadioItems(
            options=[{'label': state, 'value': state} for state in states_unq],
            value='All',
            id='state-selector',
            labelStyle={'display': 'inline-block', 'margin-right': '10px'}
        )
    ], style={'margin': '10px 0'}),

    html.Div([
        html.Label("Remove Columns:"),
        dcc.Dropdown(
            id='column-remover',
            options=[{'label': col, 'value': col} for col in ColumnsToUse],
            multi=True,
            placeholder="Select columns to hide"
        )
    ], style={'margin': '10px 0', 'width': '50%'}),

    html.Button("Export Data", id="export-button", n_clicks=0),
    dcc.Download(id="download-data"),

    html.Div([
        dag.AgGrid(
            id="grid",
            rowData=df.to_dict("records"),
            columnDefs=[{"field": i} for i in df.columns],
            style={"height": "300px", "width": "90%", "margin": "auto"},
        )
    ], style={'margin-top': '20px'}),

    dcc.Graph(id='graph1')
])


@callback(
    Output('Main_Title', 'children'),
    Output('graph1', 'figure'),
    Output('grid', 'rowData'),
    Output('grid', 'columnDefs'),
    Input('UserTitle', 'value'),
    Input('state-selector', 'value'),
    Input('column-remover', 'value')
)
def update_output(user_title, selected_state, removed_cols):

    filtered_df = df if selected_state == 'All' else df[df['State'] == selected_state]

    visible_cols = [col for col in df.columns if not removed_cols or col not in removed_cols]

    # Only draw the histogram while its column is still visible in the table
    if 'Primary Owner Type' in visible_cols:
        title = (f"Dams in {selected_state} by Primary Owner Type"
                 if selected_state != 'All' else "Dams in the US by Primary Owner Type")
        fig = px.histogram(filtered_df, x='Primary Owner Type',
                           color='Primary Owner Type', title=title)
    else:
        fig = px.histogram(title="No graph - 'Primary Owner Type' is hidden")

    return (
        user_title,
        fig,
        filtered_df[visible_cols].to_dict("records"),
        [{"field": i} for i in visible_cols]
    )


@callback(
    Output("download-data", "data"),
    Input("export-button", "n_clicks"),
    State("state-selector", "value"),
    State("column-remover", "value"),
    prevent_initial_call=True
)
def export_filtered_data(n_clicks, selected_state, removed_cols):
    filtered_df = df if selected_state == 'All' else df[df['State'] == selected_state]
    visible_cols = [col for col in df.columns if not removed_cols or col not in removed_cols]

    export_df = filtered_df[visible_cols]

    # Default export format: Excel
    output = io.BytesIO()
    with pd.ExcelWriter(output, engine='xlsxwriter') as writer:
        export_df.to_excel(writer, index=False, sheet_name='Data')
    output.seek(0)

    return dcc.send_bytes(output.read(), "filtered_data.xlsx")

# Run
if __name__ == '__main__':
    app.run(debug=True)

5 Likes

First time using Plotly and Dash here at PyCon2025. Built a simple app that allows the user to select the x and y columns of a bar chart, returning a count for each selected combination.

from dash import Dash, html, dash_table, dcc, callback, Input, Output
import pandas as pd
import plotly.express as px

columns_to_read = ['Primary Owner Type', 'Primary Purpose', 'State', 'State Regulated Dam', 'Federally Regulated Dam', 'Primary Dam Type']
df = pd.read_csv('nation.csv', skiprows=1, usecols=columns_to_read)

app = Dash()

app.layout = [
    html.Div(children='First Dash App with Data'),
    html.Hr(),
    dash_table.DataTable(data=df.to_dict('records'),page_size=10),
    html.Div(className='row',children=[
        dcc.Dropdown(df.columns, id='inputx', value='State'),
        dcc.Dropdown(df.columns, id='inputy',value='Primary Purpose')
    ]),
    dcc.Graph(id='graph1')
]

@callback(
    Output(component_id='graph1', component_property='figure'),
    Input(component_id='inputx', component_property='value'),
    Input(component_id='inputy', component_property='value')
)
def update_graph(inputx, inputy):
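    # Count the rows for every combination of the two selected columns
    df_summarized = df.groupby([inputx, inputy]).size().reset_index(name='Count')
    # Stacked bars: x is the first column, color is the second, height is the count
    fig = px.bar(df_summarized, x=inputx, y='Count', barmode='relative', color=inputy)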
    return fig

@callback(
    Output(component_id='inputy', component_property='options'),
    Input(component_id='inputx', component_property='value')
)
def update_y_options(inputx):
    options = [col for col in df.columns if col != inputx]
    return options

@callback(
    Output(component_id='inputx', component_property='options'),
    Input(component_id='inputy', component_property='value')
)
def update_x_options(inputy):
    options = [col for col in df.columns if col != inputy]
    return options


if __name__ == '__main__':
    app.run(debug=True)

8 Likes

Hi Rubén, nice job, especially for a first app. Let me ask: would you consider replacing the radio items with a dropdown menu to get a cleaner dashboard? Of course, it is just an opinion.

2 Likes

Hi Alexander, I think that’s a great idea! The cleaner the dashboard, the better the insights will be. Thanks!

1 Like