Figure Friday 2025 - week 24

Hi Ester, sorry, I did not understand your question. Did you solve it?

1 Like

Thanks, I decided to remove the unknowns instead. I also changed the photo.

2 Likes

Let’s say you are a law student, and you need to learn about all the violations in this data set, to understand the likely outcomes based on many factors. Or you are someone who likes to break the rules and has a limited budget, so you need to find the most cost-effective rules to break to satisfy your inner demons. If so, this dashboard is probably not for you.

I wrote this dashboard from the point of view of a person who has been issued a parking citation and just wants to know about their specific violation and nothing else.

But the real reason I created this dashboard is to continue improving basic skills with tables, callbacks, graphs, etc. The grid on the upper right for selecting a violation is a single-column Dash AG Grid table, using cellClicked as a callback input. I could have used a pull-down menu or radio buttons, but when the selection is 1 of many items the table seems like the best way. I would be interested to know if there is a better component for selecting from a long list. This list has only 8 items, but in many cases I work with much longer lists and want the best solution for picking a value.

Something I learned and did not expect is that the amounts paid are much higher than the original fines. This is due to late penalties and similar added charges. I had wrongly expected the amounts paid to be lower, as many cases end with reduced penalties.
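To make the arithmetic concrete, here is a minimal sketch of how a payment can end up larger than the fine. The `total_owed` helper and the dollar values are made up for illustration, and the formula is an assumption about how fines, penalties, interest and reductions combine, not something taken from the dataset documentation.

```python
def total_owed(fine, penalty=0.0, interest=0.0, reduction=0.0):
    """Hypothetical helper: amount owed after penalties, interest and reductions."""
    return fine + penalty + interest - reduction


# Made-up example values, not taken from the dataset.
on_time = total_owed(fine=65.0)
late = total_owed(fine=65.0, penalty=60.0, interest=4.5)
reduced = total_owed(fine=65.0, reduction=20.0)

print(on_time, late, reduced)
```

Under these assumptions, a single late payment adds more to the total than a reduction removes, which would pull the paid distribution above the fine distribution.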

Here is the code:

import polars as pl
import plotly.express as px
import dash
from dash import Dash, dcc, html, Input, Output
import dash_mantine_components as dmc
from dash_ag_grid import AgGrid
dash._dash_renderer._set_react_version('18.2.0')

#----- DATA GATHER AND CLEAN ---------------------------------------------------
df = (
    pl.read_csv(
        'Open_Parking_and_Camera_Violations.csv',
        try_parse_dates=True,
        ignore_errors=True
    )
    .select( 
        ISSUE_DATE = pl.col('Issue Date'),
        VIOLATION=pl.col('Violation'),
        JUDGE_DATE = pl.col('Judgment Entry Date'),
        FINE_AMT = pl.col('Fine Amount'),
        PAY_AMT = pl.col('Payment Amount'),
        STATUS = pl.col('Violation Status'),
    )
    .with_columns(
        # Whole days between the issue date and the judgment entry date
        JUDGE_DAYS = (
            pl.col('JUDGE_DATE') - pl.col('ISSUE_DATE')
        ).dt.total_days()
    )
    .filter(pl.col('JUDGE_DAYS') > 0)  
)

#----- GLOBALS -----------------------------------------------------------------
style_horiz_line = {'border': 'none', 'height': '4px', 
    'background': 'linear-gradient(to right, #007bff, #ff7b00)', 
    'margin': '10px'}

# Dash style dictionaries use camelCase keys
style_h2 = {'textAlign': 'center', 'fontSize': '32px', 
            'fontFamily': 'Arial', 'fontWeight': 'bold'}
style_h3 = {'textAlign': 'center', 'fontSize': '24px', 
            'fontFamily': 'Arial', 'fontWeight': 'normal'}

# Sorted so the default selection (violation_list[0]) matches the first row of the grid
violation_list = df.get_column('VIOLATION').drop_nulls().unique().sort().to_list()
print(f'{len(violation_list) = }')
print(f'{violation_list = }')

#----- CALLBACK FUNCTIONS-------------------------------------------------------
def get_px_hist(df, violation, data_col):
    # Map each data column to its graph title
    graph_titles = {
        'FINE_AMT': 'DISTRIBUTION OF FINES ASSESSED',
        'PAY_AMT': 'DISTRIBUTION OF AMOUNTS PAID',
        'JUDGE_DAYS': 'DAYS BETWEEN VIOLATION AND JUDGEMENT',
    }
    graph_title = graph_titles[data_col]
    fig = px.histogram(
        df.filter(pl.col('VIOLATION') == violation),
        x=data_col,
        template='simple_white',
        title=graph_title
    )
    return fig
 
def make_violation_table():
    df_dag = (
        df
        .select('VIOLATION')
        .unique('VIOLATION')
        .sort('VIOLATION')
        .rename({'VIOLATION': 'SELECT A VIOLATION:'})
    )
    row_data = df_dag.to_dicts()
    column_defs = [{"headerName": col, "field": col} for col in df_dag.columns]
    return (
        AgGrid(
            id='violation_list',
            rowData=row_data,
            columnDefs=column_defs,
            defaultColDef={"filter": True},
            columnSize="sizeToFit",
            getRowId="params.data['SELECT A VIOLATION:']",
            dashGridOptions={
                'rowSelection': 'single',
                'animateRows': True
            },
        )
    )

#----- DASH APPLICATION STRUCTURE-----------------------------------------------
app = Dash()
app.layout =  dmc.MantineProvider([
    dmc.Space(h=30),
    html.Hr(style=style_horiz_line),
    dmc.Text('New York City Traffic Violation Data', ta='center', style=style_h2),
    dmc.Text('', ta='center', style=style_h3, id='violation_text'),
    html.Hr(style=style_horiz_line),
    dmc.Space(h=30),
    dmc.Grid(  
        children = [ 
            dmc.GridCol(dcc.Graph(id='px_hist_fine'), span=4, offset=1),
            dmc.GridCol(make_violation_table(), span=4, offset=2)
        ]
    ),
    dmc.Grid(  
        children = [ 
            dmc.GridCol(dcc.Graph(id='px_hist_paid'), span=4, offset=1),
            dmc.GridCol(dcc.Graph(id='px_hist_period'), span=4, offset=1),
        ]
    ),
])

@app.callback(
    Output('px_hist_fine', 'figure'),
    Output('px_hist_paid', 'figure'),
    Output('px_hist_period', 'figure'),
    Output('violation_text', 'children'),
    Input('violation_list', 'cellClicked'),
)
def update_dashboard(cell_clicked):
    # cellClicked is None until the user clicks a cell, so fall back to the first violation
    if cell_clicked is None:
        violation_name = violation_list[0]
    else:
        violation_name = cell_clicked['value']
    print(f'{violation_name = }')
    px_hist_fine = get_px_hist(df, violation_name, 'FINE_AMT')
    px_hist_paid = get_px_hist(df, violation_name, 'PAY_AMT')
    px_hist_period = get_px_hist(df, violation_name, 'JUDGE_DAYS')
    return px_hist_fine, px_hist_paid, px_hist_period, violation_name

if __name__ == '__main__':
    app.run(debug=True)
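
The selection logic in the callback can also be pulled into a small plain function, which makes it easy to check without starting the app. This is only a sketch: `pick_violation` is a hypothetical helper, the violation names are placeholders, and the sample payload assumes `cellClicked` delivers a dict with a `value` key, as the callback above relies on.

```python
def pick_violation(cell_clicked, default):
    """Return the clicked violation name, or the default before any click."""
    if not cell_clicked:
        return default
    value = cell_clicked.get("value")
    return default if value is None else value


# Placeholder names and payloads, for illustration only.
violations = ["EXAMPLE VIOLATION A", "EXAMPLE VIOLATION B"]
before_click = pick_violation(None, violations[0])
after_click = pick_violation({"value": violations[1]}, violations[0])

print(before_click)
print(after_click)
```

Inside the callback this would reduce the if/else to a single call, with `violation_list[0]` passed as the default.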

4 Likes

@Mike_Purtell I like this unique approach of using the table instead of checkboxes. With the selected rows highlighted in blue, it is clear which violations the user selected. One small suggestion I would offer is to center the column header. Right now, the header almost looks like it’s part of the violation options to choose from.

One thing to keep in mind is that in certain cases checkboxes would be much easier to work with than the table. For example, if selecting a certain combination of violations would render an empty graph, you would want the flexibility to disable those options so the user never ends up with an empty dashboard. In that case, the DMC checkbox (see the States section) would work great.
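
One way to sketch this idea is to build the option list from row counts and mark the empty choices as disabled. `build_options` and the counts below are hypothetical. The `label`/`value`/`disabled` keys follow the option-dict shape that `dcc.Checklist` accepts, and a DMC checkbox could take the same flag through its `disabled` prop.

```python
def build_options(counts):
    """Turn {name: row_count} into option dicts, disabling choices with no rows."""
    return [
        {"label": name, "value": name, "disabled": count == 0}
        for name, count in sorted(counts.items())
    ]


# Made-up counts, for illustration only.
options = build_options({"EXAMPLE VIOLATION B": 0, "EXAMPLE VIOLATION A": 120})
for opt in options:
    print(opt)
```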

@adamschroeder I uploaded my app to pycafe, it has a hard time saving sometimes, but it’s fine now.

1 Like

A little late to the party, but I still wanted to make a submission. I noticed the dataset was missing a few records and, after downloading the data directly from the NYC Open Data portal, I see why. There were over 16 million violations issued in 2023 across nearly 100 violation codes. It took about an hour to download the entire dataset… That’s a hefty meatball!

I built a dashboard to explore each violation type and identify patterns in the ticket times, payment trends, and how often people contested their fines.

A few interesting observations:

  • Total payments across all violations exceeded $1 billion, with $254 million still outstanding
  • The most common violation was speeding in a school zone (code #36) with ~6 million fines issued and just shy of $300 million paid
  • The most successfully contested violation was #69 - Failing to show a parking meter receipt, with over 50% resulting in a reduction or dismissal
  • One of the largest unpaid balances comes from #66 - Parking a trailer or semi-trailer with just $243K paid out of $1.1 million owed

App and code are provided below…
pycafe
github

4 Likes

Very nice, @spd . I really like your app, especially the informative heatmap.

Good usage of DMC as well.
I was wondering if someone was going to download the full dataset :smile:
How did you make the data fit on py.cafe? I see the json file there, but that doesn’t represent the whole dataset, correct?

1 Like

Thanks! Yeah, the heatmap is my favourite part :slight_smile:

Once I figured out what I wanted to display, I pre-aggregated the data for each violation type.

I started with a list of violations along with their code #s, definitions, and typical fine amounts (I stored this in a JSON file). This was converted to a dictionary with the key as the violation name/description (as it is displayed in the data) and the value as a dataclass. I then looped through each file I exported from the Open Data API: loading it as a pandas DataFrame, looping through each violation found in the data, and then updating the corresponding violation’s dataclass.

Here’s a sample of the code to give you a better idea:

import json

import pandas as pd

# DATA_DIR (presumably a pathlib.Path) and the Violation dataclass are defined elsewhere in the app
with open(DATA_DIR / 'nyc_parking_violation_codes.json', 'r', encoding='utf-8') as fp:
    violation_details = json.load(fp)

# Key each Violation by its description, as it is displayed in the data
violations = {v['description']: Violation.from_dict(v) for v in violation_details}

for month in range(1, 13):
    df_m = pd.read_parquet(DATA_DIR / f"nc67-uf89_month_2023-{month:0>2}_v2.parquet")

    # Per-violation ticket counts and summed amount columns for this month
    counts = df_m['violation'].value_counts()
    amount_cols = [col for col in df_m.columns if 'amount' in col]
    amounts = df_m.groupby('violation')[amount_cols].sum().round(0)

    # Update Violation objects
    for v_key in df_m['violation'].unique():
        if v_key not in violations:
            continue

        v = violations[v_key]

        v.total_count += counts.loc[v_key]
        v.total_fine += amounts.loc[v_key].get('fine_amount').item()
        v.total_penalty += amounts.loc[v_key].get('penalty_amount').item()
        v.total_interest += amounts.loc[v_key].get('interest_amount').item()
        v.total_reduction += amounts.loc[v_key].get('reduction_amount').item()
        v.total_payment += amounts.loc[v_key].get('payment_amount').item()
        v.total_due += amounts.loc[v_key].get('amount_due').item()
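
For readers following along, the `Violation` dataclass itself is not shown in the sample. A minimal sketch of what such an accumulator might look like is below. Everything here is an assumption: the `code` field, the defaults, and the way `from_dict` ignores unknown keys are guesses based on how the loop uses the object, not the actual class from the app.

```python
from dataclasses import dataclass, fields


@dataclass
class Violation:
    """Hypothetical shape of the per-violation accumulator."""
    code: int
    description: str
    total_count: int = 0
    total_fine: float = 0.0
    total_penalty: float = 0.0
    total_interest: float = 0.0
    total_reduction: float = 0.0
    total_payment: float = 0.0
    total_due: float = 0.0

    @classmethod
    def from_dict(cls, raw):
        """Build an instance from a JSON record, ignoring keys that are not fields."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in known})


# Made-up record; "definition" is dropped because it is not a field in this sketch.
v = Violation.from_dict(
    {"code": 0, "description": "EXAMPLE VIOLATION", "definition": "made-up text"}
)
v.total_count += 3
v.total_fine += 150.0
print(v)
```

Keeping the totals on one object per violation means the monthly files can be processed one at a time, so only the small aggregated result needs to be shipped with the app.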

2 Likes