Figure Friday 2025 - week 22

Join the Figure Friday session on June 6 at noon Eastern Time to showcase your creation and receive feedback from the community.

What are the best performing Marvel movies? Is there a difference between their international and domestic gross earnings?

Answer these questions and a few others by using Plotly and Dash on the Marvel movies dataset.

Things to consider:

  • what can you improve in the app or sample figure below (scatter plot)?
  • would you like to tell a different data story using a different graph?
  • can you create a different Dash app?

Sample figure:

Code for sample figure:
from dash import Dash, dcc
import dash_ag_grid as dag
import plotly.express as px
import pandas as pd


df = pd.read_csv("https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/main/2025/week-22/Marvel-Movies.csv")

# group the dataset by the category column and get the average 'budget' and 'worldwide gross'
df_grouped = df.groupby('category').agg({'budget': 'mean', 'worldwide gross': 'mean'}).reset_index()

fig = px.scatter(df_grouped, x='budget', y='worldwide gross', hover_data=['category'])
fig.update_traces(marker_size=20)

grid = dag.AgGrid(
    rowData=df.to_dict("records"),
    columnDefs=[{"field": i, 'filter': True, 'sortable': True} for i in df.columns],
    dashGridOptions={"pagination": False},
    columnSize="sizeToFit"
)

app = Dash()
app.layout = [
    grid,
    dcc.Graph(figure=fig)
]


if __name__ == "__main__":
    app.run(debug=False)

Participation Instructions:

  • Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
  • Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
  • Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.

:point_right: If you prefer to collaborate with others on Discord, join the Plotly Discord channel.

Data Source:

Thank you to Information is Beautiful for the data.

2 Likes

Hello Plotly Community! My contribution for this Week 22 Figure Friday is a Marvel Movies interactive dashboard. This time, I didn’t reinvent the wheel; I leveraged the same format I used for a recent Pixar Films challenge I participated in.

To group the Marvel movies, I used PCA (Dimensionality Reduction) techniques, and then K-means Clustering to identify the clusters. I employed a couple of methods like the Elbow method and the Silhouette score to refine the clustering.
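
As a rough illustration of the elbow and silhouette checks, the sketch below scans a few values of k on synthetic data after scaling and PCA. The data, the range of k, and the variable names are assumptions made for illustration; they are not the values used in the dashboard.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the film features (hypothetical data, 3 loose groups).
rng = np.random.default_rng(42)
centers = np.array([[0, 0, 0, 0], [6, 6, 0, 0], [0, 6, 6, 6]])
X = np.vstack([c + rng.normal(scale=0.8, size=(20, 4)) for c in centers])

# Scale the features, then project to two principal components.
X_pca = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

inertias, silhouettes = {}, {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, random_state=42, n_init=10).fit(X_pca)
    inertias[k] = km.inertia_                             # input for the elbow plot
    silhouettes[k] = silhouette_score(X_pca, km.labels_)  # higher is better

best_k = max(silhouettes, key=silhouettes.get)
for k in inertias:
    print(f"k={k}  inertia={inertias[k]:.1f}  silhouette={silhouettes[k]:.3f}")
print("k with the highest silhouette score:", best_k)
```

The elbow is read off by eye from where the inertia stops dropping sharply, while the silhouette score gives a single number per k to compare.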

Key Features
Film Groupings Overview: Explore different categories (clusters) of Marvel films, understanding their average performance in terms of gross, critic scores, audience scores, and identifying top-performing films within each group.
Box Office Trends: Visualize how worldwide, domestic, and international box office grosses have evolved over the years for Marvel films, with insights into how different film clusters contribute to these trends.
Performance Metrics: Dive into specific performance indicators such as overall box office, budget recovery rates, profit margins, and opening weekend earnings for individual movies.
Movie Timeline: See a chronological overview of Marvel movie releases, showing how various metrics like worldwide gross and critic scores change over time.
Marvel Trivia: Get a random fun fact about the Marvel Cinematic Universe with a simple click.

Any comments/suggestions are highly appreciated.

Link to the project

(PyCafe - Dash - marvel_films)


5 Likes


:clapper_board: Marvel at the Numbers
The Marvel Cinematic Universe isn’t just a collection of blockbuster films—it’s a global phenomenon that has reshaped modern cinema.
Here’s a summary of the dashboard titled "Marvel Movies: Key Insights":

  1. Production Budgets (Bar Chart):
  • Most expensive film: Avengers: Endgame, with a budget close to $400M.
  • Other high-budget films include Infinity War, Civil War, and Love and Thunder.
  • Lowest budgets were for early movies like Ant-Man and The Incredible Hulk.
  2. Domestic vs. International Gross (Donut Chart):
  • International gross dominates with 63.4%.
  • Domestic gross accounts for 36.6%.
  3. Critics vs. Audience Score (Bubble Chart):
  • Most movies cluster around 70-90% for both scores.
  • A positive correlation exists between critics and audience ratings.
  • Larger bubbles (indicating higher gross) generally appear in the high-score zone.
  4. Trends Over Time (Line Chart):
  • Worldwide gross (red line) shows significant peaks around major Avengers films (especially 2012–2019).
  • Budgets (yellow line) remain relatively steady but increase slightly over time.
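
The domestic/international split in the donut chart is a share-of-total calculation. A minimal sketch of the arithmetic, using made-up totals rather than the dataset's actual figures:

```python
# Hypothetical totals in $m, chosen only to illustrate the arithmetic.
domestic_total = 300.0
international_total = 500.0

worldwide_total = domestic_total + international_total
domestic_share = domestic_total / worldwide_total * 100
international_share = international_total / worldwide_total * 100

print(f"Domestic: {domestic_share:.1f}%  International: {international_share:.1f}%")
```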

:file_folder: Filters

  • Two dropdowns available: Category and Film to narrow down data views.
7 Likes

Hi everyone!

Here’s my interactive Marvel Movies dashboard. It shows top‐performing films and compares domestic vs. international grosses in a dark‐themed interface.

For feature importance, I scale the numeric columns, one-hot encode the categories, fit a RandomForestRegressor to predict worldwide gross, and list the top 10 predictors.

I have two tabs in my dashboard:

  • Visualizations: year slider, budget slider, category dropdown → scatter (domestic vs. international), bar (top 10 worldwide), correlation heatmap, and box plot.
  • Insights: top 5 films by global gross, average domestic vs. international gross, per‐film difference, and a bar chart of the top 10 feature importances.

Live demo: PyCafe - Dash - Dash Interactive Color Picker
Code: GitHub - Feanor1992/Marvel-Movies-Dataset-Analysis

Looking forward to your feedback and visualizations!

6 Likes

Hi @Avacsiglo21
Marvelous app on the Marvel movies :slight_smile:

Thank you for adding your code, but it wasn’t added correctly.

You need to put everything inside backticks like this:

But since there is so much code, you can also just link to the project on Py.cafe. People can see the code there.

By the way, what theme did you use to get this cool styling?

2 Likes

Hi Adams,

I’ve made the corrections. It’s not a huge amount of code—around 550 lines. I’ve shared projects with much more code in the past. This is my first time using the ‘hide details’ feature. The theme/style is Sketchy.

:muscle:

2 Likes

JIT addition.

I used @PipInstallPython’s claude.md document with UI/UX/Dash Mantine guidelines as input for Claude @openrouter.ai.
This is the result of the second attempt.

My idea was to use the % budget recovered column to see whether a movie would cover all its costs, based on the idea that a movie needs to generate 2 to 3 times its budget before it starts to make a profit. Hence the two horizontal lines at 200% and 300%.
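
The 2x to 3x rule of thumb can be sketched as a small helper. The function names and band labels below are hypothetical, and the sample values are invented for illustration rather than taken from the dataset.

```python
def parse_percent(text: str) -> float:
    """Turn a string such as '433%' into the number 433.0."""
    return float(text.strip().rstrip("%"))


def recovery_band(pct_recovered: float, low: float = 200.0, high: float = 300.0) -> str:
    """Classify a film by how much of its budget it recovered (in percent)."""
    if pct_recovered < low:
        return "likely loss"
    if pct_recovered < high:
        return "uncertain"
    return "likely profit"


# Hypothetical example values, not taken from the dataset.
for raw in ["150%", "250%", "433%"]:
    pct = parse_percent(raw)
    print(raw, "->", recovery_band(pct))
```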

Input + prompt

import pandas as pd

marvel = pd.read_csv('https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/main/2025/week-22/Marvel-Movies.csv')
marvel = marvel.drop(columns=['source', 'domestic gross ($m)', 'international gross ($m)'])
#clean data and assign correct dtype
marvel['% budget recovered'] = marvel['% budget recovered'].str.replace('%', '').astype(float)
marvel['critics % score'] = marvel['critics % score'].str.replace('%', '').astype(int)
marvel['audience % score'] = marvel['audience % score'].str.replace('%', '').astype(int)
marvel['audience vs critics % deviance'] = marvel['audience vs critics % deviance'].str.replace('%', '').astype(int)
marvel['1st vs 2nd weekend drop off'] = marvel['1st vs 2nd weekend drop off'].str.replace('%', '').astype(int)
marvel['% budget opening weekend'] = marvel['% budget opening weekend'].str.replace('%', '').astype(float)

#the data loaded is information about marvel movies.

#assignment:
#create a dash app, use only dash mantine components.
#create the app based on the uploaded best practices for UI/UX/Mantine.
#add a light/dark mode switch with callback, make dark mode default
#create a filter dropdown for category, add “all categories” and make “all categories” default
#on appload, load the plots for “all categories”

#create a scatterplot

# x = year, y="% budget recovered"

# colors for traces: red if y < 200, blue if y > 250 else purple

# create a custom hover with film, category, worldwidegross and budget

# create a callback which makes it possible to filter on category

Modifications after it came back once with an error about the definition of children:

  • Claude added a horizontal line at 100% called Break Even. That’s dubious at best: there are enough sources on the web which state that you need at least 2 to 3 times the budget to cover all costs of a movie. Maybe I should have added this idea to the prompt.
  • an error in the column names for the hover.
  • other dot colors, to make the contrast higher; my mistake.

I did not correct errors in colors for light/dark mode or a disappearing legend when selecting a category.

I was not very happy with the result but it’s mantine and responsive.

And a costly mistake: until yesterday it had taken me almost 2 months to spend 5 dollars at openrouter, and 2 attempts with this configuration plus some interactions were equally expensive. I should have dumped the .md into another, cheaper coding model to see if the output would be good enough. But as a starter, this output is OK.

4 Likes

@marieanne I think the issue was with your prompt: when I copied it directly, I received similar results. However, when I changed the prompt, this was my result with Claude 4 Opus, using the claude.md document with UI/UX/Dash Mantine guidelines.

prompt I used

What are the best performing Marvel movies? Is there a difference between their international and domestic gross earnings? Answer these questions and a few others by using Plotly and Dash on the Marvel movies dataset. Things to consider: * what can you improve in the app or sample figure below (scatter plot)? * would you like to tell a different data story using a different graph? * can you create a different Dash app? Sample figure:

Code for sample figure:

from dash import Dash, dcc
import dash_ag_grid as dag
import plotly.express as px
import pandas as pd


df = pd.read_csv("https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/main/2025/week-22/Marvel-Movies.csv")

# group the dataset by the category column and get the average 'budget' and 'worldwide gross'
df_grouped = df.groupby('category').agg({'budget': 'mean', 'worldwide gross': 'mean'}).reset_index()

fig = px.scatter(df_grouped, x='budget', y='worldwide gross', hover_data=['category'])
fig.update_traces(marker_size=20)

grid = dag.AgGrid(
    rowData=df.to_dict("records"),
    columnDefs=[{"field": i, 'filter': True, 'sortable': True} for i in df.columns],
    dashGridOptions={"pagination": False},
    columnSize="sizeToFit"
)

app = Dash()
app.layout = [
    grid,
    dcc.Graph(figure=fig)
]


if __name__ == "__main__":
    app.run(debug=False)

Participation Instructions: * Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash. * Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread. * Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community. If you prefer to collaborate with others on Discord, join the Plotly Discord channel. Data Source: Thank you to the Information is Beautiful for the data.

Based on ui ux best practices as seen in the claude.md file, can you use dash mantine components graphs designing the application for both mobile and desktop to support light / dark mode.

Create the dashboard to be clean and interactive and display the information of the MarvelMovies.csv in an engaging way

I practically just copied Adam’s original message and instructed Claude to follow UI/UX best practices, use DMC and DMC graphs, and make it engaging with light/dark mode. This was my result:


The only two issues I faced: it used an icon prop on one of the dmc.Text components, which doesn’t exist, and it used dash.run_server, thinking I was on Dash 2.18.

6 Likes

I see it from a slightly different perspective, this problem with the prompt. :sweat_smile:

I got more or less what I asked for: the one scatterplot, with different colors for different traces. But Claude got creative with the horizontal line (which is a good idea, but the context was wrong and I never asked for it), and the result is sloppy in dark mode, which I think has nothing to do with your style guide. Maybe the trace names should be different too, but that would be my problem, not its; I should have told it.

You gave it a “go ahead” prompt, make something nice out of it, result looking good btw :flexed_biceps:

The way I visualize things is more like: this is the most relevant information and this is how I want to show it. And when I compare the result of this type of prompting between free or cheaper models and this Claude experiment, I think cheap performs equally well or better for me at the moment.

2 Likes

Yeah, all this is still so new so it takes time to find what works for you and the best way to prompt specific situations to get the desired outcomes you are looking for.

My approach with the .md file, most of the time, is to initially tell Claude to focus only on the UI/UX design and not on functionality, so that it doesn’t get confused by the callbacks and functions that can distort the response. I set up a layout template first, before moving into more of the functionality in later prompts and requests.

Also be aware that, from what I’ve seen, third-party apps built on top of the AI API don’t seem to perform as well as going directly to the source. From what I’ve tested, if I use Copilot with Claude, or T3 Chat, for example, it doesn’t have the performance or accuracy of prompting directly on the official Claude website or application.

I have some recent work with Claude that I’m hopeful I’ll be able to share soon, as it’s honestly allowed me to achieve some breakthroughs in building packages that are... crazy, to say the least.

2 Likes

Hey everyone,
pretty cool graphs and creations with Claude.
I just set up the Figure Friday session for tomorrow.

1 Like

This dashboard focuses on sequels. One-off movies, where category is ‘Unique’, are excluded. The dashboard grid and all the components are Dash Mantine components.

The top section has two graphs that show the performance of each film, grouped by franchise. The graph on the left plots the raw data; the graph on the right plots normalized values (the percentage change relative to the first film in each franchise) for easier comparison. The user selects the parameter to plot from the drop-down menu, which triggers a callback. In this screenshot, worldwide gross is the plot parameter.
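
The normalization for the right-hand graph can be illustrated on its own. The dashboard code itself uses Polars; the sketch below uses pandas instead, with hypothetical franchises and values, to show the percentage change of each film relative to the first film in its franchise.

```python
import pandas as pd

# Hypothetical franchises and grosses, for illustration only.
df = pd.DataFrame({
    "franchise": ["X", "X", "X", "Y", "Y"],
    "series_num": [1, 2, 3, 1, 2],
    "ww_gross": [200.0, 300.0, 150.0, 500.0, 600.0],
})

# Make sure the first row of each franchise is its first film.
df = df.sort_values(["franchise", "series_num"]).reset_index(drop=True)

# Broadcast each franchise's first value to all of its rows, then normalize.
first = df.groupby("franchise")["ww_gross"].transform("first")
df["pct_change_vs_first"] = (df["ww_gross"] - first) / first * 100

print(df)
```

Every franchise starts at 0% in this form, which makes sequels from franchises of very different sizes comparable on one axis.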


The next section has a pull-down to select a specific film. This triggers a callback that shows a set of cards with parameters specific to the chosen film.

The last section has Dash AG Grid tables. The franchise table lists the films in the franchise of the film selected in the middle section.


Here is the code:

import polars as pl
import polars.selectors as cs
import plotly.express as px
import dash
from dash import Dash, dcc, html, Input, Output
import dash_ag_grid as dag
import dash_mantine_components as dmc
dash._dash_renderer._set_react_version('18.2.0')

print(f'{dag.__version__ = }')
#----- GLOBALS -----------------------------------------------------------------
org_col_names  = [c for c in pl.scan_csv('Marvel-Movies.csv').collect_schema()]
short_col_names =[
    'FILM', 'FRANCHISE', 'WW_GROSS', 'BUD_PCT_REC', 'CRIT_PCT_SCORE', 'AUD_PCT_SCORE', 
    'CRIT_AUD_PCT', 'BUDGET', 'DOM_GROSS', 'INT_GROSS', 'WEEK1', 'WEEK2',
    'WEEK2_DROP_OFF', 'GROSS_PCT_OPEN', 'BUD_PCT_OPEN', 'YEAR', 'SOURCE'
]
dict_cols = dict(zip(org_col_names, short_col_names))
dict_cols_reversed = dict(zip(short_col_names, org_col_names))

#----- READ & CLEAN DATASET ----------------------------------------------------
df_global = (
    pl.read_csv('Marvel-Movies.csv')
    .rename(dict_cols)
    .drop('SOURCE')
    .filter(pl.col('FRANCHISE') != 'Unique')
    .sort('FRANCHISE', 'YEAR')
    .with_columns(
        cs.string().exclude(['FILM', 'FRANCHISE'])
            .str.replace_all(r'%', '')
            .cast(pl.Float64())
            .mul(0.01)  # divide by 100 for proper percentage format
            .round(3),
        SERIES_NUM = pl.cum_count('FILM').over('FRANCHISE').cast(pl.String)
    )
    .select(
        'FRANCHISE', 'FILM', 'YEAR', 'SERIES_NUM', 'AUD_PCT_SCORE', 'BUDGET', 
        'BUD_PCT_OPEN', 'BUD_PCT_REC', 'CRIT_AUD_PCT', 'CRIT_PCT_SCORE', 
        'DOM_GROSS', 'GROSS_PCT_OPEN', 'INT_GROSS', 'WEEK1', 'WEEK2', 
        'WEEK2_DROP_OFF', 'WW_GROSS'
    )
)

df_franchise = (
    df_global
    .select('FRANCHISE', 'SERIES_NUM', 'YEAR', 'FILM')
)

film_list = sorted(df_global.unique('FILM')['FILM'].to_list())
franchise_list = sorted(df_global.unique('FRANCHISE')['FRANCHISE'].to_list())
plot_cols = sorted(df_global.select(cs.numeric().exclude('YEAR')).columns)

style_horiz_line = {'border': 'none', 'height': '4px', 
    'background': 'linear-gradient(to right, #007bff, #ff7b00)', 
    'margin': '10px', 'fontSize': 32}

# Dash style dictionaries use camelCase CSS property names
style_h2 = {'textAlign': 'center', 'fontSize': '32px', 
            'fontFamily': 'Arial', 'fontWeight': 'bold'}

#----- GENERAL FUNCTIONS  ------------------------------------------------------
def get_film_data(film, item):
    result = (
        df_global
        .filter(pl.col('FILM') == film)
        [item]
        [0]
    )
    return result

def get_franchise(film):    
    return(
        df_global
        .filter(pl.col('FILM') ==  film)
        .head(1)
        ['FRANCHISE']
        [0]
    )

#----- DASHBOARD COMPONENTS ----------------------------------------------------
grid = (
    dag.AgGrid(
        rowData=df_global.to_dicts(), 
        columnDefs=[
            {
                'field': i,
                'filter': True,
                'sortable': True,
                'tooltipField': i,
                'headerTooltip': dict_cols_reversed.get(i),
            } 
            for i in df_global.columns
        ],
        dashGridOptions={
            'pagination': False,
            'rowSelection': "multiple", 
            'suppressRowClickSelection': True, 
            'animateRows' : False
        },
        columnSize='autoSize',
        id='dash_ag_table'
    ),
)

franchise_dag_table = (
    dag.AgGrid(
        rowData=df_franchise.to_dicts(), 
        columnDefs=[
            {
                'field': i,
                'filter': True,
                'sortable': True,
                'tooltipField': i,
                'headerTooltip': dict_cols_reversed.get(i),
            } 
            for i in df_franchise.columns
        ],
        dashGridOptions={
            'pagination': False,
            'rowSelection': "multiple", 
            'suppressRowClickSelection': True, 
            'animateRows' : False
        },
        columnSize='autoSize',
        id='franchise_dag_table'
    ),
)
dmc_select_param = (
    dmc.Select(
        label='Select a Parameter',
        placeholder="Select one",
        id='dmc_select_parameter',
        value='WW_GROSS',
        data=[{'value' :i, 'label':i} for i in plot_cols],
        maxDropdownHeight=600,
        w=300,
        mb=10, 
        size='xl'
    ),
)
dmc_select_film = (
    dmc.Select(
        label='Select a Film',
        id='dmc_select_film',
        value='Ant-Man',
        data=[{'value' :i, 'label':i} for i in film_list],
        maxDropdownHeight=600,
        mb=30, 
        size='xl'
    ),
)

card_names = ['FRANCHISE', 'YEAR', 'SERIES_NUM', 'AUD_PCT_SCORE', 'BUDGET', 
    'BUD_PCT_OPEN', 'BUD_PCT_REC', 'CRIT_AUD_PCT', 'CRIT_PCT_SCORE', 
    'DOM_GROSS', 'GROSS_PCT_OPEN', 'INT_GROSS', 'WEEK1', 'WEEK2', 
    'WEEK2_DROP_OFF', 'WW_GROSS'
]
card_list = []
for card in card_names:
    card_list.append(
        dmc.Card(
        children = [
            dmc.Group(
                [
                    dmc.Text(card.title().replace('_', ' '), fw=500),
                ],
                justify="space-between",
                mt="md",
                mb="xs",
            ),
            dmc.Text(
                card,
                size='lg',
                id=card.lower()
            ),
        ],
        withBorder=True,
        shadow='lg',
        radius='lg',
        )
    )

#----- CALLBACK FUNCTIONS ------------------------------------------------------
def get_plot(plot_parameter, mode):
    if mode == 'DATA':
        df_plot = (
            df_global
            .select('FILM', 'FRANCHISE', 'YEAR', 'SERIES_NUM', plot_parameter)
            .pivot(
                on='FRANCHISE',
                values=plot_parameter,
                index='SERIES_NUM'
            )
        )
    elif mode == 'NORMALIZED':
        df_plot = (
            df_global
            .select('FILM', 'FRANCHISE', 'YEAR', 'SERIES_NUM', plot_parameter)
            .pivot(
                on='FRANCHISE',
                values=plot_parameter,
                index='SERIES_NUM'
            )
            .with_columns(
                ((pl.col(franchise_list) - pl.col(franchise_list).first()) /
                pl.col(franchise_list).first()).mul(100)
            )
        )
    else:
        # fail early instead of hitting an undefined df_plot below
        raise ValueError(f'{mode = } is not supported')
    fig=px.line(
        df_plot,
        'SERIES_NUM',
        franchise_list,
        markers=True,
        line_shape='spline',
        title=f'{plot_parameter}_{mode}'
    )

    fig.update_layout(
        template='simple_white',
        yaxis=dict(title=dict(text=f'{plot_parameter} {mode}')),
        legend_title='FRANCHISE'
    )
    return fig

#----- DASH APPLICATION STRUCTURE-----------------------------------------------
app = Dash()
app.layout =  dmc.MantineProvider([
    html.Hr(style=style_horiz_line),
    dmc.Text('Marvelous Sequels', ta='center', style=style_h2),
    html.Hr(style=style_horiz_line),
    #html.Div(),
    dmc.Space(h=30),
    dmc.Grid(
        children = [
            dmc.GridCol(dmc_select_param, span=4, offset = 1),
        ]
    ),
    dmc.Grid(
        children = [
            dmc.GridCol(dcc.Graph(id='graph_plot'), span=5, offset = 1),
            dmc.GridCol(dcc.Graph(id='graph_norm'), span=5, offset = 0),
        ]
    ),
    dmc.Space(h=30),
    html.Hr(style=style_horiz_line),
    dmc.Text('Mantine Cards', ta='center', style=style_h2, id='mantine_cards'),
    html.Hr(style=style_horiz_line),
    dmc.Space(h=30),
    dmc.Grid(
        children = [
            dmc.GridCol(dmc_select_film, span=3, offset = 1),
        ]
    ),
    dmc.Grid(
        children = [
            dmc.GridCol(card_list[0], span=1, offset = 1),
            dmc.GridCol(card_list[1], span=1, offset = 0),
            dmc.GridCol(card_list[2], span=1, offset = 0),
            dmc.GridCol(card_list[3], span=1, offset = 0),
            dmc.GridCol(card_list[4], span=1, offset = 0),
            dmc.GridCol(card_list[5], span=1, offset = 0),
            dmc.GridCol(card_list[6], span=1, offset = 0),
            dmc.GridCol(card_list[7], span=1, offset = 0),
        ]
    ),
    dmc.Grid(
        children = [
            dmc.GridCol(card_list[8], span=1, offset = 1),
            dmc.GridCol(card_list[9], span=1, offset = 0),
            dmc.GridCol(card_list[10], span=1, offset = 0),
            dmc.GridCol(card_list[11], span=1, offset = 0),
            dmc.GridCol(card_list[12], span=1, offset = 0),
            dmc.GridCol(card_list[13], span=1, offset = 0),
            dmc.GridCol(card_list[14], span=1, offset = 0),
            dmc.GridCol(card_list[15], span=1, offset = 0),
        ]
    ),
    dmc.Space(h=30),
    html.Hr(style=style_horiz_line),
    dmc.Text('Dash AG Tables', ta='center', style=style_h2, id='data_and_defs'),
    html.Hr(style=style_horiz_line),
    dmc.Space(h=30),
    dmc.Grid([
        dmc.GridCol(
            span=5, offset=1,
            children =[
                dmc.Text(
                    'All Data Table',
                    size='xl', fw=700,
                )
            ]
        ),
        dmc.GridCol(
            span=5, offset=1,
            children =[
                dmc.Text(
                    'Franchise Table',
                    size='xl', fw=700
                )
            ]
        )
    ]),
    dmc.Grid(children = [
        dmc.GridCol(grid, span=5, offset=1),
        dmc.GridCol(franchise_dag_table, span=4, offset=1)
    ]),
    dmc.Space(h=50),
])

@app.callback(
    Output('graph_plot', 'figure'),
    Output('graph_norm', 'figure'),
    Output('franchise','children'),
    Output('year','children'),
    Output('series_num','children'),
    Output('aud_pct_score','children'),
    Output('budget','children'),
    Output('bud_pct_open','children'),
    Output('bud_pct_rec','children'),
    Output('crit_aud_pct','children'),
    Output('crit_pct_score','children'),
    Output('dom_gross','children'),
    Output('gross_pct_open','children'),
    Output('int_gross','children'),
    Output('week1','children'), 
    Output('week2','children'),
    Output('week2_drop_off','children'),
    Output('ww_gross','children'),
    Output('mantine_cards','children'),
    Output('franchise_dag_table','rowData'),

    Input('dmc_select_parameter', 'value'),
    Input('dmc_select_film', 'value'),
)
def update_dashboard(parameter, film):
    franchise = get_franchise(film)
    return (
        get_plot(parameter, 'DATA'),
        get_plot(parameter, 'NORMALIZED'),
        get_film_data(film, 'FRANCHISE'),
        get_film_data(film, 'YEAR'),
        get_film_data(film, 'SERIES_NUM'),
        # outer double quotes keep these f-strings valid before Python 3.12
        f"{get_film_data(film, 'AUD_PCT_SCORE'):.0%}",
        f"{get_film_data(film, 'BUDGET'):.0f} M$",
        f"{get_film_data(film, 'BUD_PCT_OPEN'):.0%}",
        f"{get_film_data(film, 'BUD_PCT_REC'):.0%}",
        f"{get_film_data(film, 'CRIT_AUD_PCT'):.0%}",
        f"{get_film_data(film, 'CRIT_PCT_SCORE'):.0%}",
        f"{get_film_data(film, 'DOM_GROSS'):.0f} M$",
        f"{get_film_data(film, 'GROSS_PCT_OPEN'):.0f}%",
        f"{get_film_data(film, 'INT_GROSS'):.0f} M$",
        f"{get_film_data(film, 'WEEK1'):.0f} M$",
        f"{get_film_data(film, 'WEEK2'):.0f} M$",
        f"{get_film_data(film, 'WEEK2_DROP_OFF'):.0%}",
        f"{get_film_data(film, 'WW_GROSS'):.0f} M$",
        f'Movie Cards for {film}',
        (
            df_franchise
            .filter(pl.col('FRANCHISE') == franchise)
            .sort('SERIES_NUM')
            .to_dicts()
        )
    )
if __name__ == '__main__':
    app.run(debug=True)


3 Likes

Here I’m sharing the code I used to define the clusters:

import pandas as pd
import numpy as np
import plotly.express as px
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score # import silhouette_score
from sklearn.decomposition import PCA

data =  (pd.read_csv("Marvel-Movies.csv")
         .rename(columns={'% budget recovered':'budget_recovered'})
         .assign(budget_recovered = lambda x: x['budget_recovered'].str.replace('%', '').astype('int'),
                 profit= lambda x:x['worldwide gross']- x['budget']))

data['critics % score'] = data['critics % score'].str.replace("%", " ").astype("int")
data['audience % score'] = data['audience % score'].str.replace("%", " ").astype("int")
data['audience vs critics % deviance'] = data['audience vs critics % deviance'].str.replace("%", " ").astype("int")

features = ['worldwide gross','budget_recovered', 'critics % score', 'profit', 'audience % score','audience vs critics % deviance']
data_features = data[features]

scaler = StandardScaler()
data_features_scaled = scaler.fit_transform(data_features)
data_features_scaled = pd.DataFrame(data_features_scaled, columns=features) # Optional: convert to a DataFrame for easier handling

pca = PCA(n_components=2)
pca_df = pca.fit_transform(data_features_scaled) # Apply PCA to the scaled data
pca_df = pd.DataFrame(pca_df, columns=['PC1', 'PC2'])

kmeans_pca = KMeans(n_clusters=4, random_state=42, n_init=10)
kmeans_pca.fit(pca_df)
labels= kmeans_pca.labels_

## Cluster graph with the Silhouette Score value
silhouette = silhouette_score(pca_df, labels)
print(f"Silhouette Score: {silhouette:.3f}")
plt.scatter(pca_df['PC1'], pca_df['PC2'], c=labels, cmap='plasma')
plt.show()

pca_cluster_df = pd.DataFrame({'Cluster':labels,
                               'worldwide gross':data_features_scaled['worldwide gross'],
                               'budget_recovered':data_features_scaled['budget_recovered'],
                               'profit':data_features_scaled['profit'],
                               'critics % score':data_features_scaled['critics % score'], 
                               'audience % score':data_features_scaled['audience % score'], 
                               'audience vs critics % deviance':data_features_scaled['audience vs critics % deviance'],
                               })

## Identifying Clusters
data1 = pca_cluster_df.groupby("Cluster").mean()
fig = px.imshow(data1, text_auto='.2f', height=400, width=800, color_continuous_scale=px.colors.sequential.Blues,
                 title="Cluster Centers (K=4)") # Added a title for the chart
fig.show()

# model 1: assign the cluster name to each row
model_pca_clusters = pd.Series(kmeans_pca.labels_, name='model_clusters_pca')
model_names_pca = model_pca_clusters.map({0: "Underperforming (for Marvel) & Audience-Disappointing Films",
                                          1: "Global Blockbuster Powerhouses",
                                          2: "Consistent & Solid Performers",
                                          3:"Critically Challenged & Financially Subdued (by Marvel Standards)"
                                         })

pca_final_df = pd.concat([data_features, model_names_pca, data[['year', 'film', 'budget', 'domestic gross ($m)', 'international gross ($m)','opening weekend ($m)', 'second weekend ($m)']]], axis=1)

## Save the dataframe to a CSV file to be used in the dashboard
pca_final_df.to_csv("marvel_movies_pca_cluster.csv", index=False)
2 Likes