Figure Friday 2024 - week 52

adamschroeder · December 27, 2024, 2:42pm

Join the Figure Friday session on January 3, at noon Eastern Time, to showcase your creation and receive feedback from the community.

Did you know that Bentley Systems successfully went public 36 years after it was founded?

In week 52 of Figure-Friday we’ll look at the dataset of 172 public SaaS companies with information on their market cap, stock price, IPO year, and much more.

Things to consider:

Can you improve the sample figure below (bar plot)?
Would you like to tell a different data story using a different graph?
Can you create a Dash app instead?

Sample figure:

Code for sample figure:

import plotly.express as px
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/main/2024/week-52/SaaS-businesses-NYSE-NASDAQ.csv')
# Correct the IPO Year (when company went public), based on online research
df.loc[df['Company'] == 'Exela Technologies', 'IPO Year'] = 2017
df.loc[df['Company'] == 'Blackboxstocks', 'IPO Year'] = 2021

# Filter the dataset to companies that were founded in 2000 or later
# (.copy() avoids pandas' SettingWithCopyWarning when a column is added below)
df_filtered = df[df['Year Founded'] >= 2000].copy()

# Calculate the number of years to IPO for each company
df_filtered['Years to IPO'] = df_filtered['IPO Year'] - df_filtered['Year Founded']

# Group by 'Year Founded' to get the average 'Years to IPO'
average_years_to_ipo = df_filtered.groupby('Year Founded')['Years to IPO'].mean().reset_index()

# Round the column 'Years to IPO' to two decimal places
average_years_to_ipo['Years to IPO'] = average_years_to_ipo['Years to IPO'].round(2)
print(average_years_to_ipo)

fig = px.bar(average_years_to_ipo, x='Year Founded', y='Years to IPO', text_auto=True,
             title="Average time from companies' foundation to going public (in years)",
             labels={'Years to IPO': 'Years to IPO (average)'})
fig.show()

Participation Instructions:

Create - use the weekly data set to build your own Plotly visualization or Dash app. Or, enhance the sample figure provided in this post, using Plotly or Dash.
Submit - post your creation to LinkedIn or Twitter with the hashtags #FigureFriday and #plotly by midnight Thursday, your time zone. Please also submit your visualization as a new post in this thread.
Celebrate - join the Figure Friday sessions to showcase your creation and receive feedback from the community.

If you prefer to collaborate with others on Discord, join the Plotly Discord channel.

Data Source:

Thank you to PublicSaaSCompanies for the dataset.

AnnMarieW · December 27, 2024, 4:15pm

Interesting dataset! Looks like the PublicSaaSCompanies site uses Plotly figures.

Mike_Purtell · December 29, 2024, 5:38pm

We don’t see many comments about the sample visualization provided by @adamschroeder to start the new week. But this week’s sample tells an interesting story of how the time to IPO has decreased in a nearly linear manner since the year 2000, from about 16 years down to about 2 years. It will be interesting to see if we can explain this from the data set and from macroeconomic considerations.

This dataset offers so many stories that can be visualized. I look forward to seeing them, and an interesting call on January 3.

marieanne · December 30, 2024, 10:30am

Trends I would consider to explain this:

techstack moved from php/mysql (2000 - 2010) to more options like react/angular PWA’s etc
2010 onwards => app development, increase of commerce
2022 onwards => AI & indie hacking

and greed, fear of missing out as an investor?

In the dataset you only see the successful companies; you do not see the number of interesting initiatives investors could choose from.

Having penned that down without any substantiation: I use Figure Friday to play with some kind of visual a bit more, this time scatter_map, to discover how far I can go without a credit card.
Basically a bad idea because I should get the addresses of the headquarters to get something useful. The idea behind it was to see if there is a pattern for annualized revenue/YoY Growth% if you map the companies on a map. Could be something like California (negative growth) versus the rest, positive growth, or whatever.

It looks like this:

Please do not bother to give feedback, because it’s not a good idea at all. The only thing I do not understand: rgb(255,0,0) is red, so why is it green on the map?

Have a good day! Marie-Anne

adamschroeder · December 30, 2024, 2:35pm

That’s weird to me as well, @marieanne .
rgb(255,0,0) is red but the legend shows it as green. Would you like to share the code so we can try to see where the bug is?

marieanne · December 30, 2024, 3:11pm

Changed the colorcodes to “red”,“green”, “blue” and with capitals, no difference. The popup shows the correct colorname/number, the plot switches green and red.

The code:


# -*- coding: utf-8 -*-
"""
Created on Fri Dec 27 20:53:14 2024

@author: win11
"""

import plotly.graph_objects as go
from dash import Dash, dcc, html
import pandas as pd
import plotly.express as px


#DATA MANIPULATION#


df_us_cities = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2014_us_cities.csv')
#this is a rough approach and not 100% accurate.
df_us_cities = df_us_cities.drop('pop', axis=1)
#remove trailing space from column name (=city)
df_us_cities['name']=df_us_cities['name'].apply(lambda x: x.strip())
#remove duplicates
df_us_cities = df_us_cities.drop_duplicates()
#it's a bit silly if New York is not found


df_us_cities['name'] = df_us_cities['name'].str.replace('New York','New York City')
df_us_cities['name'] = df_us_cities['name'].str.replace('Boise City','Boise')






df = pd.read_csv('https://raw.githubusercontent.com/plotly/Figure-Friday/refs/heads/main/2024/week-52/SaaS-businesses-NYSE-NASDAQ.csv')

# Correct the IPO Year (when company went public), based on online research
df.loc[df['Company'] == 'Exela Technologies', 'IPO Year'] = 2017
df.loc[df['Company'] == 'Blackboxstocks', 'IPO Year'] = 2021

df['Headquarters'] = df['Headquarters'].fillna('')

# Filters the dataset to United States because there's a csv with lat/long united states for free.
df_filtered = df[df['Headquarters'].str.contains('United States')].copy()
#still love Stack Overflow, gets string part before first , in one run.
df_filtered['City'] = df_filtered['Headquarters'].apply(lambda x: x.strip().split(',')[0])


#lookup lat long in df_us_cities, pity if a city is twice there.
#this goes wrong, duplicates after merge ???????

df_out = df_filtered.merge(df_us_cities, left_on='City', right_on='name', how='left')
df_out = df_out.drop_duplicates()
#strip $ and thousands separators from column (regex=False so '$' is matched literally)
df_out['Annualized Revenue'] = df_out['Annualized Revenue'].str.replace('$','', regex=False)
df_out['Annualized Revenue'] = df_out['Annualized Revenue'].str.replace(',','', regex=False)
#and yoy %
df_out['YoY Growth%'] = df_out['YoY Growth%'].str.replace('%','', regex=False)
df_out['Annualized Revenue'] = pd.to_numeric(df_out['Annualized Revenue'] , errors='coerce', downcast="float")
df_out['YoY Growth%'] = pd.to_numeric(df_out['YoY Growth%'] , errors='coerce', downcast="float")
#df_out['YoY color'] = df_out['YoY Growth%'].apply(lambda x: 'rgba(255,0,0,.7)' if x < -10 else ('rgba(0,0,255,.7)' if x > 10  else 'rgba(0, 255, 0, .7)'))
df_out['YoY color'] = df_out['YoY Growth%'].apply(lambda x: 'Red' if x < -10 else ('Blue' if x > 10  else 'Green'))
 

#THE MAP SHOWS YOY % AS COLOR, boundary is 50%, those are outliers falling away from the map
#but they need special attention.

df_out['text'] =  '<b>' + df_out['Company']+ '</b><br>Headquarters: ' + df_out['City'] +'<br>Annualized revenue: '+  (df_out['Annualized Revenue']/1e6).astype(str)+' million<br>YoY Growth%: ' + df_out['YoY Growth%'].astype(str) 

    





fig = px.scatter_map(df_out, lat="lat", lon="lon", size="Annualized Revenue", hover_name='text',
                  #color_continuous_scale=px.colors.sequential.Bluered_r,  
                  color = 'YoY color',
                  
                  
                  zoom=2)

app = Dash()
app.layout = html.Div([
    dcc.Graph(figure=fig)
], style={'width':'1200px','height':'900px'})

app.run(debug=True, use_reloader=False)  # app.run replaces the older app.run_server; turn off reloader if inside Jupyter
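
On the duplicates after the merge that the code comments above flag: a left merge returns one row per matching key in the right-hand table, so if a city name appears more than once in the lookup table (for example, the same name in two states), the company row is repeated. Note that drop_duplicates() without arguments only removes rows that are identical in every column. A minimal sketch with made-up rows, assuming repeated city names are the cause here; the small frames and their values are illustrative and not taken from the dataset:

```python
import pandas as pd

# Hypothetical stand-ins for df_filtered and df_us_cities
companies = pd.DataFrame({
    "Company": ["A Corp", "B Inc"],
    "City": ["Springfield", "Boise"],
})
cities = pd.DataFrame({
    "name": ["Springfield", "Springfield", "Boise"],
    "lat": [39.80, 37.21, 43.62],
    "lon": [-89.64, -93.29, -116.20],
})

# Two 'Springfield' rows in the lookup, so 'A Corp' appears twice
naive = companies.merge(cities, left_on="City", right_on="name", how="left")

# Keep one row per city name before merging; validate="m:1" raises a
# MergeError if the right-hand keys are still not unique
unique_cities = cities.drop_duplicates(subset="name", keep="first")
merged = companies.merge(
    unique_cities, left_on="City", right_on="name", how="left", validate="m:1"
)

print(len(naive), len(merged))
```

Keeping the first match is an arbitrary choice; with a state column in both tables, merging on city and state together would be the safer route.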

AnnMarieW · December 30, 2024, 3:11pm

The initial figure Adam posted shows that the years to IPO has steadily decreased, but the sample size is very small and I don’t think you can generalize it. Only 10 companies in the dataset were founded since 2015. You can find a comprehensive study on this topic here: https://site.warrington.ufl.edu/ritter/files/IPOs-Age.pdf
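
One way to make the sample-size caveat visible is to compute the number of companies next to the mean for each founding year. A minimal sketch using made-up rows in place of the real CSV; the column names 'Year Founded' and 'IPO Year' follow the sample code, while the values and the names mean_years and n_companies are invented for illustration:

```python
import pandas as pd

# Invented rows standing in for the real dataset
df = pd.DataFrame({
    "Year Founded": [2000, 2000, 2000, 2015, 2019],
    "IPO Year": [2014, 2018, 2016, 2021, 2021],
})
df["Years to IPO"] = df["IPO Year"] - df["Year Founded"]

# Mean years to IPO and number of companies behind each mean
summary = (
    df.groupby("Year Founded")["Years to IPO"]
    .agg(mean_years="mean", n_companies="count")
    .reset_index()
)
print(summary)
```

Showing n_companies in the hover data or as bar text would let readers see how few companies stand behind each bar.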

I think this is an interesting dataset and as @Mike_Purtell mentioned, there are lots of stories to tell.

I made a Dash app to help explore the dataset. It uses Dash AG Grid to display the data, and a single Scatter plot for the figure. I chose a Scatter plot because it’s an effective way to show how the different company metrics relate to each other. It’s simple and it’s easy to spot outliers.

You can see the Code in GitHub, or see it live on PyCafe:

Since this is a dataset of SaaS (Software as a Service) companies, I thought it would be good to add a column for the “Rule of 40%”, which is a popular metric for SaaS companies. It’s calculated as YoY Revenue Growth% + EBITDA Margin%. I’ve added this to the hover data in the figure, so it shows for any combination of x and y selections.

Rule of 40 is a convenient way to determine if a SaaS company is balancing the growth rate and the cost of achieving that growth rate. Historically, companies with a Rule of 40 equal to or greater than 40% will experience high enterprise value multiples to revenue.
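
The calculation itself is a single addition. A minimal sketch, assuming both inputs arrive as percent strings such as '25%' (the helper names parse_percent and rule_of_40 are hypothetical, and the example values are invented rather than taken from the dataset):

```python
def parse_percent(value: str) -> float:
    """Convert a string such as '12.5%' to the float 12.5."""
    return float(value.replace("%", "").replace(",", "").strip())


def rule_of_40(yoy_growth: str, ebitda_margin: str) -> float:
    """Rule of 40 score = YoY revenue growth % + EBITDA margin %."""
    return parse_percent(yoy_growth) + parse_percent(ebitda_margin)


# Invented example values, not taken from the dataset
score = rule_of_40("25%", "18%")
print(score, score >= 40)
```

A fast-growing company with a negative margin can still fall short: 30% growth with a -15% EBITDA margin scores 15.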

The app lets you explore data in multiple ways.

Sort or filter the grid, and the scatter plot updates to match.
Change the X and Y axes in the dropdown menus, and the app highlights those columns in the grid.
Click a row in the grid or a point on the scatter plot to see more details about a specific company.

This app uses Dash Mantine Components and includes all these features in ~200 lines of code.


Ester · December 30, 2024, 4:43pm

Hi @AnnMarieW, I’m thinking something similar, but it might be completely different.

adamschroeder · December 30, 2024, 6:53pm

I see. @marieanne you would need to update the color through update_traces:

fig.update_traces(marker_color=df_out['YoY color'])

adamschroeder · December 30, 2024, 6:57pm

Nice app, @Ester .

I like how the bar graph is sorted from lowest to highest amount of years-to-IPO.

If your focus is on company, another cool way to build the app is to create a dropdown (instead of the rangeSlider) for the user to choose any of the SaaS companies. Once chosen, they could see all the companies’ Years-to-IPO chart.

marieanne · December 30, 2024, 7:03pm

Thank you, works

adamschroeder · December 30, 2024, 7:05pm

@AnnMarieW I really like the app you created. I was surprised that only 24 companies were above the Rule of 40%. I would have expected more companies to be above that.

It’s also rare to see column sorting connected to the sorting of a graph. Well done!

Arkimetrix · December 30, 2024, 7:07pm

Hi everyone, just showcasing the versatility of Plotly!

Ester · December 31, 2024, 3:40pm

Thank you! I updated!


Avacsiglo21 · January 2, 2025, 2:23pm

Hi everyone, best wishes for 2025!

I’ll briefly share my approach to this dataset. This time, I followed these criteria:

Companies with a positive Net Income Margin.
Focus on key indicators (‘Market Cap’, ‘Annualized Revenue’, ‘YoY Growth%’, ‘Revenue Multiple’, ‘EBITDA Margin’, ‘Net Income Margin’) and perform cluster analysis to identify the companies in each cluster.
In the end, I identified 3 clusters using these key metrics (Market Cap, YoY Growth% and Net Income Margin).

adamschroeder · January 2, 2025, 3:01pm

That’s a cool way of visualizing the shrinking wait times, @Arkimetrix . Thank you.

Do you mind posting the code so we can learn from you?

Welcome to the Plotly community

adamschroeder · January 2, 2025, 3:03pm

Nice app, @Avacsiglo21 . I like those cards at the top that summarize the information.

Are you able to share the code with us?

Mike_Purtell · January 2, 2025, 3:40pm

Silicon Valley (a term popularized by Don Hoefler) is a region of California known for the semiconductor industry. Industrial semiconductor production started in the 1960s in the Santa Clara Valley, by companies in San Jose, Santa Clara, Cupertino, Sunnyvale, Mountain View and Palo Alto. Stanford University, Hewlett-Packard, and Fairchild Semiconductor played huge roles in the start and growth of this region.

The Santa Clara Valley, mostly rural and famous for growing fruit and garlic, had the stable foundations required for vibration-sensitive wafer lithography and wafer test equipment. Densely populated urban areas like San Francisco were not suitable for this work. San Francisco was not considered part of Silicon Valley in the early days.

What changed was the booming software industry, which does not have the physical constraints of semiconductor manufacturing. Software heated up in the late 1990s in many places unsuitable for silicon work, such as downtown San Francisco or lower Manhattan in New York City. Silicon Valley was initially defined by companies that develop silicon-based semiconductors. In recent years the definition has expanded to include software and many other technologies.

This brings us to 2025, where Silicon Valley as a region has expanded to include the entire San Francisco/San Jose region. Some people even apply the term to technology companies no matter where they are located; however, many places have their own nickname containing the word silicon, such as Silicon Alley (NYC), Silicon Prairie (Austin), etc.

I enjoyed using MapLibre again for a Figure Friday project. I have included the 58 out of 172 companies (34%) in this week’s data set that are based in the San Francisco Bay Area, mostly to look at who these companies are, where they are located, and what they do. With MapLibre, it is easy to see the 3 primary locations where these companies are located: in Santa Clara County (the original Silicon Valley), in downtown San Francisco, and in San Mateo County.

The code includes a function to wrap hover text across multiple lines. This is very useful for long descriptive strings. I used ChatGPT to make a table with the longitude and latitude of these companies, and stored it in this file: “df_norcal_coords.csv”.
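
For comparison, Python's standard-library textwrap module can produce similar wrapped hover text. This is a sketch of an alternative, not the function used in the code below; wrap_hover_textwrap is a hypothetical name, and unlike the original it breaks before the line limit is exceeded rather than at the first whitespace after it.

```python
import textwrap


def wrap_hover_textwrap(text: str, chars_per_line: int = 45) -> str:
    """Wrap text to chars_per_line and join the lines with HTML <br> tags."""
    return "<br>".join(textwrap.wrap(text, width=chars_per_line))


# Invented product description, used only to show the wrapping
description = "Cloud platform for managing subscriptions, billing and revenue"
wrapped = wrap_hover_textwrap(description, chars_per_line=30)
print(wrapped)
```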

The code for this visualization is posted here on plotly community, however to run it you need the file with longitude and latitude values. You can get everything from:

• Git Repo: GitHub - Mike-Purtell/Plotly_FF_2024: Plotly FIgure Friday work in 2024
• Folder: Week_52_Stocks.

Here is a map_libre view of the Bay Area, showing the locations of Bay Area companies in the dataset. The marker sizes and colors are tied to Market Cap.

Here is a view of software companies clustered in downtown San Francisco.

Here is the code:

'''
Plotly Figure Friday - 2024 week 52 - stocks 
58 out of 172 SaaS (Software as a Service) companies in this dataset (34%) are
based in the San Francisco Bay Area, a region that includes San Jose, Silicon 
Valley, Palo Alto and Berkeley. This script uses map Libre to map the locations 
of the SF Bay Area companies, with useful hover and marker sized by Market Cap.
'''
import plotly.express as px
import polars as pl
currency_to_floats = ['Stock Price']
currency_to_mils = [
    'Market Cap', 'Last Quarter Revenue','Annualized Revenue',
    'Last Quarter EBITDA','Annualized EBITDA','Last Quarter Net Income',
    'Annualized Net Income','Cash and Short Term Investments'
    ]

def wrap_hover(text, CHARS_PER_LINE=45):
    '''
    break long hover text into multiple lines, split with html line feeds.
    1st whitespace after CHARS_PER_LINE is exceeded is replaced with <br>
    '''
    result = []
    
    # Counter to track the current line length
    line_length = 0
    
    # Iterate over each character in the text
    for char in text:
        line_length += 1
        if char.isspace():
            if line_length > CHARS_PER_LINE:
                result.append('<br>')
                line_length = 0
            else:
                result.append(char)
        else:
            result.append(char)
    
    return ''.join(result)


df_norcal = (
    pl.scan_csv('SaaS-businesses-NYSE-NASDAQ.csv')  # scan_csv --> lazy frame
    .filter(pl.col('Headquarters').str.contains('California'))
    .with_columns(
        Headquarters = 
            pl.col('Headquarters')
            .str.split(by =',')
            .list.slice(0,1)
            .list.first()
    )
    .filter(    # exclude 4 cities on the list from southern california
        ~pl.col('Headquarters')
        .is_in(['Glendale', 'San Diego', 'Santa Barbara','Ventura'])
    )
    # remove Oracle and Snowflake - they have moved out of the Bay Area
    .filter(~pl.col('Company').str.contains('Oracle'))
    .filter(~pl.col('Company').str.contains('Snowflake'))

    # convert dollars as strings to millions of dollars as floats
    .with_columns(pl.col(currency_to_mils).str.replace_all(',', ''))
    .with_columns(pl.col(currency_to_mils).str.replace('$', '', literal=True))
    .with_columns(pl.col(currency_to_mils).cast(pl.Float64))
    .with_columns(pl.col(currency_to_mils)/1000000)
    .with_columns(pl.col(currency_to_mils).round(0))

    # convert dollars as strings to floats
    .with_columns(pl.col(currency_to_floats).str.replace_all(',', ''))
    .with_columns(pl.col(currency_to_floats).str.replace('$', '', literal=True))
    .with_columns(pl.col(currency_to_floats).cast(pl.Float64))
    .with_columns(pl.col(currency_to_floats).round(2))

    # create a categorical column based on Market Cap
    .with_columns(MKT_CAP_CAT = pl.lit('')) # initialize
    .with_columns(
        MKT_CAP_CAT = 
            pl.when(pl.col('Market Cap')>100e3).then(pl.lit('> $100B'))
              .when(pl.col('Market Cap')>10e3).then(pl.lit('> $10B'))
              .when(pl.col('Market Cap')>1e3).then(pl.lit('> $1B'))
              .when(pl.col('Market Cap')>100).then(pl.lit('> $100M'))
              .when(pl.col('Market Cap')>10).then(pl.lit('> $10M'))
              .when(pl.col('Market Cap')>1).then(pl.lit('> $1M'))
    )
    .with_columns(
        MARKER_SIZE = 
            pl.when(pl.col('Market Cap')>100e3).then(pl.lit(6))
              .when(pl.col('Market Cap')>10e3).then(pl.lit(5))
              .when(pl.col('Market Cap')>1e3).then(pl.lit(4))
              .when(pl.col('Market Cap')>100).then(pl.lit(3))
              .when(pl.col('Market Cap')>10).then(pl.lit(2))
              .when(pl.col('Market Cap')>1).then(pl.lit(1))
    )
    .with_columns(
        PROD_DESC_WRAP = 
            pl.col('Product Description')
            .map_elements(wrap_hover, return_dtype=pl.String)
    )
    .drop('Company Website', 
          'Company Investor Relations Page',
          'Lead Investor(s) Pre-IPO',
          'S-1 Filing',
          'September 2024 Website Traffic (Estimate)',
          'YoY Change in Website Traffic%',
          '2023 10-K Filing',
          'Product Description'
    )
    .collect()      # collect uses lazy frame to make polars dataframe
)
for c in currency_to_mils:
    df_norcal = df_norcal.rename({c: c + ' [M$]'})

#-------------------------------------------------------------------------------
#    Load file with GPS coordinates -- this data is from ChatGPT 
#-------------------------------------------------------------------------------
df_coords =(
    pl.scan_csv('df_norcal_coords.csv')
    # remove degree notation from Long, Lat strings before convert to float
    .with_columns(pl.col(['Long', 'Lat']).str.replace('°N', ''))
    .with_columns(pl.col(['Long', 'Lat']).str.replace('°W', '')) 
    .with_columns(pl.col(['Long', 'Lat']).cast(pl.Float64))
    # For Longitude, have to multiply by -1 to get degrees west
    .with_columns(pl.col(['Long'])*-1.0)
    .collect()
)
#-------------------------------------------------------------------------------
#    Join GPS coordinates with main dataset 
#-------------------------------------------------------------------------------
df = (
    df_norcal
    .join(
        df_coords,
        how = 'left',
        on='Company'
    )
)
df_cols = df.columns
left_cols = [c for c in df_cols if c not in ['PROD_DESC_WRAP', 'Founder(s)']]
right_cols = ['PROD_DESC_WRAP', 'Founder(s)']
df = df.select(pl.col(left_cols  + right_cols))

#-------------------------------------------------------------------------------
#    Make a GPS Scatter Plot of Bay Area Companies 
#-------------------------------------------------------------------------------
#define marker colors based on Market Cap
dict_marker_map = {
    '> $1M': px.colors.qualitative.Set1[0],
    '> $10M':  px.colors.qualitative.Set1[5],
    '> $100M': px.colors.qualitative.Set1[4],
    '> $1B':  px.colors.qualitative.Set1[3],
    '> $10B':px.colors.qualitative.Set1[1],
    '> $100B':px.colors.qualitative.Set1[2],
}
fig = px.scatter_map(
    df.sort('Market Cap [M$]', descending=True),
    lat='Lat',
    lon='Long',
    height=1000, width=800,
    size='MARKER_SIZE', # 'Market Cap [M$]', 
    color='MKT_CAP_CAT',
    color_discrete_map= dict_marker_map,
    zoom=9,
    map_style='carto-voyager', #open-street-map',  # 'streets',
    custom_data=[
        'Company',             #  customdata[0]
        'Year Founded',        #  customdata[1]
        'IPO Year',            #  customdata[2]
        'Headquarters',        #  customdata[3]
        'Market Cap [M$]',     #  customdata[4]
        'PROD_DESC_WRAP',      #  customdata[5]
    ],
    title='SaaS Companies in San Francisco Bay Area/Silicon Valley'
)
#------------------------------------------------------------------------------#
#     Apply hovertemplate                                                      #
#------------------------------------------------------------------------------#
fig.update_traces(
    hovertemplate =
        '<b>%{customdata[0]}' +
        ' (Founded: %{customdata[1]},  ' +
        'IPO: %{customdata[2]})</b><br>' +
        '%{customdata[3]}<br>' +
        'Market Cap : $%{customdata[4]:,}M<br>' +
        'Products : %{customdata[5]}<br>' +
        '<extra></extra>'
)

fig.update_layout(
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family='arial',  # 'sans-serif mono', # 'courier new', 
    ),
    legend_title='Market Cap'
)

fig.show()

adamschroeder · January 2, 2025, 4:07pm

Thank you, @Mike_Purtell , for sharing some history about Silicon Valley.
It’s amazing how helpful ChatGPT can be in research. I’m glad you used it for getting the companies’ coordinates.

So, San Mateo is also considered part of Silicon Valley?
