Show and tell: Sankey plot with Dash

Hello!

Disclaimer: I’m quite new to Plotly Dash so please forgive my dodgy code!
Link to app on my github: https://github.com/CMonnin/job_search_pub
Link to app on Render (may take a minute or two to load as I’m using Render’s free tier): https://job-search-2023.onrender.com

I’m currently trying to change career path from bioanalytical chemist into a more tech-focused role. So I decided to track my job search progress with a dashboard (partly to distract me from what feels like a fruitless search :sweat_smile:)

What the app looks like

Data

I’ve removed the csv from the public repo. I’m using huntr.co to track job applications. It also allows you to export your data to a csv.
Data looks like this:

| Title | Company | Created at | List | Source |
|---|---|---|---|---|
| Trading Application Support Analyst | John Wiley & Sons | May 19 2023, 08:22 am | apply to | linkedin |
| Computational Cancer Biologist | Vevo Therapeutics | May 19 2023, 07:00 am | applied | linkedin |
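
For anyone following along without the original csv, here is a minimal sketch of loading an export shaped like the rows above. It is not the cleaning code from the repo: the inline csv text stands in for the real file, and the lower-casing step is an assumption about what cleaning might be useful.

```python
import io

import pandas as pd

# Two rows copied from the sample above. A real export would be read from disk,
# e.g. pd.read_csv("huntr_export.csv") (hypothetical file name).
csv_text = '''Title,Company,Created at,List,Source
Trading Application Support Analyst,John Wiley & Sons,"May 19 2023, 08:22 am",apply to,linkedin
Computational Cancer Biologist,Vevo Therapeutics,"May 19 2023, 07:00 am",applied,linkedin
'''

df = pd.read_csv(io.StringIO(csv_text))

# Parse the "Created at" strings into timestamps so they can be sorted or grouped by date.
df["Created at"] = pd.to_datetime(df["Created at"], format="%b %d %Y, %I:%M %p")

# Normalise the free-text columns so value_counts() is not split by case or stray spaces.
for col in ["List", "Source"]:
    df[col] = df[col].str.strip().str.lower()
```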

So I read the csv into a pandas df and do some data cleaning (details are on my GitHub if you’re interested). One step I do is simplify job titles so I can bin them together on a bar plot (see the Jupyter notebook for why this was necessary).

# Creating category to simplify titles
df['Category'] = 'Other'
df.loc[df['Title'].str.contains('Data Scientist', case=False), 'Category'] = 'Data Scientist'
df.loc[df['Title'].str.contains('analyst', case=False), 'Category'] = 'Data Analyst'
df.loc[df['Title'].str.contains('Data engineer', case=False), 'Category'] = 'Data Engineer'
df.loc[df['Title'].str.contains('Python', case=False), 'Category'] = 'Python Dev'
df.loc[df['Title'].str.contains('Machine Learning', case=False), 'Category'] = 'ML Engineer'

Sankey figure

Next I want to prepare the data for the Sankey plot. To do this I need to know how many applications came from each source, so I count them per source. Then I need to find how many I’ve been rejected from, how many got me an interview, etc.

# count applications per source and per list once, then look the values up
source_counts = df['Source'].value_counts()
list_counts = df['List'].value_counts()
linkedin_count = source_counts['linkedin.com']
indeed_count = source_counts['indeed.com']
other_count = source_counts['other']
# sum the number of interviewers that ghosted me
ghosted_count = df['Ghosted after interview'].sum()
# right now no active interviews going so number of interviews = number I've been rejected from
interview_count = list_counts['Rejected after interview']
# remove the number I've been ghosted by
rejected_interview_count = interview_count - ghosted_count
rejected_count = list_counts['TRASH']
# everything left is the number I haven't heard from
no_response_count = len(df) - rejected_count - rejected_interview_count - ghosted_count
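
A quick way to sanity-check counts like these is to confirm that what flows into a node equals what flows out of it. The sketch below uses made-up numbers (not the real data) and assumes every application comes from exactly one of the three sources, so the source counts add up to the total.

```python
# Made-up counts, for illustration only.
linkedin_count, indeed_count, other_count = 60, 30, 10
rejected_count = 25      # applications in the 'TRASH' list
interview_count = 5      # applications that reached an interview
ghosted_count = 2        # interviews that ended in silence
rejected_interview_count = interview_count - ghosted_count

# Assumption: the three sources cover every row, so this equals len(df).
total = linkedin_count + indeed_count + other_count
no_response_count = total - rejected_count - rejected_interview_count - ghosted_count

# Flow into "Applied" should equal the flow out of it.
applied_out = no_response_count + rejected_count + interview_count
assert total == applied_out

# Flow into "Interview" should equal the flow out of it.
assert interview_count == ghosted_count + rejected_interview_count
```

If either assertion fails on real data, a row is probably being counted twice or sits in a list that none of the counts cover.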

Next I need to set up all the nodes and their labels:

nodes = [
    {'label': 'Linkedin'},      # 0
    {'label': 'Indeed'},        # 1
    {'label': 'Placeholder'},   # 2 
    {'label': 'Other'},         # 3
    {'label': 'Applied'},       # 4
    {'label': 'No Response'},   # 5
    {'label': 'Rejected'},      # 6
    {'label': 'Ghosted'},       # 7
    {'label': 'Interview'},     # 8
]

Finally I can make the Sankey figure. I create a go.Sankey figure that uses the node labels as its labels (surprise!)
Then I create the links that connect the nodes using the counts determined previously. Each entry in source is the index of the node a link starts from, and the matching entry in target is the index of the node it ends at.

import plotly.graph_objects as go

sankey_fig = go.Figure(data=[go.Sankey(
    node=dict(
        label=[node['label'] for node in nodes],
    ),
    link=dict(
        # source[i] -> target[i] carries value[i]; the numbers are indices into nodes
        source=[0, 1, 2, 3, 4, 4, 4, 8, 8],
        target=[4, 4, 4, 4, 5, 6, 8, 7, 6],
        value=[linkedin_count,
               indeed_count,
               0,  # placeholder node, no flow
               other_count,
               no_response_count,
               rejected_count,
               interview_count,
               ghosted_count,
               rejected_interview_count
               ]
    )
)])

I then deployed it to Render.
Hope you find this useful/interesting!
PS: Very open to constructive criticism

Welcome to the community @CMonnin and thank you for sharing your job search app. Don’t give up!

Keep sending those resumes out and going to interviews. I’m sure it will pay off soon.

Wouldn’t it be better to use a Parallel Categories diagram? The issue with a Sankey diagram is that while you can see that circa 25% of your applications have been rejected, it does not tell you whether these 25% all come from Indeed, Linkedin, or Other, or if they are a mix.

Parallel Cat would show the full “path” on hover.

It would make it easier to see trends.

very cool. I didn’t know that was an option. I’ll change it when I get a chance. Thanks!

@CMonnin Very nice application! I really like how clean the charts look. As mentioned, a Parallel Categories chart would be great so one can compare whether the application sources behave differently.

Just a personal reminder: keep sending. My gf and I faced the same career path change. She had to send hundreds of CVs to finally be successful, but she said it was definitely worth going for :slight_smile: Good luck!

@martin2097 thank you!

I’ve updated the app to use parcats instead of the Sankey plot, at @David22’s suggestion. I agree it’s a better way to show the information. I pushed it so it’s live now =]

Ok. Note that if you create your dimensions via go.parcats.Dimension you will have a bit more flexibility in terms of sorting. Also, you can use ticktext to display the counts (concatenate each unique value of your dimension with its count from value_counts).