Hello!
Disclaimer: I’m quite new to Plotly Dash so please forgive my dodgy code!
Link to app on my github: https://github.com/CMonnin/job_search_pub
Link to app on Render (may take a minute or two to load as I’m using the free Render tier): https://job-search-2023.onrender.com
I’m currently trying to change career path from bioanalytical chemist to a more tech-focused role. So I decided to track my job search progress with a dashboard (partly to distract me from what feels like a fruitless search).
What the app looks like
Data
I’ve removed the csv from the public repo. I’m using huntr.co to track job applications, which also lets you export your data to a csv.
Data looks like this:
| Title | Company | Created at | List | Source |
|---|---|---|---|---|
| Trading Application Support Analyst | John Wiley & Sons | May 19 2023, 08:22 am | apply to | |
| Computational Cancer Biologist | Vevo Therapeutics | May 19 2023, 07:00 am | applied | |
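For anyone wanting to try this with their own export, here is a minimal sketch of loading data shaped like the table above. It assumes the column names shown and that "Created at" always follows the `May 19 2023, 08:22 am` pattern; the inline CSV is just a stand-in, since the real file is not in the repo.

```python
import io

import pandas as pd

# Stand-in for the real export (the actual csv is not in the public repo).
sample_csv = io.StringIO(
    'Title,Company,Created at,List,Source\n'
    'Trading Application Support Analyst,John Wiley & Sons,'
    '"May 19 2023, 08:22 am",apply to,\n'
    'Computational Cancer Biologist,Vevo Therapeutics,'
    '"May 19 2023, 07:00 am",applied,\n'
)

df = pd.read_csv(sample_csv)

# Parse timestamps such as "May 19 2023, 08:22 am" (assumed format).
df['Created at'] = pd.to_datetime(df['Created at'], format='%b %d %Y, %I:%M %p')
```

With a real export you would pass the file path to `pd.read_csv` instead of the `StringIO` object.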
So I read the csv into a pandas df and do some data cleaning (details on my github if you’re interested). One step is simplifying the job titles so I can bin them together on a bar plot (see the jupyter notebook for why this was necessary).
# Creating category to simplify titles
df['Category'] = 'Other'
df.loc[df['Title'].str.contains('Data Scientist', case=False), 'Category'] = 'Data Scientist'
df.loc[df['Title'].str.contains('analyst', case=False), 'Category'] = 'Data Analyst'
df.loc[df['Title'].str.contains('Data engineer', case=False), 'Category'] = 'Data Engineer'
df.loc[df['Title'].str.contains('Python', case=False), 'Category'] = 'Python Dev'
df.loc[df['Title'].str.contains('Machine Learning', case=False), 'Category'] = 'ML Engineer'
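The same binning could also be written as a loop over an ordered list of patterns, which makes it easier to add categories later. This is only a sketch: `CATEGORY_PATTERNS` and `add_category` are names I made up, the sample titles are partly invented, and `na=False` is an addition so that rows with a missing title fall through to 'Other'.

```python
import pandas as pd

# Ordered (pattern, category) pairs. Later matches overwrite earlier ones,
# just like the sequence of .loc assignments.
CATEGORY_PATTERNS = [
    ('Data Scientist', 'Data Scientist'),
    ('analyst', 'Data Analyst'),
    ('Data engineer', 'Data Engineer'),
    ('Python', 'Python Dev'),
    ('Machine Learning', 'ML Engineer'),
]


def add_category(df):
    """Return a copy of df with a simplified 'Category' column."""
    df = df.copy()
    df['Category'] = 'Other'
    for pattern, category in CATEGORY_PATTERNS:
        mask = df['Title'].str.contains(pattern, case=False, na=False)
        df.loc[mask, 'Category'] = category
    return df


# Sample titles for illustration only.
titles = pd.DataFrame({'Title': [
    'Senior Data Scientist',
    'Trading Application Support Analyst',
    'Computational Cancer Biologist',
    None,
]})
categorised = add_category(titles)
```

Note that the 'analyst' pattern catches every kind of analyst, not just data analysts, which is the same behaviour as the original lines.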
Sankey figure
Next I want to prepare the data for the Sankey plot. To do this I need to know how many applications came from each source, so I count each one. Then I need to find how many I’ve been rejected from, how many I’ve gotten an interview for, etc.
linkedin_count = df['Source'].value_counts()['linkedin.com']
indeed_count = df['Source'].value_counts()['indeed.com']
other_count = df['Source'].value_counts()['other']
# subtract the number that ghosted me after an interview
rejected_interview_count = df['List'].value_counts()['Rejected after interview'] - df['Ghosted after interview'].sum()
rejected_count = df['List'].value_counts()['TRASH']
# right now no active interviews going so number of interviews = number I've been rejected from
interview_count = df['List'].value_counts()['Rejected after interview']
# sum the number of interviewers that ghosted me
ghosted_count = df['Ghosted after interview'].sum()
# everything left is the number I haven't heard from
no_response_count = len(df) - rejected_count - rejected_interview_count - ghosted_count
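One thing to watch: `value_counts()['some label']` raises a `KeyError` if that label has no rows yet (for example before the first rejection). A possible workaround is `.get(key, 0)`, sketched below. The sample rows are invented, `count_of` is a hypothetical helper, and 'Offer' is a made-up list name used only to show the zero default.

```python
import pandas as pd

# Made-up rows using the column names from the post.
df = pd.DataFrame({
    'Source': ['linkedin.com', 'linkedin.com', 'indeed.com', 'other'],
    'List': ['applied', 'TRASH', 'Rejected after interview', 'applied'],
    'Ghosted after interview': [0, 0, 1, 0],
})

# Compute each value_counts() once and reuse it.
source_counts = df['Source'].value_counts()
list_counts = df['List'].value_counts()


def count_of(counts, key):
    """Return the count for key, or 0 when the label never occurs."""
    return int(counts.get(key, 0))


linkedin_count = count_of(source_counts, 'linkedin.com')
indeed_count = count_of(source_counts, 'indeed.com')
other_count = count_of(source_counts, 'other')

rejected_count = count_of(list_counts, 'TRASH')
interview_count = count_of(list_counts, 'Rejected after interview')
ghosted_count = int(df['Ghosted after interview'].sum())
rejected_interview_count = interview_count - ghosted_count
no_response_count = len(df) - rejected_count - rejected_interview_count - ghosted_count

# Hypothetical label that does not occur in the data: no KeyError, just 0.
offer_count = count_of(list_counts, 'Offer')
```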
Next I need to set up all the nodes and their labels:
nodes = [
{'label': 'Linkedin'}, # 0
{'label': 'Indeed'}, # 1
{'label': 'Placeholder'}, # 2
{'label': 'Other'}, # 3
{'label': 'Applied'}, # 4
{'label': 'No Response'}, # 5
{'label': 'Rejected'}, # 6
{'label': 'Ghosted'}, # 7
{'label': 'Interview'}, # 8
]
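Because the Sankey links refer to nodes by their position in this list, the `# 0`, `# 1` comments have to be kept in sync by hand. One alternative is to look indices up by label. A rough sketch, where `index_of` and `build_links` are hypothetical names and the values are placeholders rather than my real counts:

```python
nodes = [
    {'label': 'Linkedin'},
    {'label': 'Indeed'},
    {'label': 'Placeholder'},
    {'label': 'Other'},
    {'label': 'Applied'},
    {'label': 'No Response'},
    {'label': 'Rejected'},
    {'label': 'Ghosted'},
    {'label': 'Interview'},
]

# Map each label to its position in the nodes list.
index_of = {node['label']: i for i, node in enumerate(nodes)}


def build_links(named_links):
    """Turn (source label, target label, value) triples into index lists."""
    return dict(
        source=[index_of[s] for s, _, _ in named_links],
        target=[index_of[t] for _, t, _ in named_links],
        value=[v for _, _, v in named_links],
    )


# Placeholder values, not real counts.
link = build_links([
    ('Linkedin', 'Applied', 10),
    ('Indeed', 'Applied', 5),
    ('Applied', 'No Response', 12),
    ('Applied', 'Interview', 3),
])
```

The resulting `link` dict has the `source`, `target` and `value` keys that `go.Sankey` expects for its `link` argument.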
Finally I can make the Sankey figure. I created a go.Sankey figure that uses each node’s label as its label (surprise!).
Then I create the links that connect the nodes using the counts determined previously. Here the source is the index of the source node and the target is the index of the target node.
import plotly.graph_objects as go

sankey_fig = go.Figure(data=[go.Sankey(
node=dict(
label=[node['label'] for node in nodes],
),
link=dict(
source=[0, 1, 2, 3, 4, 4, 4, 8, 8],
target=[4, 4, 4, 4, 5, 6, 8, 7, 6],
value=[linkedin_count,
indeed_count,
0,
other_count,
no_response_count,
rejected_count,
interview_count,
ghosted_count,
rejected_interview_count
]
)
)])
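A quick sanity check that might be worth adding: for a node in the middle of the diagram (like Applied or Interview), the total flowing in should equal the total flowing out, otherwise the diagram misrepresents the counts. Below is a small sketch of such a check in plain Python. `unbalanced_nodes` is a hypothetical helper and the values are made up.

```python
from collections import defaultdict


def unbalanced_nodes(source, target, value):
    """Return {node index: (inflow, outflow)} for nodes that have both
    incoming and outgoing links whose totals differ."""
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for s, t, v in zip(source, target, value):
        outflow[s] += v
        inflow[t] += v
    return {
        n: (inflow[n], outflow[n])
        for n in inflow.keys() & outflow.keys()
        if inflow[n] != outflow[n]
    }


# Same source/target layout as the figure, with made-up values.
source = [0, 1, 2, 3, 4, 4, 4, 8, 8]
target = [4, 4, 4, 4, 5, 6, 8, 7, 6]

balanced = unbalanced_nodes(source, target, [10, 5, 0, 2, 12, 2, 3, 1, 2])
broken = unbalanced_nodes(source, target, [10, 5, 0, 2, 12, 2, 4, 1, 2])
```

Here `balanced` comes back empty, while `broken` flags nodes 4 (Applied) and 8 (Interview) because one extra interview was counted without a matching outcome.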
I then deployed it to Render.
Hope you find this useful/interesting!
PS: Very open to constructive criticism