I have the following dataset (only part of it is shown),
and I need to create something like this:
I can get the source node and intermediate node # 1 (commodity), but I can't get the nodes that come after them.
I have seen far too many examples online, and they all use small datasets where you can map the source, target and value by hand. For a large dataset like this one, that approach isn't efficient.
My questions are:
- How do I get the intermediate node # 2 columns as nodes? For the source, target and intermediate node # 1, the row values correspond to the nodes in the desired plot, but for intermediate node # 2 the nodes are the columns themselves.
- How do I get the percentage values?
This will probably be the most complex Sankey plot I'll draw, so please help in whatever capacity you can. I have already struggled for 2 days with countless tabs open.
Do you want every column of the intermediate node # 2 green box to represent a separate node (7 separate intermediary nodes)?
Yes. In fact, there are more columns on the right.
The plot in my question is representative of the same dataset. If you see the 2nd intermediate node, those are the exact same columns in the dataset highlighted in green.
OK, I can try to see if it's possible. Can you provide this dataset or a sample dataset similar to this one?
Is there a way I can send you the dataset as a .csv file? It is 583x22 (rows x columns).
Save it as a Google Sheet or on OneDrive (if you have the Microsoft suite) and share the link with us if you can.
hi @rajpmehta
Thanks for the data. I tried several times but I wasn't able to create the Sankey plot you're trying to achieve, so I turned to ChatGPT.
It might be overkill, but this is what it gave me. Hopefully, you can build off this code:
import plotly.graph_objects as go
import pandas as pd
import numpy as np
df = pd.read_csv("cattle.csv")
# Extract the unique commodities from the data
commodities = df['Commodity'].unique()
animal_species = df['Animal species'].unique()
source_indices = {animal: idx for idx, animal in enumerate(animal_species)}
# Create a mapping for the emission types to indices for the target nodes
emission_types = ['CO2', 'CH4', 'N2O']
# Create a mapping for the commodity types to indices for the intermediate nodes
commodity_indices = {commodity: idx + len(animal_species) for idx, commodity in enumerate(commodities)}
# Update the target indices for emission types to account for the new intermediate nodes
target_indices = {emission: idx + len(animal_species) + len(commodities) for idx, emission in enumerate(emission_types)}
# Initialize the lists for sources, targets, and values for the Sankey links
sources = []
targets = []
values = []
# Intermediate lists to store links from commodities to emissions
intermediate_sources = []
intermediate_targets = []
intermediate_values = []
# Aggregate emissions by species and commodity
for species in animal_species:
    for commodity in commodities:
        species_commodity_data = df[(df['Animal species'] == species) &
                                    (df['Commodity'] == commodity)]
        for emission in emission_types:
            # Summing up all the emissions for the species, commodity, and emission type
            total_emissions = species_commodity_data[f'Total {emission} emissions (kg CO2e)'].sum()
            if total_emissions > 0:
                # Link from species to commodity
                sources.append(source_indices[species])
                targets.append(commodity_indices[commodity])
                values.append(total_emissions)
                # Link from commodity to emission
                intermediate_sources.append(commodity_indices[commodity])
                intermediate_targets.append(target_indices[emission])
                intermediate_values.append(total_emissions)
# Combine sources and targets with intermediate ones
final_sources = sources + intermediate_sources
final_targets = targets + intermediate_targets
final_values = values + intermediate_values
# Create nodes list combining animal species, commodities, and emission types
nodes_list = list(animal_species) + list(commodities) + emission_types
fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=15,
        thickness=20,
        line=dict(color="black", width=0.5),
        label=nodes_list
    ),
    link=dict(
        source=final_sources,  # indices correspond to labels
        target=final_targets,
        value=final_values
    ))])
fig.update_layout(title_text="Sankey Diagram with Intermediate Commodity Nodes", font_size=10)
fig.show()
Thank you Adam, I'll play with it and see how far I get.
If I do manage to crack it, I will post it here :))