Proper Data Format for Sankey

imadsen · March 15, 2019, 6:51pm

I am trying to feed a dataframe into a Sankey diagram. The counts and the paths look right, but the labels are wrong or undefined. I’m new to this data structure, so I’m confused about how the labels are getting mixed up or left undefined.

To see where I was going wrong, I tried replicating the scottish_df example here (with the limited data), https://plot.ly/~alishobeiri/1591/plotly-sankey-diagrams/#/, by copying the Python code and running it.

It seems to have the same problem: some of the nodes that are in the data are missing from the graph, and the links end in “undefined” instead of the Yes/No in the example.

Any pointers would be super helpful!

jmmease · March 16, 2019, 10:24am

Hi @imadsen,

Welcome to the forums!

Could you add the code and dataset for what you tried? I didn’t see a reference to the full dataset in the example that you linked to. (To add code to a forum post, put it inside a fenced code block as described at https://help.github.com/en/articles/creating-and-highlighting-code-blocks.)

If possible, please also add a screenshot of the result that you’re getting. (You can add an image to a forum post by dragging the image file into the text area)

-Jon

imadsen · March 18, 2019, 6:30pm

Thanks for the fast response and welcome Jon!

That actually might be part of the problem: I’m not sure what data I need to get the desired result.

code:

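from plotly.offline import init_notebook_mode, iplot

# scottish_df is the pandas DataFrame shown in the table below
init_notebook_mode(connected=True)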
data_trace = dict(
    type='sankey',
    domain = dict(
      x =  [0,1],
      y =  [0,1]
    ),
    orientation = "h",
    valueformat = ".0f",
    node = dict(
      pad = 10,
      thickness = 0,
      line = dict(
        color = "black",
        width = 0
      ),
      label =  scottish_df['Node, Label'].dropna(axis=0, how='any'),
      color = scottish_df['Color']
    ),
    link = dict(
      source = scottish_df['Source'].dropna(axis=0, how='any'),
      target = scottish_df['Target'].dropna(axis=0, how='any'),
      value = scottish_df['Value'].dropna(axis=0, how='any'),
      color = scottish_df['Link Color'].dropna(axis=0, how='any'),
    )
)

layout =  dict(
    title = "Scottish Referendum Voters who now want Independence",
    height = 900,
    font = dict(
      size = 10
    ),    
)

fig = dict(data=[data_trace], layout=layout)
iplot(fig, validate=False)

The data is (same as the link):

|Row|Source|Target|Value|Color|Node, Label|Link Color|
|1|0|5|20|#F27420|Remain+No – 28|rgba(253, 227, 212, 0.5)|
|2|0|6|3|#4994CE|Leave+No – 16|rgba(242, 116, 32, 1)|
|3|0|7|50|#FABC13|Remain+Yes – 21|rgba(253, 227, 212, 0.5)|
|4|1|5|14|#7FC241|Leave+Yes – 14|rgba(219, 233, 246, 0.5)|
|5|1|6|50|#D3D3D3|Didn’t vote in at least one referendum – 21|rgba(73, 148, 206, 1)|

Ideal outcome:

My outcome:

I think what I’m confused about is this: assuming it doesn’t look right because of missing data, how are the labels associated with the right target/source? For example, the target is undefined, but how do I define it?

jmmease · March 25, 2019, 10:47am

Hi @imadsen,

Could it be the case that the dataset you’re working with isn’t the full dataset from the example? Your diagram has only 5 links, matching the five rows you printed out in the dataframe. The original example has ~15 links, so it must have been created from a larger data frame.

The link.source and link.target properties are arrays of integer indices into the node.label array of strings. What’s happening in your example, I think, is that the indices in the link.target array are all >= 5, and the list of node.label strings only has 5 elements, so that’s why the target nodes are undefined.
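As a rough illustration of that index lookup, here is a minimal sketch in plain Python that uses the five rows posted above. The helper name `out_of_range_indices` is made up for this example and is not part of plotly; it only lists the link indices that have no matching entry in the label list.

```python
# Node labels and link indices copied from the five rows posted above.
labels = [
    "Remain+No – 28",
    "Leave+No – 16",
    "Remain+Yes – 21",
    "Leave+Yes – 14",
    "Didn’t vote in at least one referendum – 21",
]
sources = [0, 0, 0, 1, 1]
targets = [5, 6, 7, 5, 6]


def out_of_range_indices(indices, labels):
    """Return the sorted, de-duplicated indices with no entry in labels.

    Hypothetical helper for illustration only; not a plotly function.
    """
    return sorted({i for i in indices if not 0 <= i < len(labels)})


missing_sources = out_of_range_indices(sources, labels)
missing_targets = out_of_range_indices(targets, labels)

print("sources without a label:", missing_sources)
print("targets without a label:", missing_targets)
```

With only five labels, valid indices run from 0 to 4, so every target in these rows falls outside the label list. Running a check like this before plotting should make it easier to spot which nodes would be drawn without a label.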

Hope that helps!
-Jon

imadsen · March 25, 2019, 7:59pm

Hey Jon,
I believe that was part 1 of my issue, thanks for explaining!

The other thing I was primarily stuck on was how labels are assigned to nodes. I found that it’s by position: each link’s source id / target id is the index of the node’s label in the node.label array. (This is different from, say, ipysankeywidget, because that library uses the source NAME and target NAME instead of IDs, and then uses those as node labels.)
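For anyone starting from name-based links, here is a minimal sketch of one way to turn them into the index-based arrays described in this thread. The node names, values, and the function `to_indexed` are all hypothetical and chosen only for illustration; nothing here comes from plotly itself.

```python
# Hypothetical name-based links: (source name, target name, value).
named_links = [
    ("A", "X", 20),
    ("A", "Y", 3),
    ("B", "X", 14),
]


def to_indexed(named_links):
    """Convert name-based links into a label list plus index-based links.

    Each distinct name gets the next free position in `labels`, and every
    link then refers to its nodes by that position.
    """
    labels = []
    index_of = {}
    sources, targets, values = [], [], []
    for source_name, target_name, value in named_links:
        for name in (source_name, target_name):
            if name not in index_of:
                index_of[name] = len(labels)
                labels.append(name)
        sources.append(index_of[source_name])
        targets.append(index_of[target_name])
        values.append(value)
    return labels, sources, targets, values


labels, sources, targets, values = to_indexed(named_links)
print(labels)   # ['A', 'X', 'Y', 'B']
print(sources)  # [0, 0, 3]
print(targets)  # [1, 2, 1]
print(values)   # [20, 3, 14]
```

The resulting lists are in the shape that node.label, link.source, link.target and link.value expect, so every index is guaranteed to point at a label.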

Hope this dialog is helpful for anyone else new to plotly sankey diagrams!

Thanks Jon!
