I am trying to plot a Sankey diagram, but it keeps showing up blank. The diagram is built from a DataFrame. When I manually key in the source and target values, it works. However, I would like it to be a function so that I can run it for different time periods.
Below is an example of the starting DataFrame. I am using dummy data since the real information is sensitive.
Customer Type | Fee Type | Total |
---|---|---|
AX | PT | 5000 |
AX | VF | 200 |
D | PT | 4000 |
D | VF | 7000 |
EC | PT | 1500 |
EC | PT | 250 |
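
To make this easier to reproduce, the dummy table above can be rebuilt as a DataFrame roughly like this. The variable name `df` is just my choice for illustration; the column names and values are copied from the table.

```python
import pandas as pd

# Dummy data copied from the table above (not the real, sensitive data).
# `df` is an illustrative name, not something fixed by the question.
df = pd.DataFrame(
    {
        "Customer Type": ["AX", "AX", "D", "D", "EC", "EC"],
        "Fee Type": ["PT", "VF", "PT", "VF", "PT", "PT"],
        "Total": [5000, 200, 4000, 7000, 1500, 250],
    }
)

print(df.dtypes)
```
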
I have used code similar to the other questions on here. It is provided below.

    import pandas as pd

    def genSankey(df, category_col=[], val_col='', title='Test Sankey'):
        colorPallete = ['#4B8BBE', '#306998', '#FFE873', '#FFD43B', '#646464']
        labelList = []
        colorNumList = []
        # Collect the unique labels found in each category column
        for catCol in category_col:
            labelListTemp = list(set(df[catCol].values))
            colorNumList.append(len(labelListTemp))
            labelList = labelList + labelListTemp

        # Remove duplicate labels while preserving order
        labelList = list(dict.fromkeys(labelList))

        # Assign one palette color per category column
        colorList = []
        for idx, colorNum in enumerate(colorNumList):
            colorList = colorList + [colorPallete[idx]] * colorNum

        # Build source/target/value rows for each adjacent pair of columns
        for i in range(len(category_col) - 1):
            if i == 0:
                sourceTargetDf = df[[category_col[i], category_col[i + 1], val_col]]
                sourceTargetDf.columns = ['source', 'target', 'sum']
            else:
                tempDf = df[[category_col[i], category_col[i + 1], val_col]]
                tempDf.columns = ['source', 'target', 'sum']
                sourceTargetDf = pd.concat([sourceTargetDf, tempDf])
            sourceTargetDf = sourceTargetDf.groupby(['source', 'target']).agg('sum').reset_index()

        # Map each label to its position in labelList
        sourceTargetDf['sourceID'] = sourceTargetDf['source'].apply(lambda x: labelList.index(x))
        sourceTargetDf['targetID'] = sourceTargetDf['target'].apply(lambda x: labelList.index(x))

        # Sankey trace definition
        data = dict(
            type='sankey',
            node=dict(
                pad=15,
                thickness=20,
                line=dict(
                    color='black',
                    width=0.5
                ),
                label=labelList,
                color=colorList
            ),
            link=dict(
                source=sourceTargetDf['sourceID'],
                target=sourceTargetDf['targetID'],
                value=sourceTargetDf['sum']
            )
        )

        layout = dict(
            title=title,
            font=dict(size=10)
        )

        fig = dict(data=[data], layout=layout)
        # Debug output: inspect the labels and the source/target table
        print(labelList)
        print(sourceTargetDf)
        return fig

I added the two print statements to inspect the labels and to confirm that the DataFrame's ID columns are numeric. I have also checked that I do not have any duplicate source/target pairs.
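
Rather than eyeballing the printed output, the dtypes could also be checked programmatically. The sketch below is only an illustration: the helper name `non_numeric_columns` and the sample frame are hypothetical, and the string values in `sum` are made up on purpose to show what a failing check looks like. They are not taken from my real data.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def non_numeric_columns(frame, columns):
    """Return the names in `columns` whose dtype is not numeric."""
    return [col for col in columns if not is_numeric_dtype(frame[col])]


# Hypothetical frame shaped like sourceTargetDf; 'sum' deliberately holds strings.
example = pd.DataFrame(
    {
        "sourceID": [0, 0, 1],
        "targetID": [3, 4, 3],
        "sum": ["5000", "200", "4000"],
    }
)

bad = non_numeric_columns(example, ["sourceID", "targetID", "sum"])
print(bad)
```

An empty list would mean all three columns are numeric.
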
I also tried building the source, target, and value lists manually and plotting them. That worked, so the issue seems to be somewhere in the DataFrame conversion.
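
For reference, the manual approach amounts to something like the following pure-Python sketch. The function name `build_link_lists` is hypothetical, and the rows are the dummy data from the table. Unlike `genSankey`, it produces plain Python lists rather than pandas Series, which may or may not be relevant to the blank plot.

```python
def build_link_lists(rows):
    """Turn (source, target, value) rows into Sankey-style index lists.

    Illustrative helper, not part of any library.
    """
    # Labels in first-seen order: all sources first, then all targets.
    labels = list(dict.fromkeys([s for s, _, _ in rows] + [t for _, t, _ in rows]))
    index = {label: i for i, label in enumerate(labels)}

    # Sum the values of repeated (source, target) pairs.
    totals = {}
    for s, t, v in rows:
        totals[(s, t)] = totals.get((s, t), 0) + v

    source = [index[s] for s, _ in totals]
    target = [index[t] for _, t in totals]
    value = list(totals.values())
    return labels, source, target, value


# Dummy rows copied from the table in the question.
rows = [
    ("AX", "PT", 5000),
    ("AX", "VF", 200),
    ("D", "PT", 4000),
    ("D", "VF", 7000),
    ("EC", "PT", 1500),
    ("EC", "PT", 250),
]

labels, source, target, value = build_link_lists(rows)
print(labels, source, target, value)
```
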