R plotly sankey - links are not smooth and color differs for the same source and target

cheriemilk · February 21, 2022, 9:56am

I have some user behavior data. I converted the raw data into source, target and value format in order to draw a sankey chart of user behavior flow, which is then displayed and filtered in Power BI.

The raw data contains the columns below:

httpSessionID,time,date,dataCenter,customer,companyID,action

Below is a sample of the data after conversion. I want a sankey chart that can be filtered by dataCenter, customer and companyID in Power BI, so I aggregated the action column by dataCenter, customer and companyID, which keeps those three columns and their values.

date,dataCenter,customer,companyID,source,target,value
2021/11/3,dc1,customer1,companyID1,view_page_modulepicker_1,iew_manage_opportunities_2,1
2021/12/6,dc1,customer1,companyID1,view_page_capabilityportfolio_1,view_card_detail_assignment_2,1
2021/12/7,dc2,customer2,companyID2,view_page_modulepicker_1,view_manage_opportunities_2,1
2021/12/8,dc3,customer3,companyID3,view_page_actionsearch_1,iew_manage_opportunities_2,1
2021/12/9,dc2,customer1,companyID1,view_page_modulepicker_1,iew_card_detail_assignment_2,1
2021/12/9,dc1,customer1,companyID1,view_page_modulepicker_1,iew_card_detail_role_2,
2021/12/9,dc1,customer1,companyID1,view_page_modulepicker_1,view_detail_assignment_2,
2021/12/2,dc1,customer1,companyID1,view_page_modulepicker_1,view_card_detail_role_2,1
2021/12/2,dc1,customer1,companyID1,view_page_modulepicker_1,view_detail_role_2,1
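
For readers who want to reproduce the conversion step, here is a minimal sketch in base R of how a raw log could be turned into the source, target and value format above. It rests on assumptions that the post does not confirm: the `raw` data frame and its values are made up, the numeric suffix (`_1`, `_2`) is taken to be the step number of the action within its session, and `time` is taken to be sortable.

```r
# Hypothetical raw log using the column names from the post (values are invented)
raw <- data.frame(
  httpSessionID = c("s1", "s1", "s1", "s2", "s2"),
  time          = c(1, 2, 3, 1, 2),
  date          = "2021/12/9",
  dataCenter    = "dc1",
  customer      = "customer1",
  companyID     = "companyID1",
  action        = c("view_page_modulepicker", "view_card_detail_role",
                    "view_detail_role", "view_page_modulepicker",
                    "view_card_detail_role"),
  stringsAsFactors = FALSE
)

# Order events inside each session, then number them 1, 2, 3, ...
raw <- raw[order(raw$httpSessionID, raw$time), ]
raw$step <- ave(seq_along(raw$action), raw$httpSessionID, FUN = seq_along)
raw$node <- paste(raw$action, raw$step, sep = "_")

# A row is a link source when the next row belongs to the same session
n <- nrow(raw)
has_next <- c(raw$httpSessionID[-1] == raw$httpSessionID[-n], FALSE)

pairs <- data.frame(
  raw[has_next, c("date", "dataCenter", "customer", "companyID")],
  source = raw$node[has_next],
  target = raw$node[which(has_next) + 1],
  value  = 1,
  stringsAsFactors = FALSE
)

# Count identical transitions while keeping the filter columns
flows <- aggregate(value ~ date + dataCenter + customer + companyID + source + target,
                   data = pairs, FUN = sum)
print(flows)
```

Keeping dataCenter, customer and companyID as grouping columns is what lets Power BI filter on them later, at the cost of one row per combination instead of one row per source and target pair.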

Then I draw the sankey chart with this R plotly script:

library("plotly")
a = read.csv('testSankey.csv', header=TRUE, sep=',')
node_names <- unique(c(as.character(a$source), as.character(a$target)))
node_names <- node_names[order(sub('.*_', '', node_names))]
nodes <- data.frame(name = node_names)
links <- data.frame(source = match(a$source, node_names) - 1,
                    target = match(a$target, node_names) - 1,
                    value = a$value)

definePosition <- function(nodeList) {
  # The text after the last "_" (the step number) decides the column of a node
  endings <- unique(sub('.*_', '', nodeList))
  # Spread the columns evenly over [0, 1)
  steps <- 1 / length(endings)
  # Named vector: x-value for each unique name ending
  nodes_x <- setNames((seq_along(endings) - 1) * steps, endings)
  # x: column position of each node, looked up by its ending
  last <- sub('.*_', '', nodeList)
  x_values <- unname(nodes_x[last])
  # y: a small offset that grows with the node index
  y_values <- 0.001 * seq_along(nodeList)

  return(list(x_values, y_values))
}

position = definePosition(node_names)
node_x = position[[1L]]
node_y = position[[2L]]

# Plot
plot_ly(type = 'sankey',
        orientation = "h",
        arrangement = "snap",
        node = list(
          label = node_names,
          x = node_x,
          y = node_y,
          color = "grey",
          pad = 15,
          thickness = 15,
          line = list(color = "grey", width = 0.5)),
        link = list(source = links$source, target = links$target, value = links$value))

After running the script above, I found that the links for the same source and target are not smooth, and even their color differs. I guess this is because the whole link between a source and a target is made up of many separate values (sub-links).

I tried to find a solution on Google but couldn't find one that works. By the way, with a smaller dataset there is no issue and the link color looks normal.

Can an expert please help? Is there any workaround or way to avoid this UI issue?
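
One way to test the sub-link guess is to collapse all rows that share a source and target into a single link before plotting. The following is a minimal sketch, not a confirmed fix. The data frame `a` is a made-up stand-in for the contents of testSankey.csv, and dropping rows with a missing value is an assumption about how the empty cells in the sample should be handled.

```r
# Hypothetical stand-in for the data read from testSankey.csv
a <- data.frame(
  source = c("view_page_modulepicker_1", "view_page_modulepicker_1",
             "view_page_modulepicker_1", "view_page_actionsearch_1",
             "view_page_modulepicker_1"),
  target = c("view_card_detail_role_2", "view_card_detail_role_2",
             "view_detail_role_2", "view_card_detail_role_2",
             "view_detail_role_2"),
  value  = c(1, 1, 1, 1, NA),
  stringsAsFactors = FALSE
)

# Assumption: rows without a value carry no flow and can be dropped
a <- a[!is.na(a$value), ]

# One row per source/target pair, with the values summed
agg <- aggregate(value ~ source + target, data = a, FUN = sum)

# Rebuild the node list and the 0-based link indices as in the original script
node_names <- unique(c(as.character(agg$source), as.character(agg$target)))
node_names <- node_names[order(sub('.*_', '', node_names))]
links <- data.frame(source = match(agg$source, node_names) - 1,
                    target = match(agg$target, node_names) - 1,
                    value  = agg$value)
print(links)
```

The resulting `node_names` and `links` can be passed to the existing `plot_ly` call unchanged. In a Power BI R visual the script receives the already filtered rows, so aggregating inside the script should keep the dataCenter, customer and companyID filters working. If the colors still look uneven, setting one explicit `color` in the `link` list is another thing to try.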
