R plotly sankey - links are not smooth and color differs for the same source and target

  1. I have some user behavior data and have converted the raw data into source, target and value format in order to draw a sankey chart of user behavior flow, which is then displayed and filtered in Power BI.

The raw data contains the following columns:

httpSessionID,time,date,dataCenter,customer,companyID,action
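
For readers who want to reproduce the conversion step, here is a minimal base R sketch of one way to turn rows with these columns into source, target and value rows. It is an assumption, not necessarily the method used for this data: the sample rows are made up, the step suffix (`_1`, `_2`, ...) is taken to be the position of the action within its session, and `step`, `node`, `pairs` and `flows` are hypothetical names.

```r
# Hypothetical raw rows; column names follow the list above, values are made up.
raw <- data.frame(
  httpSessionID = c("s1", "s1", "s1", "s2", "s2"),
  time = c("10:00:01", "10:00:05", "10:00:09", "11:00:00", "11:00:03"),
  date = "2021/12/9",
  dataCenter = "dc1",
  customer = "customer1",
  companyID = "companyID1",
  action = c("view_page_modulepicker", "view_card_detail_role", "view_detail_role",
             "view_page_modulepicker", "view_card_detail_role"),
  stringsAsFactors = FALSE
)

# Order events within each session, then number the steps 1, 2, 3, ...
raw <- raw[order(raw$httpSessionID, raw$time), ]
raw$step <- ave(seq_along(raw$action), raw$httpSessionID, FUN = seq_along)
raw$node <- paste(raw$action, raw$step, sep = "_")

# Pair each event with the next one, keeping only pairs inside the same session.
n <- nrow(raw)
same_session <- raw$httpSessionID[-n] == raw$httpSessionID[-1]
pairs <- data.frame(
  raw[-n, c("date", "dataCenter", "customer", "companyID")],
  source = raw$node[-n],
  target = raw$node[-1],
  stringsAsFactors = FALSE
)[same_session, ]

# Count identical transitions per combination of the filter columns.
pairs$value <- 1
flows <- aggregate(value ~ date + dataCenter + customer + companyID + source + target,
                   data = pairs, FUN = sum)
print(flows)
```

Keeping dataCenter, customer and companyID in the grouping is what lets the result still be filtered by those columns afterwards.
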
  2. Below is a sample of the data after conversion. I want a sankey chart that can be filtered by dataCenter, customer and companyID in Power BI, so I aggregated the action column by dataCenter, customer and companyID, which keeps those columns and their values.
date,dataCenter,customer,companyID,source,target,value
2021/11/3,dc1,customer1,companyID1,view_page_modulepicker_1,iew_manage_opportunities_2,1
2021/12/6,dc1,customer1,companyID1,view_page_capabilityportfolio_1,view_card_detail_assignment_2,1
2021/12/7,dc2,customer2,companyID2,view_page_modulepicker_1,view_manage_opportunities_2,1
2021/12/8,dc3,customer3,companyID3,view_page_actionsearch_1,iew_manage_opportunities_2,1
2021/12/9,dc2,customer1,companyID1,view_page_modulepicker_1,iew_card_detail_assignment_2,1
2021/12/9,dc1,customer1,companyID1,view_page_modulepicker_1,iew_card_detail_role_2,
2021/12/9,dc1,customer1,companyID1,view_page_modulepicker_1,view_detail_assignment_2,
2021/12/2,dc1,customer1,companyID1,view_page_modulepicker_1,view_card_detail_role_2,1
2021/12/2,dc1,customer1,companyID1,view_page_modulepicker_1,view_detail_role_2,1
  3. Then I draw the sankey chart with the following R plotly script:
library("plotly")
a = read.csv('testSankey.csv', header=TRUE, sep=',')
node_names <- unique(c(as.character(a$source), as.character(a$target)))
node_names <- node_names[order(sub('.*_', '', node_names))]
nodes <- data.frame(name = node_names)
links <- data.frame(source = match(a$source, node_names) - 1,
                    target = match(a$target, node_names) - 1,
                    value = a$value)

definePosition <- function(nodeList){
  #  nodeList = node_names
  # unique name endings
  endings = unique(sub('.*_', '', nodeList))
  # define intervals
  steps = 1/length(endings)
  # x-values for each unique name ending
  # for input as node position
  nodes_x = {}
  xVal = 0
  for (e in endings) {
    nodes_x[e] = xVal
    xVal = xVal + steps
    
  }
  # x and y values in list form
  x_values <- 0
  y_values <- 0
  i =1
  for (n in nodeList) {
    last = sub('.*_', '', n)
    x_values[i] = nodes_x[last]
    y_values[i] = 0.001 * length(x_values)
    i = i + 1
  }
  
  return(list(x_values, y_values))
  
}

position = definePosition(node_names)
node_x = position[[1L]]
node_y = position[[2L]]

# Plot
plot_ly(type = "sankey",
        orientation = "h",
        arrangement = "snap",
        node = list(
          label = node_names,
          x = node_x,
          y = node_y,
          color = "grey",
          pad = 15,
          thickness = 15,  # was misspelled "thinkness", which plotly does not recognize
          line = list(color = "grey", width = 0.5)),
        link = list(source = links$source, target = links$target, value = links$value))

After running the above script, I found that the links between the same source and target are not smooth, and their color even differs. My guess is that this happens because the link between a given source and target is drawn as many separate sub-links, one per row of the data.
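
One way to check that guess is to collapse all rows that share a source and target into a single link before plotting. The following is a minimal base R sketch, not a confirmed fix. It assumes that rows with an empty value can be dropped and that the script is re-run on whatever rows remain after filtering; the sample data frame is made up for illustration.

```r
# Illustrative rows in the same shape as the converted CSV (values are made up).
a <- data.frame(
  source = c("view_page_modulepicker_1", "view_page_modulepicker_1",
             "view_page_modulepicker_1", "view_page_modulepicker_1",
             "view_page_actionsearch_1"),
  target = c("view_card_detail_role_2", "view_card_detail_role_2",
             "view_card_detail_role_2", "view_detail_role_2",
             "view_card_detail_role_2"),
  value = c(1, 1, NA, 1, 1),
  stringsAsFactors = FALSE
)

# Assumption: a missing value means "no flow", so those rows are dropped.
a <- a[!is.na(a$value), ]

# Sum the values so each (source, target) pair becomes exactly one link.
agg <- aggregate(value ~ source + target, data = a, FUN = sum)

# Rebuild the node list and the zero-based link indices from the aggregated rows.
node_names <- unique(c(agg$source, agg$target))
node_names <- node_names[order(sub(".*_", "", node_names))]
links <- data.frame(
  source = match(agg$source, node_names) - 1,
  target = match(agg$target, node_names) - 1,
  value = agg$value
)
print(links)
```

The resulting `links` and `node_names` have the same shape as in the script above, so they can be passed to the same `plot_ly` call. If the jagged edges and color differences disappear, the sub-links were the likely cause.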

I have searched Google for a solution but could not find one that works. By the way, when I tried a smaller dataset there was no issue and the link color looked normal.

Can anyone please help? Is there a workaround or a way to avoid this UI issue?