- I have some user behavior data that I convert from its raw form into source, target, value format so that I can draw a Sankey chart of the user behavior flow, which is then displayed and filtered in Power BI.
The raw data contains the following columns:
httpSessionID,time,date,dataCenter,customer,companyID,action
- Below is a sample of the converted data. I want a Sankey chart that can be filtered by dataCenter, customer and companyID in Power BI, so I aggregated the action column by dataCenter, customer and companyID, which keeps those three columns and their values in the output.
date,dataCenter,customer,companyID,source,target,value
2021/11/3,dc1,customer1,companyID1,view_page_modulepicker_1,iew_manage_opportunities_2,1
2021/12/6,dc1,customer1,companyID1,view_page_capabilityportfolio_1,view_card_detail_assignment_2,1
2021/12/7,dc2,customer2,companyID2,view_page_modulepicker_1,view_manage_opportunities_2,1
2021/12/8,dc3,customer3,companyID3,view_page_actionsearch_1,iew_manage_opportunities_2,1
2021/12/9,dc2,customer1,companyID1,view_page_modulepicker_1,iew_card_detail_assignment_2,1
2021/12/9,dc1,customer1,companyID1,view_page_modulepicker_1,iew_card_detail_role_2,
2021/12/9,dc1,customer1,companyID1,view_page_modulepicker_1,view_detail_assignment_2,
2021/12/2,dc1,customer1,companyID1,view_page_modulepicker_1,view_card_detail_role_2,1
2021/12/2,dc1,customer1,companyID1,view_page_modulepicker_1,view_detail_role_2,1
- Then I draw the Sankey chart with an R plotly script:
library("plotly")

# Load the converted data (date, dataCenter, customer, companyID, source, target, value)
a <- read.csv('testSankey.csv', header = TRUE, sep = ',')

# Collect every node name and order the nodes by their name ending (_1, _2, ...)
node_names <- unique(c(as.character(a$source), as.character(a$target)))
node_names <- node_names[order(sub('.*_', '', node_names))]
nodes <- data.frame(name = node_names)

# plotly expects zero-based node indices
links <- data.frame(source = match(a$source, node_names) - 1,
                    target = match(a$target, node_names) - 1,
                    value = a$value)

# Compute x and y positions for the nodes from their name endings
definePosition <- function(nodeList) {
  # unique name endings
  endings <- unique(sub('.*_', '', nodeList))
  # width of the interval between two columns of nodes
  steps <- 1 / length(endings)
  # x-value for each unique name ending, used as the node position
  nodes_x <- c()
  xVal <- 0
  for (e in endings) {
    nodes_x[e] <- xVal
    xVal <- xVal + steps
  }
  # x and y values, one entry per node
  x_values <- 0
  y_values <- 0
  i <- 1
  for (n in nodeList) {
    last <- sub('.*_', '', n)
    x_values[i] <- nodes_x[last]
    y_values[i] <- 0.001 * length(x_values)
    i <- i + 1
  }
  return(list(x_values, y_values))
}

position <- definePosition(node_names)
node_x <- position[[1L]]
node_y <- position[[2L]]

# Plot
plot_ly(type = 'sankey',
        orientation = "h",
        arrangement = "snap",
        node = list(
          label = node_names,
          x = node_x,
          y = node_y,
          color = "grey",
          pad = 15,
          thickness = 15,  # was misspelled as "thinkness"
          line = list(color = "grey", width = 0.5)),
        link = list(source = links$source, target = links$target, value = links$value))
After running the script above, I found that the links between the same source and target are not drawn smoothly, and even their colors differ. I guess this is because the whole link between a source and a target is made up of many separate values (sub-links).
I tried to find a fix by searching on Google but couldn't find a working solution. By the way, with a smaller dataset there is no issue and the link color looks normal.
Could an expert please help? Is there any workaround or way to avoid this UI issue?
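
One way to test the sub-link guess above would be to collapse all rows that share the same source and target into a single link before plotting. The following is only a minimal sketch under that assumption, not a confirmed fix: the data frame `a` here is a small made-up stand-in for the converted CSV, and treating empty values as 0 is also an assumption.

```r
# Hypothetical stand-in for the converted data (only the columns needed here).
a <- data.frame(
  source = c("view_page_modulepicker_1", "view_page_modulepicker_1",
             "view_page_modulepicker_1", "view_page_actionsearch_1"),
  target = c("view_manage_opportunities_2", "view_manage_opportunities_2",
             "view_manage_opportunities_2", "view_manage_opportunities_2"),
  value  = c(1, NA, 2, 1),
  stringsAsFactors = FALSE
)

# Assumption: an empty value means "no flow", so count it as 0.
a$value[is.na(a$value)] <- 0

# Sum the values of all rows with the same source/target pair,
# so each pair becomes exactly one link.
agg <- aggregate(value ~ source + target, data = a, FUN = sum)

# Rebuild the node list and the zero-based link indices from the aggregated rows.
node_names <- unique(c(agg$source, agg$target))
links <- data.frame(
  source = match(agg$source, node_names) - 1,
  target = match(agg$target, node_names) - 1,
  value  = agg$value
)
```

The resulting `links` and `node_names` could then be passed to the same `plot_ly(type = 'sankey', ...)` call as in the script above. Because the aggregation runs inside the script, the dataCenter, customer and companyID columns can stay in the underlying table for filtering.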