Hello all,
Hoping someone can help me out, as I'm out of things to try. I have seen a similar thread for Python, but haven't found a fix for R. My code runs without any errors or warnings and the title shows in the output, but the Sankey diagram itself doesn't appear.
I have tried:
- making sure my R and Plotly, as well as other libraries, are up to date
- double-checking that my variables hold numeric values rather than text
- making sure the columns have equal numbers of rows and that the values progress (0 → 1, etc.)
- making my data into lists or vectors rather than a dataframe
- converting my data to json format
- commenting out different parts of the code to try to isolate the problem
- successfully plotting with networkD3 so I know that it should work
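The numeric and index checks in the list above can be scripted. Here is a rough sketch in base R. `check_sankey_inputs()` is a helper name made up for this post (it is not part of Plotly), and the toy vectors stand in for my real columns. It assumes Plotly wants zero-based node indices that point into the label vector:

```r
# Hypothetical helper (not a plotly function): returns a character vector of
# problems found, or character(0) if the inputs look consistent.
check_sankey_inputs <- function(labels, source, target, value) {
  problems <- character(0)
  if (!(length(source) == length(target) && length(target) == length(value))) {
    problems <- c(problems, "source, target and value differ in length")
  }
  if (!is.numeric(value) || anyNA(value)) {
    problems <- c(problems, "value is not fully numeric")
  }
  idx <- c(source, target)
  if (!is.numeric(idx) || anyNA(idx) || any(idx != round(idx))) {
    problems <- c(problems, "source/target are not whole numbers")
  } else if (min(idx) < 0 || max(idx) > length(labels) - 1) {
    problems <- c(problems, "source/target fall outside 0..(number of labels - 1)")
  }
  problems
}

# Toy data standing in for the real columns
toy_labels <- c("CO2", "CH4", "Transport", "Agriculture")
ok  <- check_sankey_inputs(toy_labels, c(0, 1), c(2, 3), c(10.5, 4.2))
bad <- check_sankey_inputs(toy_labels, c(1, 2), c(3, 4), c(10.5, 4.2))  # one-based by mistake
```

With my real data the call would be `check_sankey_inputs(nodes$name, links$IDsource, links$IDtarget, links$value)`.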
Apologies that it starts with a large data frame. The full code is below in case someone wants to run it and see whether it works for them.
Packages:
library(readODS) #read ods files
library(tidyverse) #Tidy packages
library(dplyr) #lots of functions - data manipulation
library(janitor) #helps with data cleaning
library(plotly) #interactive web graphs
Here’s my code for downloading the data (public GHG data):
#Download UK 2022 greenhouse gas emissions data from the UK Department for Energy Security and Net Zero
ghg_url <- "https://assets.publishing.service.gov.uk/media/65c0d54663a23d000dc821ca/final-greenhouse-gas-emissions-2022-by-source-dataset.ods" #update this link when new data available
if (!file.exists("UKGHG_2022.ods")) { #only download if the file is not already in the working directory
  download.file(ghg_url, "UKGHG_2022.ods", mode = "wb") #binary mode so the .ods file is not corrupted on Windows
}
#Read in ods file
GHG_UK2022 <- read_ods(
  path = "UKGHG_2022.ods",
  sheet = 1, #define tab/sheet to read
  col_names = TRUE, #use header row for column names
  col_types = NULL, #guess data types
  na = "", #treat blank cells as NA
  skip = 0, #don't skip rows
  formula_as_formula = FALSE, #values only
  range = NULL,
  row_names = FALSE, #no row names
  strings_as_factors = TRUE) %>% #use factors
  clean_names() %>% #clean column names to lowercase, with underscores
  filter(year == "2022") #2022 only
Then I formatted the data into links and node labels:
#Set up Sankey links dataframe - breakdown from GHGs to sectors to subsectors
links <- data.frame(source = c(paste0(GHG_UK2022$ghg_grouped), paste0(GHG_UK2022$tes_sector)),
                    target = c(paste0(GHG_UK2022$tes_sector), paste0(GHG_UK2022$tes_subsector)),
                    value = as.numeric(paste0(GHG_UK2022$emissions_mt_co2e)))
links <- links[-c(2317:3786),] #remove instances with repeat variable in source & target
#Create nodes df from names in links df
nodes <- data.frame(
  name = unique(c(as.character(links$source),
                  as.character(links$target))))
#Add ID numbers
links$IDsource <- as.numeric(match(links$source, nodes$name)-1)
links$IDtarget <- as.numeric(match(links$target, nodes$name)-1)
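Since removing rows by a hard-coded range is fragile, here is the same links/nodes construction on a made-up three-row data frame, as a sketch only. The column names mirror my data, but `toy` and its numbers are invented. Dropping self-links by condition and summing duplicate source-target pairs with `aggregate()` are both changes from what my real code does:

```r
# Invented toy data with the same column names as the real dataset
toy <- data.frame(
  ghg_grouped       = c("CO2", "CO2", "CH4"),
  tes_sector        = c("Transport", "Transport", "Agriculture"),
  tes_subsector     = c("Road", "Rail", "Agriculture"),
  emissions_mt_co2e = c(100, 5, 40),
  stringsAsFactors  = FALSE
)

# Stack the two levels: gas -> sector, then sector -> subsector
toy_links <- data.frame(
  source = c(toy$ghg_grouped, toy$tes_sector),
  target = c(toy$tes_sector, toy$tes_subsector),
  value  = c(toy$emissions_mt_co2e, toy$emissions_mt_co2e),
  stringsAsFactors = FALSE
)

# Drop self-links by condition instead of by row number
toy_links <- toy_links[toy_links$source != toy_links$target, ]

# Sum duplicate source-target pairs (extra step, not in my real code)
toy_links <- aggregate(value ~ source + target, data = toy_links, FUN = sum)

# Node table and zero-based IDs
toy_nodes <- data.frame(name = unique(c(toy_links$source, toy_links$target)),
                        stringsAsFactors = FALSE)
toy_links$IDsource <- match(toy_links$source, toy_nodes$name) - 1
toy_links$IDtarget <- match(toy_links$target, toy_nodes$name) - 1
```

The point of the toy version is that every ID can be checked by eye against `toy_nodes$name`.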
And finally, here is my Plotly code calling a sankey:
#Plot with Plotly
sankey <- plot_ly(type = "sankey",
                  domain = list(x = c(0, 1), y = c(0, 1)),
                  orientation = "h",
                  arrangement = "snap", # can also change this to 'fixed'
                  valueformat = ".0f",
                  valuesuffix = "Mt CO2 eq.",
                  node = list(
                    label = nodes$name,
                    pad = 15,
                    thickness = 20,
                    line = list(color = "black", width = 0.5),
                    link = list(
                      source = links$IDsource,
                      target = links$IDtarget,
                      value = links$value
                    ))) %>%
  layout(
    title = "Greenhouse Gas Emissions Per Sector in the UK",
    font = list(size = 10),
    xaxis = list(showgrid = FALSE, zeroline = FALSE),
    yaxis = list(showgrid = FALSE, zeroline = FALSE))
sankey
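For comparison, here is a stripped-down sketch of the same kind of call on invented toy data. Going by the Plotly Sankey reference, it assumes that `node` and `link` are separate top-level attributes of the trace, so `link` is passed to `plot_ly()` alongside `node` rather than inside it. `toy_node`, `toy_link` and `toy_sankey` are names made up for the sketch, and the plot is only built if plotly is installed:

```r
# Invented toy data: two gases flowing into two sectors
toy_node <- list(
  label = c("CO2", "CH4", "Transport", "Agriculture"),
  pad = 15,
  thickness = 20,
  line = list(color = "black", width = 0.5)
)  # the node list closes here

toy_link <- list(
  source = c(0, 0, 1),   # zero-based positions in toy_node$label
  target = c(2, 3, 3),
  value  = c(100, 10, 40)
)

if (requireNamespace("plotly", quietly = TRUE)) {
  toy_sankey <- plotly::plot_ly(
    type = "sankey",
    orientation = "h",
    node = toy_node,
    link = toy_link   # sibling of node, not nested inside it
  )
  toy_sankey <- plotly::layout(toy_sankey, title = "Toy Sankey")
  # print(toy_sankey) to display it in the viewer
}
```

If this toy version renders and my full call does not, the difference in how the two calls are nested would be the first thing to compare.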
Thanks in advance for taking a look!