Sunburst diagram Flexible end to path?

Hi there,

I’m trying to plot a breakdown of purchases by category. The categories are hierarchical, but their depth varies: some stop at level 1, others at level 2, and so on.

Unfortunately, I get these ‘NULL’ leaves, which are kind of ugly.

I tried replacing the ‘NULL’ string with NaN (code below):

import plotly.express as px
import numpy as np

px.sunburst(
    df,  # also tried: df.replace(to_replace='Null', value=np.nan)
    path=[
        'google_category_level_1_name',
        'google_category_level_2_name',
        'google_category_level_3_name',
        'google_category_level_4_name',
    ],
    values='buyer',
    title=f'{description}: User purchase category',
    branchvalues='remainder',
    width=1100,
    height=1100,
)

I keep getting this error: ‘Non-leaves rows are not permitted in the dataframe’

Is there a way to simply stop interpreting the path once the NULL is reached?

Hi @s3b4z, welcome to the forum! Could you share a standalone code example (ideally with dummy data, or with your dataset) that reproduces the problem? Did you also take a look at the doc example on sunburst with path and missing values? The parent of a None entry must be a leaf, i.e. it cannot have any children other than None. Also, I think that branchvalues should be total when using the path argument.
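To make the leaf rule above concrete, here is a minimal pandas-only sketch that lists parents whose children are a mix of missing and non-missing values. The helper name `find_mixed_parents` and the tiny dataframe are made up for illustration, and the check follows the rule as worded in this reply; it is not Plotly's own validation code.

```python
import pandas as pd

def find_mixed_parents(df, path):
    """Return parent tuples that have both missing and non-missing children."""
    problems = []
    for depth in range(1, len(path)):
        seen = {}  # parent tuple -> set of "child is missing" flags
        for row in df[path].itertuples(index=False):
            parent = tuple(row[:depth])
            # Skip rows whose branch already ended above this level.
            if any(pd.isna(p) for p in parent):
                continue
            seen.setdefault(parent, set()).add(bool(pd.isna(row[depth])))
        # A parent seen with both True and False flags breaks the rule.
        problems.extend(p for p, flags in seen.items() if len(flags) == 2)
    return problems

# Hypothetical data: "pantry" has a real child ("bread") and a None child.
df = pd.DataFrame({
    "l1": ["pantry", "pantry", "ice"],
    "l2": ["bread", None, None],
    "sales": [2, 1, 1],
})
print(find_mixed_parents(df, ["l1", "l2"]))  # [('pantry',)]
```

An empty list means every parent of a missing entry is a leaf, which is the shape the reply describes.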

Hi there,

Thanks, that doc link was helpful. I thought my situation was a bit unique in that the hierarchy doesn’t always have the same number of levels. However, when I tried making an example with dummy data, I wound up solving the problem :)

Just to close the loop, here’s the code:

import pandas as pd
import plotly.express as px

# Each list is one level of the hierarchy; None marks where a branch ends early.
l1 = ["produce", "produce", "produce", "produce", "pantry", "pantry", "pantry", "pantry", "pantry", "ice"]
l2 = ["fruit", "fruit", "vegetable", "vegetable", "canned goods", "bread", "canned goods", "baking goods", "baking goods", None]
l3 = ["apple", "banana", "tomato", "potato", "soup", None, "Beans", "flour", "active yeast", None]
l4 = ["Fuji", None, None, "idaho", "tomato", None, "black", "bleached white", None, None]
sales = [1, 3, 2, 4, 1, 2, 2, 1, 4, 1]
df = pd.DataFrame(dict(l1=l1, l2=l2, l3=l3, l4=l4, sales=sales))
print(df)
fig = px.sunburst(df, path=['l1', 'l2', 'l3', 'l4'], values='sales')
fig.show()

It seems that None can be used to signal the end of the hierarchy (as long as there’s at least one non-None child), so the issue must be somewhere in my dataset.

OK, I know there’s almost no chance this’ll be helpful to anyone else, but just in case someone winds up here, my issue was due to a few things:

  1. ‘Null’ (string) vs None vs np.NaN : needs to be None
    df2 = df.replace({'Null': None})

  2. Duplicate rows needed to be removed: my dataframe went down to level 7 categories, and the values weren’t being combined properly

path = [
    'google_category_level_1_name',
    'google_category_level_2_name',
    'google_category_level_3_name',
    'google_category_level_4_name',
]
df3 = df2.groupby(path).agg({'buyer': 'sum'}).reset_index()
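One thing worth checking with the groupby step above: by default, pandas `groupby` drops any row whose group key is missing, so paths that end early can silently disappear from the aggregated frame. The sketch below, on made-up data with hypothetical column names `l1`/`l2` and a hypothetical helper `aggregate_keeping_short_paths`, compares the default with `dropna=False` (available from pandas 1.1). Whether the missing keys that come back after grouping are accepted by a given Plotly version is an assumption to verify.

```python
import pandas as pd

def aggregate_keeping_short_paths(df, path, value_col):
    """Sum value_col per unique path, keeping rows whose path ends early."""
    cleaned = df.replace({"Null": None})  # same replacement as in step 1
    # dropna=False keeps groups whose key contains a missing value.
    return cleaned.groupby(path, dropna=False)[value_col].sum().reset_index()

# Made-up data: two duplicate "bread" rows and two "ice" rows with no level 2.
raw = pd.DataFrame({
    "l1": ["pantry", "pantry", "pantry", "ice", "ice"],
    "l2": ["bread", "bread", "canned goods", "Null", "Null"],
    "buyer": [1, 2, 4, 3, 5],
})
path = ["l1", "l2"]

combined = aggregate_keeping_short_paths(raw, path, "buyer")
# Default groupby for comparison: the "ice" rows are dropped entirely.
dropped = raw.replace({"Null": None}).groupby(path)["buyer"].sum().reset_index()

print(combined)
print(len(combined), len(dropped))
```

If the totals in the chart look lower than expected after aggregating, comparing the sum of `buyer` before and after the groupby is a quick way to see whether rows were lost.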

After that it worked fine :)

import plotly.express as px
import numpy as np 

px.sunburst(
    df3,
    path=[
        'google_category_level_1_name',
        'google_category_level_2_name',
        'google_category_level_3_name',
        'google_category_level_4_name',
    ],
    values='buyer',
    title=f'{description}: User purchase category',
    branchvalues='remainder',
    width=1100,
    height=1100,
)