Ignore "Non-leaves rows" for sunburst diagram?

I have a DataFrame of hierarchical data:

       0    1      2      3     4
0  alice  bob  chuck  david  ella
0  alice  bob  chuck  david  fred
0  alice  bob  chuck    NaN   NaN

If I try to create a Sunburst plot, I get told

Non-leaves rows are not permitted in the dataframe

The same thing occurs if I replace the NaN with None.

I’m aware that I could replace the NaNs with some dummy text, but that will distort the diagram I’m drawing.

Is there a way to skip these non-leaf rows?

Thanks!

I commented out the check in plotly/express/_core.py and it worked.

See:

def _check_dataframe_all_leaves(df):
    df_sorted = df.sort_values(by=list(df.columns))
    null_mask = df_sorted.isnull()
    df_sorted = df_sorted.astype(str)
    null_indices = np.nonzero(null_mask.any(axis=1).values)[0]
    for null_row_index in null_indices:
        row = null_mask.iloc[null_row_index]
        i = np.nonzero(row.values)[0][0]
        if not row[i:].all():
            raise ValueError(
                "None entries cannot have not-None children",
                df_sorted.iloc[null_row_index],
            )
    df_sorted[null_mask] = ""
    row_strings = list(df_sorted.apply(lambda x: "".join(x), axis=1))
    #for i, row in enumerate(row_strings[:-1]):
        #if row_strings[i + 1] in row and (i + 1) in null_indices:
            #raise ValueError(
            #    "Non-leaves rows are not permitted in the dataframe \n",
            #    df_sorted.iloc[i + 1],
            #    "is not a leaf.",
            #)

My diagram renders perfectly. It would be great if there was an ignore_non_leaves=True option, rather than my horrible hack!

Plotly creates these non-leaf rows automatically from the leaf rows, so you can safely delete them.

Instead of commenting code from the library (which won’t work if you update or go to another machine), you can modify the dataframe:

df = df.dropna()

Thanks @ddavo, you are the man!

But won’t this cause an issue, as both columns 3 and 4 would be dropped if we use df.dropna()?

By default, dropna() drops rows that contain null values, not columns.
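
To make the row-versus-column point concrete, here is a small sketch using the DataFrame from the original question. The variable names are only illustrative, and axis=1 is included purely for contrast.

```python
import numpy as np
import pandas as pd

# The hierarchy from the original question; the last row is the non-leaf row.
df = pd.DataFrame(
    [
        ["alice", "bob", "chuck", "david", "ella"],
        ["alice", "bob", "chuck", "david", "fred"],
        ["alice", "bob", "chuck", np.nan, np.nan],
    ]
)

# Default (axis=0): rows containing any NaN are dropped, all columns are kept.
leaves_only = df.dropna()
print(leaves_only.shape)  # (2, 5)

# For contrast, axis=1 drops the columns that contain any NaN instead.
columns_dropped = df.dropna(axis=1)
print(columns_dropped.shape)  # (3, 3)
```

So with the default call, columns 3 and 4 survive and only the "alice bob chuck NaN NaN" row goes away.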

       0    1      2      3     4
0  alice  bob  smith    NaN   NaN
0  alice  bob  smith  rocky   NaN
0  alice  bob  chuck  david  ella
0  alice  bob  chuck  david  fred
0  alice  bob  chuck    NaN   NaN

This is more specific to my current scenario: if I call dropna(), I will lose the rows that branch from bob to smith. Smith and Chuck are both children of Bob. How do I visualize a treemap/sunburst in this scenario without filling the NaNs with dummy text?

This is because the expected DataFrame needs to be rectangular, with each column holding a value to group by. In the docs, there is an example with a DataFrame grouped by day, then by time, and then by sex.

In your case, I think it would be easier to just use names and parents instead of passing a DataFrame.

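import plotly.express as px

# Each entry in names is a node; the entry at the same position in parents is its parent ("" marks the root).
fig = px.treemap(
    names =   ["Alice", "Bob"  , "Smith", "Rocky", "Chuck", "David", "Ella" , "Fred" ],
    parents = [""     , "Alice", "Bob"  , "Smith", "Bob"  , "Chuck", "David", "David"]
)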
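
If the hierarchy already lives in a ragged DataFrame like the one above, the names and parents lists don't have to be typed by hand. The following is only a sketch, not something Plotly provides: hierarchy_from_paths is a hypothetical helper, and building ids by joining ancestors with "/" is an assumption made so that two nodes with the same label under different parents would not collide.

```python
import pandas as pd

# The ragged hierarchy from the post above; None marks "no deeper level".
df = pd.DataFrame(
    [
        ["alice", "bob", "smith", None, None],
        ["alice", "bob", "smith", "rocky", None],
        ["alice", "bob", "chuck", "david", "ella"],
        ["alice", "bob", "chuck", "david", "fred"],
        ["alice", "bob", "chuck", None, None],
    ]
)


def hierarchy_from_paths(frame, sep="/"):
    """Hypothetical helper: turn root-to-node rows into ids, labels and parents.

    Each row is read left to right and stops at the first missing value,
    so non-leaf rows simply add nothing new.
    """
    ids, labels, parents = [], [], []
    seen = set()
    for row in frame.itertuples(index=False):
        parent_id = ""  # "" marks the root level
        for value in row:
            if pd.isna(value):
                break  # the rest of the row is padding
            label = str(value)
            node_id = label if parent_id == "" else f"{parent_id}{sep}{label}"
            if node_id not in seen:
                seen.add(node_id)
                ids.append(node_id)
                labels.append(label)
                parents.append(parent_id)
            parent_id = node_id
    return ids, labels, parents


ids, labels, parents = hierarchy_from_paths(df)
```

The three lists can then be passed along the lines of px.sunburst(ids=ids, names=labels, parents=parents), or to px.treemap in the same way. Because ids are given, parents refers to the ids rather than to the display labels.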

Thanks @edent! I’ve also had to use this hack.