Unstacking a grouped bar plot creates huge gaps between groups (Plotly Express)

Hey peeps!

As you can see from the first image, my plot is nice and readable, but I'd like to unstack the bars so it is easier to separate things. When I do, though, the gap between the groups becomes huge. I've tried searching the forums and Google, but I'm not able to find anything…

I have tried `bargap` and `bargroupgap` without any difference, both in `update_layout` and directly in the plot.

With `barmode='group'`:

    import pandas as pd
    import plotly.express as px

    df = pd.read_parquet("parquets/df.parquet")
    fig = px.bar(df, y="prod_group", x="percentage", barmode="group", color="prc",
                 orientation='h', text_auto=True, height=1000)
    fig.update_layout(yaxis={'categoryorder': 'category descending'})

Example data:

|   | prod_group | prc  | percentage |
|---|------------|------|------------|
| 0 | 57513      | 2011 | 56.2       |
| 1 | 57513      | 2012 | 100        |
| 2 | 57513      | 2021 | 51         |

Thanks in advance!

Hi @Mauz, welcome to the forums.

I think the problem here is that you have lots of different values for `prc`. I may be mistaken, but I think what happens internally is that plotly "reserves" the space to draw a bar for each `prc` in each `prod_group`.

So what you see is not really a gap between the bars, but the space for the `prc` bars that are missing for the corresponding `prod_group`.

Here is an example:

    import pandas as pd
    import plotly.express as px
    import numpy as np

    # set number of categories
    cat_num = 5

    # set number of types
    type_num = 20

    # create data
    categories = []
    for i in range(1, cat_num + 1):
        categories.extend([f'cat_{i}'] * type_num)

    types = [f'type_{i}' for i in range(type_num)] * cat_num

    values = np.random.randint(1, 40, cat_num * type_num)

    # create DataFrame
    df = pd.DataFrame({'categories': categories, 'values': values, 'types': types})

    # create figure
    fig = px.bar(
        data_frame=df,
        y='categories',
        x='values',
        color='types',
        barmode='group',
        orientation='h'
    )
    fig.update_layout(height=500)
    fig.show()

creates:

If you increase the number of types, the visibility decreases. This gets even worse if you decrease the figure height.
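A rough, back-of-the-envelope sketch of why the bars get so thin. It assumes every y-category gets an equal share of the plot height, that share minus `bargap` (assumed 0.2 here) is split evenly between all traces, and margins and legend are ignored; `approx_bar_thickness` is a hypothetical helper, not a plotly function.

```python
def approx_bar_thickness(plot_height_px, n_categories, n_traces,
                         bargap=0.2, bargroupgap=0.0):
    """Rough thickness in pixels of one bar in a grouped horizontal bar chart.

    Simplified model (an assumption, not plotly's exact layout code): each
    category gets an equal band, the band minus `bargap` is shared equally by
    all traces, and `bargroupgap` shrinks each bar inside its own slot.
    """
    band = plot_height_px / n_categories        # space per y-category
    slot = band * (1 - bargap) / n_traces       # space per trace inside a band
    return slot * (1 - bargroupgap)


# 5 categories drawn in roughly 400 px of plot area (hypothetical numbers):
for n_types in (5, 20, 40):
    print(n_types, round(approx_bar_thickness(400, 5, n_types), 2))
```

In this model, doubling the number of types halves the bar thickness and a smaller height shrinks it further, while `bargap` and `bargroupgap` only rescale the slots; they do not remove the empty ones.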

Thank you for your reply. You seem to be correct to some degree, but there still seems to be a lot of whitespace between the groups.
I dropped all my lines with no value for percentage, which brought down the number of "PRC" values somewhat.

I increased the height of the chart to 3000 just to be able to see a specific group more clearly:
It includes all the lines for that group, but all the whitespace above it seems a bit weird to me…
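This would also explain why dropping the empty rows only helped somewhat: a `prc` stops reserving space only when it disappears from the data entirely, since every `prc` that still has a value in any group remains a trace with a slot in every group. A minimal sketch with hypothetical rows, assuming pandas and NumPy:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: prc "2038" is empty (NaN) in 57513 but has a value in 57514.
df = pd.DataFrame({
    "prod_group": ["57513", "57513", "57513", "57514", "57514"],
    "prc": ["2011", "2038", "4038", "2038", "9010"],
    "percentage": [34.1, np.nan, 100, 100, 100],
})

before = df["prc"].nunique()
cleaned = df.dropna(subset=["percentage"])
after = cleaned["prc"].nunique()

# "2038" still exists in 57514, so it is still a trace and still reserves
# a slot in 57513, even though that row was dropped.
print(before, after)  # 4 4
print(cleaned.groupby("prod_group")["prc"].nunique().to_dict())
```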

Edit:
Or does the plot try to fit all "PRC" values into each group? It does not behave like that when barmode is not specified.

I think I did not explain myself clearly enough.

For your y-value of 57513 you do not have values (bars) for each prc. I count a total of 21 prc values but only 8 prc bars for the given y-value. What I think is that the whitespace you see is the space reserved for the "missing" bars. Does that make sense?

Yeah, that was my thought (see the edit in my last response).
But in my data there are only 8 PRC values in group 57513.

See the data below. It's a snippet of the data being used, but it includes all rows for 57513 and 57514.

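| prod_group | prc  | percentage |
|------------|------|------------|
| 57513      | 2011 | 34.1       |
| 57513      | 2021 | 22.1       |
| 57513      | 2038 | 100        |
| 57513      | 400Y | 57.1       |
| 57513      | 4011 | 100        |
| 57513      | 4021 | 93.3       |
| 57513      | 4022 | 13.2       |
| 57513      | 4038 | 100        |
| 57514      | 101E | 100        |
| 57514      | 1011 | 69.6       |
| 57514      | 1021 | 46.5       |
| 57514      | 9010 | 100        |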
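Applying the "reserved slots" explanation to the rows above, a small sketch (assuming pandas) that counts how many slots each group reserves versus how many bars it actually has:

```python
import pandas as pd

# The rows posted above; prc is kept as a string because it mixes codes like "400Y".
rows = [
    ("57513", "2011", 34.1), ("57513", "2021", 22.1), ("57513", "2038", 100),
    ("57513", "400Y", 57.1), ("57513", "4011", 100), ("57513", "4021", 93.3),
    ("57513", "4022", 13.2), ("57513", "4038", 100),
    ("57514", "101E", 100), ("57514", "1011", 69.6), ("57514", "1021", 46.5),
    ("57514", "9010", 100),
]
df = pd.DataFrame(rows, columns=["prod_group", "prc", "percentage"])

# With barmode="group" and color="prc", each distinct prc is one trace,
# and every trace gets a slot in every prod_group.
slots_per_group = df["prc"].nunique()
used = df.groupby("prod_group")["prc"].nunique()
empty_share = 1 - used / slots_per_group

print(slots_per_group)  # 12
print(used.to_dict())
print(empty_share.round(2).to_dict())
```

In this snippet 57513 uses 8 of 12 slots and 57514 only 4, so a third and two thirds of their bands are empty. With the 21 prc values in the full data, 57513 would fill only 8 of 21 slots.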

I am not sure if there is a way to "tell" plotly that it should center the used bars around the y-value and "forget" about the others.

So it tries to fit all 21 PRCs into each group, then? Hm…
Even though they are not specified like that in the data.

To visualize more clearly what I'm after:

This was made in Excel using a pivot chart.

Hi @Mauz, a small follow-up. I tried to avoid the gaps using plotly.graph_objects. I admit that this solution is quite laborious and far from perfect, but it might work as a starting point for you.

    import numpy as np
    import pandas as pd
    import plotly.express as px
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots

    # set number of categories
    cat_num = 5

    # set number of types
    type_num = 20

    # create data
    categories = []
    for i in range(1, cat_num + 1):
        categories.extend([f'category_{i}'] * type_num)

    types = [f'type_{i}' for i in range(type_num)] * cat_num

    values = np.random.randint(1, 40, cat_num * type_num)

    # create DataFrame
    df = pd.DataFrame({'categories': categories, 'values': values, 'types': types})

    # delete some rows from the DataFrame so that category_1 has fewer "types"
    df = df.drop(range(1, 15))

    # groupby categories
    gb = df.groupby('categories')

    # create base figure: one subplot row per category, sharing the x-axis
    fig = make_subplots(
        rows=cat_num,
        cols=1,
        shared_xaxes=True,
        shared_yaxes=False,
        vertical_spacing=0.05
    )

    # create a lookup so that each "type" always gets the same color
    color_code = {t: c for t, c in zip(df.types.unique(), px.colors.qualitative.Alphabet)}

    # subplot rows are numbered from 1, hence start=1
    for idx, (name, group) in enumerate(gb, start=1):
        # map colors to the "types" present in this category
        colors = [color_code[t] for t in group['types']]

        # add one trace per category (add_trace replaces the deprecated append_trace);
        # only the "types" that exist in this category get a bar
        fig.add_trace(
            go.Bar(
                y=group['types'],
                x=group['values'],
                orientation='h',
                marker_color=colors,
                name=name,
            ),
            row=idx,
            col=1
        )

        # use the category name as the y-axis title of its subplot
        fig.update_yaxes(title_text=name, row=idx, col=1)

    # do not show traces in legend, hide ticklabels, set height
    fig.update_traces(showlegend=False)
    fig.update_yaxes(showticklabels=False)
    fig.update_layout(height=600)
    fig.show()

which produces: