For my thesis I wrote an algorithm to archive and restore graphs. Now I am supposed to compare the insertion and restore times graphically (ideally as one bar for insert and one for restore next to each other), and I am failing at that.
There are two pandas data frames:
ipdb> df_restore
Records Iteration Restore Time
0 2500 98 0.472099
1 2500 75 15.144622
2 2500 50 29.602678
3 2500 25 49.424968
4 2500 10 53.319847
5 2500 1 58.877913
6 50000 20 0.491345
7 12500 98 10.030826
8 12500 75 451.966560
9 12500 50 957.616956
10 12500 25 1522.143016
11 12500 10 1842.296471
12 12500 1 1983.945377
13 25000 20 1.435429
14 25000 15 1290.944022
15 25000 10 3022.330504
16 25000 5 4855.911046
17 25000 1 6366.254932
18 15000 98 44.334177
19 15000 75 1825.893675
20 15000 50 3809.929591
21 15000 25 5778.544083
22 15000 10 7281.707021
23 15000 1 7893.886748
24 50 98 0.034169
25 50 75 0.268342
26 50 50 0.542820
27 50 25 0.810775
28 50 10 0.964087
29 50 1 1.070637
30 5000 98 2.205191
31 5000 75 84.356992
32 5000 50 173.872551
33 5000 25 256.407212
34 5000 10 310.153673
35 5000 1 342.082321
36 750 98 0.133416
37 750 75 2.464694
38 750 50 5.049150
39 750 25 7.371530
40 750 10 8.824680
41 750 1 9.580092
42 1000 98 0.157585
43 1000 75 3.600938
44 1000 50 7.057642
45 1000 25 10.906905
46 1000 10 13.288799
47 1000 1 13.571030
48 100000 20 0.368078
and
ipdb> df_insert
Records Iteration Insertion Time
0 2500 2 20.194639
1 2500 3 19.023639
2 2500 4 27.527034
3 2500 5 21.892382
4 2500 6 29.116028
... ... ... ...
1127 100000 15 0.069552
1128 100000 16 0.069721
1129 100000 17 0.071990
1130 100000 18 0.076865
1131 100000 19 0.080163
[1132 rows x 3 columns]
Which might be displayed better as:
ipdb> df_insert.groupby("Records").sum("Insertion Time")
Insertion Time
Records
50 29.008606
123 30.848057
125 31.523406
250 60.609737
500 119.645833
750 164.781358
1000 234.136944
2500 561.528021
5000 1142.696352
12500 -3.389009 # ??
15000 39491.624527
25000 -94.609073 #Has fewer values
50000 -51.900527 # Errors, just skip it
100000 -57.855016 # Same
As you can see there are some errors or wrong numbers, so apparently something went wrong during the test runs and I need to re-evaluate the disputed checks. It could also be that I used the wrong function, but this aggregate is only meant to show the shape of the rest of the data frame, not to be precise.
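To narrow down which runs are disputed, one quick sanity check is to filter the negative measurements before aggregating. This is only a sketch with made-up numbers standing in for `df_insert`; it also selects the column explicitly, since passing a column name as a positional argument to `.sum()` is not what `GroupBy.sum` expects:

```python
import pandas as pd

# Hypothetical reconstruction of df_insert with one corrupted (negative) timing.
df_insert = pd.DataFrame({
    "Records": [50, 50, 12500, 12500],
    "Iteration": [2, 3, 2, 3],
    "Insertion Time": [14.5, 14.5, -5.0, 1.6],
})

# Rows with negative durations point at the broken test runs.
bad = df_insert[df_insert["Insertion Time"] < 0]
print(bad)

# Explicit column selection instead of .sum("Insertion Time").
totals = (df_insert[df_insert["Insertion Time"] >= 0]
          .groupby("Records")["Insertion Time"].sum())
print(totals)
```

With the negative rows dropped, the per-`Records` totals come out non-negative, which makes the remaining outliers easier to spot.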
I am able to do the restore-time stacks already; currently I am showing them as single bars. That was not much of an issue:
import plotly.express as px

px.bar(df,
       x='Records',
       y='Restore Time',
       color='Restore Time',
       hover_data=df_single,
       height=400,
       width=400)
with the following data:
Records Iteration Restore Time
12 12500 1 1983.945377
8 12500 75 451.966560
7 12500 98 10.030826
11 12500 10 1842.296471
10 12500 25 1522.143016
9 12500 50 957.616956
and gives something I am satisfied with:
I managed to plot the inserts next to each other similarly (note, the y-axis is logarithmic)
However, I would like to display this restore bar next to the equivalent insertion bar.
I considered using facet_col
but I'm not really getting anything meaningful. Also I can't find anything that lets me add two data frames to the same plot.
I considered merging the two data frames into one, but I doubt that would be meaningful. Both have Records, which would be the foreign key, i.e. the category of the bar. However, the "Iteration" values are not compatible, and the merged data frame would be somewhat nonsensical.
Is there a plot that would work for this?