For my thesis I wrote an algorithm to archive and restore graphs. Now I am supposed to compare the insertion and restore times graphically (ideally as one bar for insert and one for restore next to each other), and I am failing at that.
There are two pandas data frames:
ipdb> df_restore
Records Iteration Restore Time
0 2500 98 0.472099
1 2500 75 15.144622
2 2500 50 29.602678
3 2500 25 49.424968
4 2500 10 53.319847
5 2500 1 58.877913
6 50000 20 0.491345
7 12500 98 10.030826
8 12500 75 451.966560
9 12500 50 957.616956
10 12500 25 1522.143016
11 12500 10 1842.296471
12 12500 1 1983.945377
13 25000 20 1.435429
14 25000 15 1290.944022
15 25000 10 3022.330504
16 25000 5 4855.911046
17 25000 1 6366.254932
18 15000 98 44.334177
19 15000 75 1825.893675
20 15000 50 3809.929591
21 15000 25 5778.544083
22 15000 10 7281.707021
23 15000 1 7893.886748
24 50 98 0.034169
25 50 75 0.268342
26 50 50 0.542820
27 50 25 0.810775
28 50 10 0.964087
29 50 1 1.070637
30 5000 98 2.205191
31 5000 75 84.356992
32 5000 50 173.872551
33 5000 25 256.407212
34 5000 10 310.153673
35 5000 1 342.082321
36 750 98 0.133416
37 750 75 2.464694
38 750 50 5.049150
39 750 25 7.371530
40 750 10 8.824680
41 750 1 9.580092
42 1000 98 0.157585
43 1000 75 3.600938
44 1000 50 7.057642
45 1000 25 10.906905
46 1000 10 13.288799
47 1000 1 13.571030
48 100000 20 0.368078
and
ipdb> df_insert
Records Iteration Insertion Time
0 2500 2 20.194639
1 2500 3 19.023639
2 2500 4 27.527034
3 2500 5 21.892382
4 2500 6 29.116028
... ... ... ...
1127 100000 15 0.069552
1128 100000 16 0.069721
1129 100000 17 0.071990
1130 100000 18 0.076865
1131 100000 19 0.080163
[1132 rows x 3 columns]
Which might be displayed better as:
ipdb> df_insert.groupby("Records").sum("Insertion Time")
Insertion Time
Records
50 29.008606
123 30.848057
125 31.523406
250 60.609737
500 119.645833
750 164.781358
1000 234.136944
2500 561.528021
5000 1142.696352
12500 -3.389009 # ??
15000 39491.624527
25000 -94.609073 #Has fewer values
50000 -51.900527 # Errors, just skip it
100000 -57.855016 # Same
As you can see there are some errors or wrong numbers, so apparently something went wrong during the test runs and I need to re-evaluate the disputed checks. It could also be that I used the wrong function, but this aggregate is only meant to show the shape of the rest of the data frame, not to be precise.
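To narrow down which runs are disputed, one quick sanity check is to filter the negative measurements before aggregating. This is only a sketch with made-up numbers standing in for `df_insert`; it also selects the column explicitly, since passing a column name as a positional argument to `.sum()` is not what `GroupBy.sum` expects:

```python
import pandas as pd

# Hypothetical reconstruction of df_insert with one corrupted (negative) timing.
df_insert = pd.DataFrame({
    "Records": [50, 50, 12500, 12500],
    "Iteration": [2, 3, 2, 3],
    "Insertion Time": [14.5, 14.5, -5.0, 1.6],
})

# Rows with negative durations point at the broken test runs.
bad = df_insert[df_insert["Insertion Time"] < 0]
print(bad)

# Explicit column selection instead of .sum("Insertion Time").
totals = (df_insert[df_insert["Insertion Time"] >= 0]
          .groupby("Records")["Insertion Time"].sum())
print(totals)
```

With the negative rows dropped, the per-`Records` totals come out non-negative, which makes the remaining outliers easier to spot.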
I am able to do the restore-time stacks already; currently I am showing them as single bars. That was not much of an issue:
import plotly.express as px

px.bar(df,
       x='Records',
       y='Restore Time',
       color='Restore Time',
       hover_data=df_single,
       height=400,
       width=400)
with the following data:
Records Iteration Restore Time
12 12500 1 1983.945377
8 12500 75 451.966560
7 12500 98 10.030826
11 12500 10 1842.296471
10 12500 25 1522.143016
9 12500 50 957.616956
and gives something I am satisfied with:
I managed to plot the inserts next to each other similarly (note, the y-axis is logarithmic)
However, I would like to display this restore bar next to the equivalent insertion bar.
I considered using facet_col
but I'm not really getting anything meaningful. Also I can't find anything that lets me add two data frames to the same plot.
I considered merging the two data frames into one, but I doubt that would be meaningful. Both have Records, which would be the foreign key, i.e. the category of the bar. However, the "Iteration" values are not compatible, and the merged data frame would be somewhat nonsensical.
Is there a plot that would work for this?