Sankey Diagram: Handling Circularity Error

Hello there,

I am trying to represent a large amount of data in a Sankey diagram. The problem is that my data has cycles and, to the best of my knowledge, Plotly does not support them. So I am trying to remove them properly: I want to find these cycles and, based on their values, keep only the difference between them in the Sankey.

I defined a method in my sankey class to do that as follows:

def preprocess_data(self, source, target, value):
    # Drop self-loops (A -> A). New lists are built instead of calling
    # list.remove(x), which deletes the first element equal to x rather
    # than the element at index i, so the three lists could get out of sync.
    links = [(s, t, v) for s, t, v in zip(source, target, value) if s != t]

    # Net out two-node cycles (A -> B and B -> A): keep only the difference.
    # Duplicate (source, target) pairs are merged by summing their values.
    net = {}
    for s, t, v in links:
        v = net.pop((s, t), 0) + v
        if (t, s) in net:
            reverse = net.pop((t, s))
            if v > reverse:
                net[(s, t)] = v - reverse
            elif v < reverse:
                net[(t, s)] = reverse - v
            # Equal values cancel out, so neither link is kept.
        else:
            net[(s, t)] = v

    # Longer cycles (A -> B -> C -> A) are not handled here.
    source = [s for s, _ in net]
    target = [t for _, t in net]
    value = list(net.values())
    return source, target, value

It managed to delete cycles, but I am still getting an ERROR: Circularity is present in the Sankey data. Removing all nodes and links. Here is my data:

Source  Target  Value
1   0   3.055036682
2   1   0.255226984
3   1   0.17286605
4   1   0.283457885
5   1   2.189129762
6   1   0.146398649
7   1   0.000330826
8   5   0.014816468
9   5   1.761388323
10  5   0.007640889
11  5   0.034437287
12  5   0.36981087
13  9   0.051029227
14  9   0.048415849
15  9   0.025553877
16  9   0.014874949
17  9   0.026363222
18  9   0.027792755
19  9   0.03991485
20  9   1.473528785
21  9   0.05176817
8   20  0.013481225
10  20  0.010946264
22  20  1.175625342
23  20  0.51347902
24  20  0.043903806
11  20  0.024993631
25  22  0.75991767
26  22  0.145394536
27  22  0.005930888
28  22  0.012234177
29  22  0.002055835
30  22  0.018474162
31  22  0.156176541
32  22  0.033256631
33  22  0.003159624
34  22  0.018618654
35  22  0.010317747
36  25  0.727858859
37  25  0.039782352
27  36  0.00945708
38  36  0.000444879
39  36  0.040179593
40  36  0.677729291
41  40  0.01814106
42  40  0.153769306
43  40  0.327314909
8   40  0.000454993
44  40  0.060977151
45  40  0.066703595
46  40  0.029803695
11  40  0.002020937
47  40  0.332885758
31  23  0.083713487
15  23  0.013335976
16  23  0.046455924
48  23  0.025814488
19  23  0.011392122
21  23  0.332460405
10  21  0.006533316
49  21  0.343844244
50  21  0.029481036
51  21  0.012668768
25  12  0.012360508
26  12  0.005031381
13  12  0.010749498
30  12  0.003701076
17  12  0.005553512
34  12  0.006710585
20  12  0.310404369
39  12  0.009207691
52  49  0.000207941
14  49  0.020680254
29  49  0.000317917
32  49  0.005142853
33  49  0.000488609
40  49  0.314920228
53  49  0.000366989
35  49  0.001595551
29  47  2.38345E-05
32  47  0.000385564
33  47  3.66314E-05
6   47  3.64378E-06
54  47  4.81106E-06
55  47  0.154840432
56  47  6.63752E-06
57  47  0.001367098
53  47  2.75135E-05
58  47  8.17783E-06
59  47  5.75289E-06
29  43  7.77673E-05
60  43  5.05489E-07
32  43  0.001258018
17  43  2.37004E-05
33  43  0.000119521
61  43  0.30422448
21  43  4.66976E-07
53  43  8.97709E-05
62  43  0.014012209
35  43  0.000390296
63  61  0.300086221
64  61  0.041554598
65  61  0.032388115
8   61  9.40855E-06
66  61  0.324759598
67  61  0.147241861
11  61  6.93695E-05
68  61  0.157452076
69  61  0.017542372
57  66  0.009835633

Any clue as to why I am still getting that error? Thanks!

@MiguelGC,
It’s difficult to detect the cycles visually.

A method to check whether you still have cycles is to set up the list of edges, i.e. all tuples (source, target), and run this code:

import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline

edges = []  #  the list of tuples (source, target)
G = nx.Graph(directed=True)
G.add_edges_from(edges)
position = nx.spring_layout(G)
nx.draw_networkx(G, position)
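
As a small illustration of how the `edges` list could be filled in, assuming `source` and `target` are plain Python lists of equal length like the ones passed to `preprocess_data` (the three values below are just the first rows of the table in the question):

```python
# Example values only: the first three links from the table in the question.
source = [1, 2, 3]
target = [0, 1, 1]

# Pair each source with its target to get the (source, target) tuples.
edges = list(zip(source, target))
print(edges)  # [(1, 0), (2, 1), (3, 1)]
```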

Thanks empet!

This is my output, very confusing:

I am pretty sure that there are no cycles such as A -> A -> A or A -> B -> A.

But are there other kinds of cycles that could generate a circularity error?

Thanks!

Sorry, I gave you the definition of a directed igraph Graph. Please replace G=nx.Graph(directed=True) with
G=nx.DiGraph(). When it is drawn, you can see the directed edges (with arrows indicating the direction). If you cannot distinguish any cycle in the graph plot, then run the following:

print(list(nx.simple_cycles(G)))
print(nx.find_cycle(G))
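
For example, a few links copied from the table in the question are enough to show the check at work. This is only a sketch: the `edges` subset below was picked by hand from the posted data, and the remaining rows would be added the same way.

```python
import networkx as nx

# Hand-picked subset of the (source, target) pairs posted in the question.
edges = [(40, 49), (49, 21), (21, 43), (43, 40), (1, 0), (5, 1), (9, 5)]

G = nx.DiGraph()
G.add_edges_from(edges)

# simple_cycles yields each directed cycle as a list of nodes.
cycles = list(nx.simple_cycles(G))
print(cycles)  # a single cycle through nodes 40, 49, 21 and 43
```

The four links 40 -> 49, 49 -> 21, 21 -> 43 and 43 -> 40 all appear in the posted table, so the full data set contains at least this four-node cycle.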

Thanks empet! I see now that I also have to take care of “>2 level cycles” such as A->B->C->D->A
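
One possible way to extend the "keep only the difference" idea to longer cycles, offered as a sketch under assumptions rather than a tested solution: find a cycle with `nx.find_cycle`, subtract the smallest value on that cycle from each of its links, drop the links that reach zero, and repeat until `nx.find_cycle` raises `NetworkXNoCycle`. The helper name `remove_cycles` is hypothetical, and merging duplicate (source, target) pairs by summing is an assumption that was not part of the original method.

```python
import networkx as nx


def remove_cycles(source, target, value):
    """Cancel flow around directed cycles until none remain (hypothetical helper)."""
    G = nx.DiGraph()
    for s, t, v in zip(source, target, value):
        if s == t:
            continue  # self-loops carry no net flow
        # Assumption: duplicate (source, target) pairs are merged by summing.
        if G.has_edge(s, t):
            G[s][t]["value"] += v
        else:
            G.add_edge(s, t, value=v)

    while True:
        try:
            cycle = nx.find_cycle(G)  # list of (u, v) edges forming one cycle
        except nx.NetworkXNoCycle:
            break
        smallest = min(G[u][v]["value"] for u, v in cycle)
        for u, v in cycle:
            G[u][v]["value"] -= smallest
        # The link that held the smallest value is now zero; remove it
        # (and any other link on the cycle that also reached zero).
        G.remove_edges_from([(u, v) for u, v in cycle if G[u][v]["value"] <= 0])

    links = list(G.edges(data="value"))
    return ([s for s, _, _ in links],
            [t for _, t, _ in links],
            [v for _, _, v in links])


# Five links taken from the table in the question; the first four form a cycle.
source = [40, 49, 21, 43, 1]
target = [49, 21, 43, 40, 0]
value = [0.314920228, 0.343844244, 4.66976e-07, 0.327314909, 3.055036682]

new_source, new_target, new_value = remove_cycles(source, target, value)
print(list(zip(new_source, new_target, new_value)))
```

In this subset the weakest link on the cycle is 21 -> 43, so it is the one that disappears, and the other three links on the cycle are each reduced by its value. Whether that is an acceptable way to alter the data depends on what the diagram is meant to show.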