Convert phylogenetic tree to plotly graph

Hi,

I have a problem converting a phylogenetic tree into a network graph in plotly - I get the following error:

x0, y0 = G.node[edge[0]]['pos']

KeyError: 'pos'

I think my problem is that my network is not correctly formatted when I try to assign edges - when I loop over the edges in the graph:

for edge in G.edges():
    print "edge", edge
    # print "New new new ", edge
    print G[edge[0]], G.node[edge[0]]
    print G[edge[1]], G.node[edge[1]]
    # assert 1 == 0
    x0, y0 = G.node[edge[0]]['pos']
    x1, y1 = G.node[edge[1]]['pos']
    edge_trace['x'] += [x0, x1, None]
    edge_trace['y'] += [y0, y1, None]

I get the following output

edge (Clade(branch_length=0.0776315789474, name='PKG__PKG1'), Clade(branch_length=0.0745911347787, name='Inner24'))
{Clade(branch_length=0.0745911347787, name='Inner24'): {'weight': 0.07763157894736844}} {}
{Clade(branch_length=0.0252277806862, name='Inner45'): {'weight': 0.07459113477865376}, Clade(branch_length=0.0776315789474, name='PKG__PKG1'): {'weight': 0.07763157894736844}, Clade(branch_length=0.0790694519805, name='PKG__PKG2'): {'weight': 0.07906945198046661}} {}

The node positions were computed using

pos = graphviz_layout(G)

I think the problem is that plotly does not understand the dictionary that is returned - does anybody have an idea how to convert between them?
Thanks
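
For readers hitting the same KeyError: one likely cause (an assumption, since the full script is not shown at this point) is that graphviz_layout(G) returns a separate dictionary mapping each node to an (x, y) tuple and does not write a 'pos' attribute onto the nodes, so the per-node attribute dicts stay empty (the {} in the output above). A minimal sketch of building the edge coordinates directly from such a dictionary, using plain Python stand-ins (the function name edge_coordinates and the values in pos and edges are made up, not real layout output):

```python
def edge_coordinates(edges, pos):
    """Return the x and y lists for a line trace, with None separating edges."""
    edge_x, edge_y = [], []
    for u, v in edges:
        x0, y0 = pos[u]
        x1, y1 = pos[v]
        edge_x += [x0, x1, None]
        edge_y += [y0, y1, None]
    return edge_x, edge_y


# Hypothetical stand-ins for graphviz_layout(G) and G.edges()
pos = {"PKG__PKG1": (10.0, 20.0), "Inner24": (30.0, 40.0), "PKG__PKG2": (50.0, 20.0)}
edges = [("PKG__PKG1", "Inner24"), ("PKG__PKG2", "Inner24")]

edge_x, edge_y = edge_coordinates(edges, pos)
```

With networkx, the same function should accept G.edges() and the dictionary returned by graphviz_layout(G) unchanged, since the Clade nodes are already used as dictionary keys inside the graph.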

@pgreisen Here https://plot.ly/~empet/14264 is an example of generating an unrooted tree via Biopython-igraph-Plotly
and here https://plot.ly/~empet/14007/graphviz-networks-plotted-with-plotly/ how to convert a graphviz layout to plotly.

Dear empet,

The first link works perfectly, while with the second one I get an error with the map:

TypeError: list indices must be integers, not Clade

When I print the edges of the graph, they are still clades at this point:

V=G.nodes()
E=G.edges()
print E

[(Clade(branch_length=0.0776315789474, name='PKG__PKG1'), Clade(branch_length=0.0745911347787, name='Inner24')), (Clade(branch_length=0.00330756013746, name='Inner34'), Clade(branch_length=0.0585685808634, name='Inner35')), (Clade(branch_length=0.00330756013746, name='Inner34'), Clade(branch_length=0.0588549337261, name='SGK__SGK')), (Clade(branch_length=0.00330756013746, name='Inner34')

I have changed the positions to the form (x, y), but I am not sure how to convert the clades.

Thanks for any help.

@pgreisen It's not clear from your message how you combined the steps from the two notebooks to process your data. Could you be more explicit, please?

Sure, sorry :)

The network is generated using the following script, starting from a ClustalW alignment file:

import networkx, pylab
from networkx.drawing.nx_agraph import graphviz_layout
from Bio import Phylo
from Bio.Phylo.TreeConstruction import DistanceCalculator
from Bio.Phylo.TreeConstruction import DistanceTreeConstructor
from Bio import AlignIO

import plotly.plotly as py
from plotly.graph_objs import *

import networkx as nx

# What color to give to the edges?

e_color = '#ccccff'

# What colors to give to the nodes with similar labels?

color_scheme = {'RSK': '#e60000', 'SGK': '#ffff00', 'PKC': '#32cd32', 'DMPK': '#e600e6', 'NDR': '#3366ff',
                'GRK': '#8080ff', 'PKA': 'magenta', 'MAST': 'green', 'YANK': 'pink'}

# What sizes to give to the nodes with similar labels?

size_scheme = {'RSK': 200, 'SGK': 150, 'PKC': 350, 'DMPK': 400, 'NDR': 280, 'GRK': 370, 'PKA': 325, 'MAST': 40,
               'YANK': 200}

# Edit this to produce a custom label to color mapping

def label_colors(label):
    color_to_set = 'blue'
    for label_subname in color_scheme:
        if label_subname in label:
            color_to_set = color_scheme[label_subname]
    return color_to_set

# Edit this to produce a custom label to size mapping

def label_sizes(label):
    # Default size
    size_to_set = 20
    for label_subname in size_scheme:
        if label_subname in label:
            size_to_set = size_scheme[label_subname]
    return size_to_set

# Draw a tree whose alignment is stored in msa.phy

def draw_tree():
    # This loads the default kinase alignment that should be in the same directory as this script
    aln = AlignIO.read('agc.aln', 'clustal')
    # This will construct the unrooted tree.
    calculator = DistanceCalculator('identity')
    dm = calculator.get_distance(aln)
    constructor = DistanceTreeConstructor()
    tree = constructor.nj(dm)
    G = Phylo.to_networkx(tree)

    node_sizes = []
    labels = {}
    node_colors = []
    for n in G:
        label = str(n)
        if 'Inner' in label:
            # These are the inner tree nodes -- leave them blank and with very small sizes.
            node_sizes.append(1)
            labels[n] = ''
            node_colors.append(e_color)
        else:
            # Size of the node depends on the labels!
            node_sizes.append(label_sizes(label))
            # Set colors depending on our color scheme and label names
            node_colors.append(label_colors(label))
            # set the label that will appear in each node
            labels[n] = label
    # Draw the tree given the info we provided!
    pos = graphviz_layout(G)
    networkx.draw(G, pos, edge_color=e_color, node_size=node_sizes, labels=labels, with_labels=False, node_color=node_colors)
    pylab.show()
    return G

G = draw_tree()

Next I try to follow the recipe that you suggested:

V=G.nodes()
E=G.edges()

print E

here is the output

[(Clade(branch_length=0.0776315789474, name='PKG__PKG1'), Clade(branch_length=0.0745911347787, name='Inner24')), (Clade(branch_length=0.00330756013746, name='Inner34'), Clade(branch_length=0.0585685808634, name='Inner35')), (Clade(branch_length=0.00330756013746, name='Inner34'), Clade(branch_length=0.0588549337261, name='SGK__SGK')), (Clade(branch_length=0.00330756013746, name='Inner34'), Clade(branch_length=0.0607326951399, name='SGK__SGK3')), (Clade(branch_length=0.0265329682131, name='Inner9'),…]

I follow the script you linked to, only modifying position(g) by changing N=len(g.nodes()) to N=g.nodes():

def position(g):
    if not isinstance(g, pgv.AGraph):
        raise ValueError('The graph g must be a pygraphviz AGraph')
    # N=len(g.nodes())
    N=g.nodes()
    pos=[]
    for k in N:
        s=g.get_node(k).attr['pos']
        t=s.split(",")
        pos.append(map(float, t))
    return pos

print H.get_node(0)

#H.nodes()

H.get_node('Inner7').attr['pos']

pos=position(H)
#print pos

[[317.97, 680.49], [252.39, 723.04], [242.76, 659.16], [384.17, 569.51], [168.44, 558.75], [95.542, 594.64], [99.477, 515.78], [242.76, 530.12],

which looks correct to me.
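
A side note on the position function above: under Python 3, map(float, t) returns a lazy iterator rather than a list, so pos would hold map objects instead of coordinate pairs. A small sketch of the parsing step that behaves the same on Python 2 and 3 (parse_pos is a hypothetical helper name, and the input strings are rebuilt from the first two coordinates printed above):

```python
def parse_pos(pos_attr):
    """Convert a Graphviz 'pos' attribute such as '317.97,680.49' to [x, y]."""
    return [float(value) for value in pos_attr.split(",")]


# Stand-ins for the strings returned by g.get_node(k).attr['pos']
raw = ["317.97,680.49", "252.39,723.04"]
pos = [parse_pos(s) for s in raw]
```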

import plotly.plotly as py
from plotly.graph_objs import *

def plotly_graph(E, pos):
    # E is the list of tuples representing the graph edges
    # pos is the list of node coordinates
    N=len(pos)
    Xn=[pos[k][0] for k in range(N)]  # x-coordinates of nodes
    Yn=[pos[k][1] for k in range(N)]  # y-coordinates of nodes

    Xe=[]
    Ye=[]
    for e in E:
        Xe+=[pos[e[0]][0],pos[e[1]][0], None]  # x-coordinates of the nodes defining the edge e
        Ye+=[pos[e[0]][1],pos[e[1]][1], None]  # y-coordinates of the nodes defining the edge e

    return Xn, Yn, Xe, Ye

Xn, Yn, Xe, Ye=plotly_graph(E, pos)

and here the program crashes.

@pgreisen I defined an igraph graph, while you created a networkx one. The class Graph of the two libraries is not the same. That is why you cannot get the Plotly plot of your tree.

If you keep working with networkx, then after the line G=Phylo.to_networkx(tree) you should insert the following lines that define the lists of nodes and edges, as they are used in the Plotly plot.

V=range(len(G.nodes())) 
d={}# we need this dict to get the edges as tuples of ints, not names
for k, node in enumerate(G.nodes_iter()):
     d[node.name]=k
Edges=[(d[e[0].name], d[e[1].name])  for e in G.edges()]
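
To illustrate the mapping above without Biopython or networkx installed, here is a rough sketch with a hypothetical Clade stand-in (a namedtuple, not the real Bio.Phylo class) and made-up nodes and edges. It iterates over the node list directly rather than calling nodes_iter(), which was removed in networkx 2.0; iterating over G.nodes() should do the same job there.

```python
from collections import namedtuple

# Hypothetical stand-in for Bio.Phylo's Clade; only the name is used here.
Clade = namedtuple("Clade", ["name", "branch_length"])

a = Clade("PKG__PKG1", 0.0776)
b = Clade("Inner24", 0.0746)
c = Clade("PKG__PKG2", 0.0791)

nodes = [a, b, c]          # stands in for G.nodes()
edges = [(a, b), (b, c)]   # stands in for G.edges()

# Map each node name to an integer index, in node order.
index = {node.name: k for k, node in enumerate(nodes)}

# Edges as tuples of ints, usable as indices into a list of positions.
Edges = [(index[u.name], index[v.name]) for u, v in edges]
```

The integer indices only line up with pos if the positions are listed in the same node order, which is worth checking when the layout comes from a separate pygraphviz graph.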

Hi again - thanks for all your help.

I was wondering if there is an easy way to make a color density of the different nodes in the graph:

color = PANDAS_DATAFRAME[VALUE_TO_COLOR]
colorscale='Viridis',
showscale=True

So I have tried adding color=PANDAS_DATAFRAME[VALUE_TO_COLOR], which gives me the right colorbar. The problem is that all the sub-nodes get taken into account, even though they should be disregarded for the coloring. Do you know how to fix this? Thanks!

Hi @pgreisen,
Could you, please, give more details?
Do you intend to color only some nodes with viridis colormap?
What is the length of your dataframe? Does it coincide with the total number of nodes?

Thanks man! I have some experimental data connected with my tree and would like to color every sequence node accordingly. The total number of nodes in the unrooted tree includes the branching points as well. For example, if Seq1 and Seq2 are in a branch together, there will be three nodes, but I only want the colormap applied to Seq1 and Seq2 and not to the node connecting them. So node_colors should only carry the values for Seq1 and Seq2. Right now I can see that the connecting nodes are colored as well, and the color scale becomes wrong.

# nodes

trace2=Scatter(x=Xn,
               y=Yn,
               mode='markers+text',
               #mode='markers',
               name='',
               marker=Marker(symbol='dot',
                             size=node_size,
                             color=node_colors,
                             line=Line(color='rgb(50,50,50)', width=0.5)
                             ),
               #
               text=display_labels,
               textposition='right',
               textfont=dict(
                   family='sans serif',
                   size=8,
                   color='#ff7f0e'),
               hoverinfo='text'
               )

In this case you should define two traces for nodes: one for the nodes colored according to your experimental data, with the Viridis colorscale, and a second trace consisting of the remaining nodes, colored with a common color.
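
A sketch of that split, assuming inner nodes can be recognised by 'Inner' in their label (as in the script earlier in the thread) and that the experimental values are available per leaf label. The names split_node_traces and values, the grey color and the sample data are made up for illustration; the traces are plain dicts, which Plotly figures should accept in place of Scatter objects.

```python
def split_node_traces(labels, Xn, Yn, values, inner_color="rgb(200,200,200)"):
    """Build one trace for leaves (color-mapped) and one for inner nodes."""
    leaf = dict(type="scatter", mode="markers", x=[], y=[], text=[],
                hoverinfo="text",
                marker=dict(color=[], colorscale="Viridis", showscale=True))
    inner = dict(type="scatter", mode="markers", x=[], y=[],
                 hoverinfo="none",
                 marker=dict(color=inner_color, size=2))
    for label, x, y in zip(labels, Xn, Yn):
        if "Inner" in label:
            inner["x"].append(x)
            inner["y"].append(y)
        else:
            leaf["x"].append(x)
            leaf["y"].append(y)
            leaf["text"].append(label)
            # Only leaves contribute to the color scale.
            leaf["marker"]["color"].append(values[label])
    return leaf, inner


# Made-up sample data: two sequences joined by one branching point
labels = ["Seq1", "Inner1", "Seq2"]
Xn, Yn = [0.0, 1.0, 2.0], [0.0, 1.0, 0.0]
values = {"Seq1": 0.3, "Seq2": 0.9}

leaf_trace, inner_trace = split_node_traces(labels, Xn, Yn, values)
```

Both dicts would then go into the figure's data list alongside the edge trace, so the colorbar range is determined by the leaf values only.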