WordCloud in DASH

I would like to show a word cloud plot (tag plot) in Dash with Python:

from collections import Counter

import matplotlib.pyplot as plt
from spacy.lang.fr.stop_words import STOP_WORDS
from wordcloud import WordCloud

# data: the DataFrame holding the comments, defined elsewhere
def generate_wordCloud(value):
    # Join all comments of the selected establishment into a single string
    your_list = ' '.join([i for i in data[data['code_etab'] == value]['comment']])
    C = [i.lower() for i in your_list.split(' ')]
    # Drop French stop words
    stop_words = set(STOP_WORDS)
    CT = [c for c in C if c not in stop_words]
    counts = Counter(CT)
    counts = counts.most_common(15)
    # Sample counts, hard-coded here (this overwrites the counts computed above)
    counts = [('rapide', 5), ('livraison', 5), ('prix', 5), ('bon', 5), ('facile', 4), ('site', 4), ('parfait', 3), ('produits', 3), ('commande', 3), ('satisfaite', 3), ('livraison.', 2), ('rapidement', 2), ('choix', 2)]
    wordcloud = WordCloud(background_color='white',
                          max_words=50, max_font_size=40,
                          random_state=42
                          ).generate(str(counts))
    wc = wordcloud
    return wc

# DASH

@app.callback(Output('graph-5', 'figure'), [Input('dropdown', 'value')])
def update_graph_5(value):
    wc = generate_wordCloud(value)
    return {plt.imshow(wc), plt.show()}

If I execute this code, the word cloud is not displayed in Dash.

Is it possible to display a word cloud in Dash?

Hey @oumar

see this thread [Solved] Is it possible to make a wordcloud in dash?

Thank you for your answer. I looked at the link, but the proposed solution is to use the word cloud in image format. Do you know of any other way to use a word cloud in Dash without treating it as an image?
Thank you in advance.
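
For reference, the image route mentioned above usually comes down to rendering the word cloud to a PNG and handing it to an image component as a base64 data URI. A minimal sketch, assuming `wc` is a `wordcloud.WordCloud` on which `generate()` has already been called; both helper names are made up for illustration:

```python
import base64
from io import BytesIO


def png_bytes_to_data_uri(png_bytes):
    """Encode raw PNG bytes as a data URI usable as an <img> src."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return "data:image/png;base64," + encoded


def wordcloud_to_data_uri(wc):
    """Render a generated WordCloud (assumed argument `wc`) to a data URI."""
    buffer = BytesIO()
    # WordCloud.to_image() returns a PIL image, which can be saved as PNG
    wc.to_image().save(buffer, format="PNG")
    return png_bytes_to_data_uri(buffer.getvalue())
```

In a Dash callback, the returned string would go to the `src` property of an `html.Img` component rather than to the `figure` of a graph.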

I think you can implement it in Plotly directly:

For words, you would use your bag of words. The colors and weights are random numbers here, but you can get them from the analysis you are doing.

import random

import plotly
import plotly.graph_objs as go
from plotly.offline import plot

# Placeholder inputs: 30 names taken from the graph_objs module,
# with random colors and random font sizes between 15 and 35
words = dir(go)[:30]
colors = [plotly.colors.DEFAULT_PLOTLY_COLORS[random.randrange(1, 10)] for i in range(30)]
weights = [random.randint(15, 35) for i in range(30)]

data = go.Scatter(x=[random.random() for i in range(30)],
                 y=[random.random() for i in range(30)],
                 mode='text',
                 text=words,
                 marker={'opacity': 0.3},
                 textfont={'size': weights,
                           'color': colors})
layout = go.Layout({'xaxis': {'showgrid': False, 'showticklabels': False, 'zeroline': False},
                    'yaxis': {'showgrid': False, 'showticklabels': False, 'zeroline': False}})
fig = go.Figure(data=[data], layout=layout)

plot(fig)

For the issue of overlapping text, since Plotly allows you to zoom in and out, it shouldn’t be a big issue. You can tweak the sizes and positions until you reach a good enough solution.

As a bonus you get hover events which you can customize to display rich information for each of the words.
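
To take the words and hover text from an actual analysis rather than random values, a `collections.Counter` is one option. A rough sketch with a made-up token list and a hypothetical helper name; the `<br>` tag is what Plotly uses for line breaks in hover labels, and the share is computed relative to the words that are kept, not the whole corpus:

```python
from collections import Counter


def counts_to_text_trace_args(tokens, top_n=15):
    """Turn a list of tokens into the text-related arguments of a text-mode trace."""
    counts = Counter(tokens).most_common(top_n)
    # Share of each word among the kept words only
    total = sum(count for _, count in counts)
    words = [word for word, _ in counts]
    hovertext = ["{0}<br>count: {1}<br>share: {2:.1%}".format(word, count, count / total)
                 for word, count in counts]
    return {"text": words, "hovertext": hovertext, "hoverinfo": "text"}


args = counts_to_text_trace_args("fast delivery good price fast delivery fast".split())
print(args["text"])
```

The resulting dict could then be unpacked into the trace, for example `go.Scatter(mode='text', x=..., y=..., **args)`.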

Just thought of how the overlap issue can be resolved.

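Basically, by ensuring that the random values of the y axis are (mostly) unique.

You can simply run random.choices(range(30), k=30). Note that random.choices samples with replacement, so a few values may repeat; random.sample(range(30), k=30) guarantees unique values.

This almost completely solves it.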
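
A quick way to see how unique the y values really are is to compare `random.choices` with `random.sample` from the standard library. This is only an illustrative check, not part of the original answer:

```python
import random

n = 30
# random.choices draws with replacement, so some values may repeat
with_replacement = random.choices(range(n), k=n)
# random.sample draws without replacement, so every value appears exactly once
without_replacement = random.sample(range(n), k=n)

print(len(set(with_replacement)), "distinct values from random.choices")
print(len(set(without_replacement)), "distinct values from random.sample")
```

If strictly distinct y positions matter, `random.sample` is the safer choice; with `random.choices` two words can land on the same row.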

This is really great @eliasdabbas, thank you for digging into this!

Thank you very much, @eliasdabbas.

This is not reasonable. In real data, it is quite likely that several words have the same frequency.

Then they would have the same size in the word cloud.

Why would that be unreasonable?

The reason I gave was not right; I hadn't thought about it deeply. But I still think the approach is not reasonable. A word cloud is meant to fill a given area with words. In your case, the more words there are, the more they overlap, so your plot is more like a bubble chart. To solve this you need more space, that is, larger x and y axes, whereas a traditional word cloud saves space. Also consider this: if one word's frequency is very high and another's is very low, and they happen to be placed close together, the small one will disappear completely. Still, your idea is the best so far.

This line of code (weights = [random.randint(15, 35) for i in range(30)]) ensures that the size of each word lies between 15 and 35.
So nothing is going to be smaller than 15 or larger than 35 in size. These limits can be changed, of course.

I've pointed out that you can't solve the overlap problem.
Just plot a data sample with a large variance and you will notice what I am talking about.

Just plot it and change the size of your HTML page window, and you will see what I mean.

Data with large differences between values is common, and there are many ways to deal with it in plotting. One of the main ones is normalizing the numbers to a certain range.

Maybe if you can share an example?

Here is one example. Even if you normalize the numbers, the words will overlap when you change the window size, because a scatter plot is not designed to fill an area. The smaller the window, the closer together the texts are.

Besides, the hover info is not accurate, because the x values of the texts are so close together. I find that the hover info is based on differences in the x values rather than on the text itself. When you hover over one text, the info shown may be another one's, because their x values are similar.
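
One setting that may help with the hover behaviour described here is the layout's `hovermode`: with `'x'` Plotly selects points by their x position, while `'closest'` selects the point nearest the cursor. A minimal sketch of such a layout as a plain dict (Plotly figures accept dicts for the layout); the function name is illustrative, and whether this fully fixes hovering over densely packed text is an open question:

```python
def wordcloud_layout(hovermode="closest"):
    """Build a layout dict with hidden axes and the given hover mode."""
    hidden_axis = {"showgrid": False, "showticklabels": False, "zeroline": False}
    return {
        # Copy the axis settings so the two axes do not share one dict
        "xaxis": dict(hidden_axis),
        "yaxis": dict(hidden_axis),
        "hovermode": hovermode,
    }


layout = wordcloud_layout()
```

The dict can be passed where the thread's examples pass `go.Layout(...)`, for instance `go.Figure(data=[data], layout=layout)`.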

My English is poor; I hope I have made myself understood.

import pandas as pd
import plotly as py
import plotly.graph_objs as go
import random

words = ['征信', '拍拍贷', '查询', '报告', '贷款', '个人', '怎么', '信用卡', '逾期', '被拒', '如何', '中心', '信用', '网贷', '人人', '分期', '注册', '好信', '手机', '钱包', '个人信用', '借呗', '平安', '捷信', '微粒贷', '借钱', '记录', '用钱', '可以', '花呗', '身份证', '拍拍', '现金', '微信', '还款', '问问', '产品', '51', '信而富', '什么', '黑名单', '360', '17', '黑户', '怎么办', '金融', '帮你贷', '消除', '密码', '账号', '怎样', '分期乐', '拒绝', '申请']
frequency = [1083, 393, 353, 167, 123, 119, 83, 64, 57, 46, 44, 40, 37, 31, 29, 29, 28, 26, 25, 23, 23, 22, 21, 19, 18, 18, 18, 18, 18, 17, 15, 15, 15, 15, 14, 14, 13, 13, 13, 13, 13, 13, 12, 12, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10]
percent = [0.362086258776329, 0.13139418254764293, 0.11802072885322636, 0.055834169174189235, 0.041123370110330994, 0.03978602474088933, 0.02774991641591441, 0.02139752591106653, 0.01905717151454363, 0.015379471748579069, 0.01471079906385824, 0.013373453694416584, 0.012370444667335341, 0.010364426613172852, 0.009695753928452023, 0.009695753928452023, 0.009361417586091608, 0.008692744901370779, 0.008358408559010365, 0.0076897358742895345, 0.0076897358742895345, 0.00735539953192912, 0.007021063189568706, 0.006352390504847877, 0.006018054162487462, 0.006018054162487462, 0.006018054162487462, 0.006018054162487462, 0.006018054162487462, 0.0056837178201270475, 0.005015045135406218, 0.005015045135406218, 0.005015045135406218, 0.005015045135406218, 0.004680708793045804, 0.004680708793045804, 0.0043463724506853894, 0.0043463724506853894, 0.0043463724506853894, 0.0043463724506853894, 0.0043463724506853894, 0.0043463724506853894, 0.004012036108324975, 0.004012036108324975, 0.00367769976596456, 0.00367769976596456, 0.00367769976596456, 0.00367769976596456, 0.003343363423604146, 0.003343363423604146, 0.003343363423604146, 0.003343363423604146, 0.003343363423604146, 0.003343363423604146]
lenth = len(words)
colors = [py.colors.DEFAULT_PLOTLY_COLORS[random.randrange(1, 10)] for i in range(lenth)]

data = go.Scatter(
    x=random.choices(range(lenth), k=lenth),
    y=random.choices(range(lenth), k=lenth),
    mode='text',
    text=words,
    hovertext=['{0}<br>{1}{2}'.format(w, f, format(p, '.2%')) for w, f, p in zip(words, frequency, percent)],
    hoverinfo='text',
    # Raw frequencies are used as font sizes here, so sizes range from 10 to 1083
    textfont={'size': frequency, 'color': colors})
layout = go.Layout({'xaxis': {'showgrid': False, 'showticklabels': False, 'zeroline': False},
                    'yaxis': {'showgrid': False, 'showticklabels': False, 'zeroline': False}})
fig = go.Figure(data=[data], layout=layout)

py.offline.plot(fig)

Two things can help with this:

  1. Normalizing the numbers as I mentioned. In this case I normalized them between 15 and 45 as follows:

    frequency = [1083, 393, 353, 167, 123, 119, 83, 64, 57, 46, 44, 40, 37, 31, 29, 29, 28, 26, 25, 23, 23, 22, 21, 19, 18, 18, 18, 18, 18, 17, 15, 15, 15, 15, 14, 14, 13, 13, 13, 13, 13, 13, 12, 12, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10]

    lower, upper = 15, 45
    frequency = [((x - min(frequency)) / (max(frequency) - min(frequency))) * (upper - lower) + lower for x in frequency]

  2. X axis problem: I suggest you don't use random numbers for the X axis, because labels are more likely to overlap. The suggested approach is simply to use range(len(words)) for the X axis values. This does NOT completely solve it, but it's good enough in the majority of cases.

Resizing the window will change the positions, of course. Plotly is flexible with this and it adjusts, but there is a limit. If you make the window very small, the words will eventually overlap :)
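
The normalization from step 1 can also be wrapped in a small function. The sketch below uses a hypothetical name, `scale_to_range`, and adds one thing the one-liner does not handle: when all frequencies are equal, the min-max formula divides by zero, so the function falls back to the midpoint of the range (an arbitrary choice made for this example).

```python
def scale_to_range(values, lower=15, upper=45):
    """Min-max scale values into the interval [lower, upper]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal: min-max scaling would divide by zero,
        # so use the midpoint of the target range instead.
        return [(lower + upper) / 2 for _ in values]
    return [(v - lo) / (hi - lo) * (upper - lower) + lower for v in values]


sizes = scale_to_range([1083, 393, 10, 10])
print(sizes[0], sizes[-1])
```

Computing the minimum and maximum once also avoids rescanning the list for every element.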

Full modified code:

import pandas as pd
import plotly as py
import plotly.graph_objs as go
import random

words = ['征信', '拍拍贷', '查询', '报告', '贷款', '个人', '怎么', '信用卡', '逾期', '被拒', '如何', '中心', '信用', '网贷', '人人', '分期', '注册', '好信', '手机', '钱包', '个人信用', '借呗', '平安', '捷信', '微粒贷', '借钱', '记录', '用钱', '可以', '花呗', '身份证', '拍拍', '现金', '微信', '还款', '问问', '产品', '51', '信而富', '什么', '黑名单', '360', '17', '黑户', '怎么办', '金融', '帮你贷', '消除', '密码', '账号', '怎样', '分期乐', '拒绝', '申请']


frequency = [1083, 393, 353, 167, 123, 119, 83, 64, 57, 46, 44, 40, 37, 31, 29, 29, 28, 26, 25, 23, 23, 22, 21, 19, 18, 18, 18, 18, 18, 17, 15, 15, 15, 15, 14, 14, 13, 13, 13, 13, 13, 13, 12, 12, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10]

lower, upper = 15, 45
frequency = [((x - min(frequency)) / (max(frequency) - min(frequency))) * (upper - lower) + lower for x in frequency]


percent = [0.362086258776329, 0.13139418254764293, 0.11802072885322636, 0.055834169174189235, 0.041123370110330994, 0.03978602474088933, 0.02774991641591441, 0.02139752591106653, 0.01905717151454363, 0.015379471748579069, 0.01471079906385824, 0.013373453694416584, 0.012370444667335341, 0.010364426613172852, 0.009695753928452023, 0.009695753928452023, 0.009361417586091608, 0.008692744901370779, 0.008358408559010365, 0.0076897358742895345, 0.0076897358742895345, 0.00735539953192912, 0.007021063189568706, 0.006352390504847877, 0.006018054162487462, 0.006018054162487462, 0.006018054162487462, 0.006018054162487462, 0.006018054162487462, 0.0056837178201270475, 0.005015045135406218, 0.005015045135406218, 0.005015045135406218, 0.005015045135406218, 0.004680708793045804, 0.004680708793045804, 0.0043463724506853894, 0.0043463724506853894, 0.0043463724506853894, 0.0043463724506853894, 0.0043463724506853894, 0.0043463724506853894, 0.004012036108324975, 0.004012036108324975, 0.00367769976596456, 0.00367769976596456, 0.00367769976596456, 0.00367769976596456, 0.003343363423604146, 0.003343363423604146, 0.003343363423604146, 0.003343363423604146, 0.003343363423604146, 0.003343363423604146]

lenth = len(words)
colors = [py.colors.DEFAULT_PLOTLY_COLORS[random.randrange(1, 10)] for i in range(lenth)]

data = go.Scatter(
    x=list(range(lenth)),
    y=random.choices(range(lenth), k=lenth),
    mode='text',
    text=words,
    hovertext=['{0}{1}{2}'.format(w, f, format(p, '.2%')) for w, f, p in zip(words, frequency, percent)],
    hoverinfo='text',
    textfont={'size': frequency, 'color': colors})
layout = go.Layout({'xaxis': {'showgrid': False, 'showticklabels': False, 'zeroline': False},
                    'yaxis': {'showgrid': False, 'showticklabels': False, 'zeroline': False}})

fig = go.Figure(data=[data], layout=layout)

py.offline.plot(fig)

Result:

Since random.choices is only found in Python 3.6 and above, do you have any alternate solution for folks running Python 3.5 or below with random.choice (the ‘s’ is missing)?

random.shuffle should work:

import random
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print('before:', x)
random.shuffle(x)
print('after: ', x)
before: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
after:  [2, 1, 8, 6, 5, 9, 4, 7, 10, 3]

So, based on your code, I wrote this function that plots a Plotly word cloud from an input text.

from wordcloud import WordCloud, STOPWORDS
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot

def plotly_wordcloud(text):
    wc = WordCloud(stopwords = set(STOPWORDS),
                   max_words = 200,
                   max_font_size = 100)
    wc.generate(text)
    
    word_list=[]
    freq_list=[]
    fontsize_list=[]
    position_list=[]
    orientation_list=[]
    color_list=[]

    for (word, freq), fontsize, position, orientation, color in wc.layout_:
        word_list.append(word)
        freq_list.append(freq)
        fontsize_list.append(fontsize)
        position_list.append(position)
        orientation_list.append(orientation)
        color_list.append(color)
        
    # get the positions
    x=[]
    y=[]
    for i in position_list:
        x.append(i[0])
        y.append(i[1])
            
    # get the relative occurrence frequencies, scaled by 100 for use as font sizes
    new_freq_list = []
    for i in freq_list:
        new_freq_list.append(i*100)
    
    trace = go.Scatter(x=x, 
                       y=y, 
                       textfont = dict(size=new_freq_list,
                                       color=color_list),
                       hoverinfo='text',
                       hovertext=['{0}{1}'.format(w, f) for w, f in zip(word_list, freq_list)],
                       mode="text",  
                       text=word_list
                      )
    
    layout = go.Layout(
                       xaxis=dict(showgrid=False, 
                                  showticklabels=False,
                                  zeroline=False,
                                  automargin=True),
                       yaxis=dict(showgrid=False,
                                  showticklabels=False,
                                  zeroline=False,
                                  automargin=True)
                      )
    
    fig = go.Figure(data=[trace], layout=layout)
    
    return fig

text = "Wikipedia was launched on January 15, 2001, by Jimmy Wales and Larry Sanger.[10] Sanger coined its name,[11][12] as a portmanteau of wiki[notes 3] and 'encyclopedia'. Initially an English-language encyclopedia, versions in other languages were quickly developed. With 5,748,461 articles,[notes 4] the English Wikipedia is the largest of the more than 290 Wikipedia encyclopedias. Overall, Wikipedia comprises more than 40 million articles in 301 different languages[14] and by February 2014 it had reached 18 billion page views and nearly 500 million unique visitors per month.[15] In 2005, Nature published a peer review comparing 42 science articles from Encyclopædia Britannica and Wikipedia and found that Wikipedia's level of accuracy approached that of Britannica.[16] Time magazine stated that the open-door policy of allowing anyone to edit had made Wikipedia the biggest and possibly the best encyclopedia in the world and it was testament to the vision of Jimmy Wales.[17] Wikipedia has been criticized for exhibiting systemic bias, for presenting a mixture of 'truths, half truths, and some falsehoods',[18] and for being subject to manipulation and spin in controversial topics.[19] In 2017, Facebook announced that it would help readers detect fake news by suitable links to Wikipedia articles. YouTube announced a similar plan in 2018."

init_notebook_mode(connected=True)
iplot(plotly_wordcloud(text))

And this plots OK, but some parts of the figure get cut off.

I played around with different layout parameters like autosize, automargin, pad, etc., like so:

layout = go.Layout(autosize=True,
                   xaxis=dict(showgrid=False, 
                              showticklabels=False,
                              zeroline=False,
                              automargin=True),
                   yaxis=dict(showgrid=False,
                              showticklabels=False,
                              zeroline=False,
                              automargin=True),
                   margin=go.layout.Margin(pad=1000),
                  )

But it doesn’t make any difference.
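
One possible explanation, offered as a guess rather than a diagnosis: the axis range of a text-mode trace appears to follow the point positions rather than the rendered text, so large words near the edge can spill outside the plot area. Two things that might help are an explicit, padded axis `range` and `cliponaxis=False` on the scatter trace. A sketch of the padding calculation, with a hypothetical helper name, an arbitrary 15% padding, and made-up positions:

```python
def padded_range(values, pad_fraction=0.15):
    """Return [low, high] covering the values plus a margin on both sides."""
    lo, hi = min(values), max(values)
    # Fall back to a margin of 1 when all values are equal
    pad = (hi - lo) * pad_fraction or 1
    return [lo - pad, hi + pad]


# Made-up word positions, standing in for the x and y lists of the function above
x = [10, 120, 250]
y = [5, 80, 190]
layout_update = {
    "xaxis": {"range": padded_range(x)},
    "yaxis": {"range": padded_range(y)},
}
```

The `range` entries would be added to the `xaxis` and `yaxis` settings already passed to `go.Layout`; the padding fraction probably needs tuning for the largest font size in use.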

Also, as can be seen in the image above, there is word overlap. I tried y=random.shuffle(y) when defining the trace in go.Scatter, but that didn’t make any difference.

Any suggestions on how to fix these?
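
On the shuffle attempt mentioned above: `random.shuffle` reorders the list in place and returns `None`, so writing `y=random.shuffle(y)` inside the trace passes `None` as the y values. A short check:

```python
import random

y = [1, 2, 3, 4, 5]
result = random.shuffle(y)  # shuffles y in place

print(result)     # None
print(sorted(y))  # [1, 2, 3, 4, 5]
```

To shuffle, call `random.shuffle(y)` on its own line first and then pass `y` to the trace. Note that shuffling also breaks the pairing between the x and y values that the wordcloud layout computed, so it may not reduce the overlap.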