WordCloud in DASH

Short answer:
Don’t do word clouds :) I think they are misleading and not very useful. They show sizes without numbers. It’s also difficult to compare words that have the same font size but different lengths: if “war” and “tremendous” had the same size, the longer word might look “bigger”.
I suggest you use a horizontal bar chart instead, which shows the most words, is natural to read, and lets you add numbers so it’s clear which is bigger or smaller.
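A minimal sketch of the bar chart idea, assuming the text has already been split into words. The helper names `top_words` and `bar_figure` and the sample sentence are made up for illustration.

```python
from collections import Counter


def top_words(words, n=10):
    """Return the n most frequent words as (word, count) pairs, largest first."""
    return Counter(words).most_common(n)


def bar_figure(pairs):
    """Build a horizontal bar chart with the count printed next to each bar."""
    # Imported here so that top_words() can be used without plotly installed.
    import plotly.graph_objects as go

    labels = [word for word, _ in pairs]
    counts = [count for _, count in pairs]
    fig = go.Figure(go.Bar(x=counts, y=labels, orientation="h",
                           text=counts, textposition="outside"))
    # Plotly draws the first category at the bottom; reverse so the top word comes first.
    fig.update_yaxes(autorange="reversed")
    return fig


sample = "the war the peace the war a tremendous war".split()
pairs = top_words(sample, n=3)
```

Calling `bar_figure(pairs)` returns a figure that can be passed to a `dcc.Graph` in a Dash layout.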

Longer answer:

  • Try running random.shuffle several times until you get one where things look good.
  • Since Plotly gives you an interactive chart, users can zoom and pan, so overlaps are not a major issue.
  • I suggest you also remove the biggest words, because they take up too much space and are already known. An article about Wikipedia will most likely have that word as its top word. It’s more interesting to know the second- and third-level words. This way you get a more evenly distributed set of words that is easier to read and has fewer overlaps.
  • Having zero overlaps is very complicated to implement, because words differ in length, letter shapes, and size. This solution is not perfect, but if you remove the biggest 2-3 words you should be able to get something that is 90% acceptable in most cases, with a few minor overlaps.
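The layout code these tips refer to is not shown in this post, so the following is only a sketch under stated assumptions: words are placed on a simple grid, one word per cell, and `layout_words`, its parameters, and the sample frequencies are all hypothetical. It drops the top words, shuffles the rest, and lets you retry with a different seed until the arrangement looks good.

```python
import random


def layout_words(freqs, drop_top=2, columns=3, seed=None):
    """Drop the most frequent words, shuffle the rest, and place them on a grid.

    Returns a list of (word, count, x, y) tuples.
    """
    ranked = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
    kept = ranked[drop_top:]      # skip the dominant, already-known words
    rng = random.Random(seed)     # a seed makes a good-looking layout repeatable
    rng.shuffle(kept)
    return [(word, count, i % columns, i // columns)
            for i, (word, count) in enumerate(kept)]


freqs = {"wikipedia": 120, "article": 80, "editor": 30, "history": 25,
         "source": 20, "license": 12, "community": 9}
placed = layout_words(freqs, drop_top=2, seed=1)
```

The resulting positions could then be drawn with a Plotly scatter trace in text mode, with the font size scaled by the count.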

Good luck!


Hi kristada619,

I don’t know if you are interested, but I’ve used a third-party option for a wordcloud. You can generate the wordcloud using amueller’s wordcloud library, save the image of the wordcloud to a file, and use Dash to publish the image.

I’ve tried the wordclouds suggested in this post as well, but they didn’t work out for me.

See below link for amueller’s wordcloud.


I made a dash app which does this!


Yeah, I know how to display static images in plotly, but I don’t want my wordcloud to be a static image. I want to show an interactive wordcloud with features such as, say, hovering over a word to show the percentage of sentences in the document where the word appears, and clicking on a word to display the sentence(s) containing that word, etc.
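One way to prepare the data for that kind of interactivity, as a rough sketch: compute, for each word, the share of sentences it appears in and the sentences themselves. The function name `word_sentence_stats`, the naive splitting on `.`, `!` and `?`, and the sample text are assumptions for illustration only.

```python
import re
from collections import defaultdict


def word_sentence_stats(text):
    """Map each word to (percent of sentences containing it, those sentences)."""
    # Naive sentence split; real text may need a proper tokenizer.
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    containing = defaultdict(list)
    for sentence in sentences:
        # A set, so a word repeated within one sentence is counted once.
        for word in set(re.findall(r"[a-z']+", sentence.lower())):
            containing[word].append(sentence)
    return {word: (100 * len(found) / len(sentences), found)
            for word, found in containing.items()}


stats = word_sentence_stats("Dash is great. Dash apps use Plotly. Plotly is interactive!")
percent, sentences = stats["dash"]
```

In a Dash app, the percentage could go into a trace's hover text, and the sentence list could be looked up in a callback that reads the graph's `clickData`.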

Really awesome seeing all of the activity in this thread! Another set of solutions to explore would be to create your own Dash component. Dash components are frequently wrappers around existing React components or D3 graphs and it looks like there are some good components out there already: https://www.npmjs.com/package/react-d3-cloud, https://www.npmjs.com/package/react-wordcloud, https://github.com/jasondavies/d3-cloud.

We have many guides for creating components, see:


Wanted to add this quick solution for generating wordclouds in Dash with the Python library wordcloud. This code generates a wordcloud from a dictionary of word frequencies. The wordcloud object is converted to an image, and the image is passed to Dash without having to save it to disk.

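import base64
from io import BytesIO

# Assumes Dash 2.x, where html can be imported from dash directly.
from dash import Dash, html
from wordcloud import WordCloud

app = Dash(__name__)

# Word frequencies to draw; larger values give larger words.
di = {'abc': 10, 'def': 20, 'ghi': 2, 'jkl': 55}
wc = WordCloud().generate_from_frequencies(frequencies=di)
wc_img = wc.to_image()  # PIL image

# Encode the image as a base64 PNG in memory, so nothing is written to disk.
with BytesIO() as buffer:
    wc_img.save(buffer, 'png')
    img2 = base64.b64encode(buffer.getvalue()).decode()

app.layout = html.Div(children=[
    html.Img(src="data:image/png;base64," + img2)
])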