I apologise upfront since I am new to Python, Beautiful Soup and Dash.
I would like to scrape text from the web with Beautiful Soup and place it directly into Dash, ending up with something like the dash-multipage-report from the Dash Gallery: lots of text and a few supporting graphs.
So far I have only seen examples where people add their text by hand in HTML or Markdown format. I would like to create something where I don’t know the length of the text in advance. Therefore, I dislike the idea of using a table and storing the text in one cell. I fear that a large amount of text may cause the table to break or to show only part of the information.
I have no clue where to start or which keywords to use to find relevant information. I do hope you can help me.
Colourful regards, Susanne
Welcome to the community!
Do you have the consent of these sites to do what you are talking about doing?
Thank you for welcoming me to the community.
For this project/idea, I have consent from the owners of the websites. The consent is based on the robots.txt and personal approval. I will check the robots.txt before every web scraping activity. It is my intent to create bibliographic items in Dash to refer to ‘original’ text. The information created will remain free, hence without profit motive.
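For anyone following along, the robots.txt check mentioned above can be sketched with the standard library alone. This is only a minimal illustration: the `is_allowed` helper, the `my-bot` user agent, the sample rules and the example.com URLs are all made up here, and a real run would load the live file with `set_url()` and `read()` instead of parsing a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real check would fetch the site's own file.
SAMPLE_ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "my-bot" and the URLs below are placeholders for illustration only.
public_ok = is_allowed(SAMPLE_ROBOTS_TXT, "my-bot", "https://example.com/articles/1")
private_ok = is_allowed(SAMPLE_ROBOTS_TXT, "my-bot", "https://example.com/private/notes")
```

With these sample rules, the article URL is permitted and the URL under `/private/` is not.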
So, for this, I wouldn’t use Dash directly.
Since BS4 gives you HTML, I’d save each scraped page as a Flask template, then register a route for each one on the Flask server and render the template.
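Roughly what I mean, as a sketch rather than a finished solution. The sample HTML string stands in for a downloaded page, and the names `extract_article`, `save_page`, the `/scraped/<name>` route and the temporary template folder are all invented for this example. It assumes the text you want sits in an `<article>` tag, which will not hold for every site.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup
from flask import Flask, render_template

# Stand-in for a downloaded page (hypothetical content, no network access here).
SAMPLE_HTML = """
<html><head><title>Example article</title><script>track()</script></head>
<body><nav>Menu</nav>
<article><h1>Example article</h1><p>First paragraph.</p><p>Second paragraph.</p></article>
</body></html>
"""

# A temporary folder keeps the sketch self-contained; a real app would
# normally use a "templates" folder next to the script.
TEMPLATE_DIR = Path(tempfile.mkdtemp())
server = Flask(__name__, template_folder=str(TEMPLATE_DIR))


def extract_article(html):
    """Drop scripts, styles and navigation, then return the article as HTML."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "nav"]):
        tag.decompose()
    article = soup.find("article") or soup.body
    return str(article)


def save_page(name, html):
    """Write the cleaned HTML to <name>.html in the template folder."""
    # The raw block stops Jinja from interpreting any braces in the scraped text.
    body = "{% raw %}" + extract_article(html) + "{% endraw %}"
    (TEMPLATE_DIR / f"{name}.html").write_text(body, encoding="utf-8")


@server.route("/scraped/<name>")
def scraped(name):
    """Serve a previously saved page, whatever its length."""
    return render_template(f"{name}.html")


save_page("example", SAMPLE_HTML)
```

The length of the text doesn’t matter here, because each page is just an HTML file that the browser scrolls like any other.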
You can still use Dash as the control tower and load the Flask templates inside an iframe based on the user’s selection.
Thank you @jinnyzor for the answer. For me, it sounds difficult and forces me to learn yet another module/package. I am a slow learner, sorry.
The Flask templates aren’t hard, especially since you are scraping the websites straight into the template.
Also note that even when a page is scraped into a template, the formatting and layout don’t always line up.