Just made the first version of a UI for an SEO crawler.
It supports a few options like:
- List mode: crawl a known list of URLs
- Spider mode: specify a start URL (typically the home page), and the crawler discovers and crawls pages recursively
- Set your own User-agent
- Include/exclude URL parameters and/or URL regex (when following links)
- Set a maximum number of pages to crawl, after which the crawling stops
- Export to Excel
You can get the code for the SEO crawler UI on GitHub (sorry, my GIF is a bit too big to upload here, but there is one in the repo).
You can also immediately run it in Google Colab if you want.
Expanded optional crawling options:
The output data frame contains 130 columns for plotly.com. The exact set varies based on which custom options you choose and which response headers the server returns. Sample:
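Since different servers return different response headers, the column set is the union of whatever fields each page produced. Roughly, the flattening can be sketched like this (a hypothetical helper, not the repo's actual implementation):

```python
def to_table(pages):
    """Flatten per-page crawl results (dicts with varying keys, e.g.
    different response headers per server) into rows that all share one
    column set, filling None where a page lacks a field."""
    columns = sorted({key for page in pages for key in page})
    rows = [{col: page.get(col) for col in columns} for page in pages]
    return columns, rows

# Two pages whose servers returned different headers:
pages = [
    {"url": "https://example.com/", "title": "Home", "X-Cache": "HIT"},
    {"url": "https://example.com/a", "title": "A", "Content-Encoding": "gzip"},
]
columns, rows = to_table(pages)
# columns is the union of all keys; each row aligns on it, with None gaps
```

This is why the column count differs per site: a data frame built this way grows a column for every header any page returned.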
Happy to hear any suggestions, bug reports, or ideas.
Thanks!