SEO Crawler with Dash

Just made the first version of a UI for an SEO crawler.

It supports several options:

  • List mode: crawl a known list of URLs
  • Spider mode: specify a start URL (typically the home page), and the crawler discovers and crawls pages recursively
  • Set your own User-agent
  • Include/exclude URL parameters and/or URL regex (when following links)
  • Set a maximum number of pages to crawl, after which the crawling stops
  • Export to Excel
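To make spider mode, the max-pages limit, and the include/exclude regex filters concrete, here is a minimal sketch of that logic (the `spider` function, the regex parameters, and the toy link graph are all illustrative assumptions, not the actual crawler code):

```python
import re
from collections import deque

def spider(start_url, get_links, include_re=None, exclude_re=None, max_pages=100):
    """Breadth-first crawl sketch: follow links from start_url until
    max_pages pages have been crawled. get_links(url) returns the links
    found on a page; injecting it keeps the logic testable offline."""
    seen = {start_url}
    queue = deque([start_url])
    crawled = []
    while queue and len(crawled) < max_pages:
        url = queue.popleft()
        crawled.append(url)
        for link in get_links(url):
            if link in seen:
                continue
            # Only follow links matching the include pattern (if given)
            if include_re and not re.search(include_re, link):
                continue
            # Skip links matching the exclude pattern (if given)
            if exclude_re and re.search(exclude_re, link):
                continue
            seen.add(link)
            queue.append(link)
    return crawled

# Toy in-memory link graph standing in for a real site:
site = {
    "https://example.com/": ["https://example.com/blog", "https://example.com/login"],
    "https://example.com/blog": ["https://example.com/blog/post-1"],
    "https://example.com/blog/post-1": [],
    "https://example.com/login": [],
}
pages = spider("https://example.com/", lambda u: site.get(u, []),
               exclude_re="login", max_pages=10)
# The login page is excluded; everything else is discovered recursively.
```

List mode is the degenerate case of the same loop: seed the queue with the known URLs and never enqueue discovered links.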

You can get the code for the SEO crawler UI on GitHub (sorry, my GIF is a bit too big to upload here, but there is one in the repo).

You can also run it immediately in Google Colab if you want.

An expanded view of the optional crawling options:

For plotly.com, the output data frame contains 130 columns. The exact set varies based on which custom options you choose and which response headers the server returns. Sample:
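That variable column count falls out naturally if the crawler writes one JSON object per page, since different pages return different headers. Assuming a JSON-lines output format (the file layout and key names here are illustrative guesses), loading it into pandas takes the union of all keys as columns:

```python
import json
import tempfile
import pandas as pd

# Hypothetical sample of crawl output: one JSON object per crawled page,
# with keys that differ per page depending on the response headers.
rows = [
    {"url": "https://example.com/", "status": 200, "title": "Home",
     "resp_headers_content-type": "text/html"},
    {"url": "https://example.com/blog", "status": 200, "title": "Blog",
     "resp_headers_x-cache": "HIT"},
]

with tempfile.NamedTemporaryFile("w", suffix=".jl", delete=False) as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
    path = f.name

# Keys missing on a given page become NaN, so the frame's columns are
# the union of every key seen across all crawled pages.
df = pd.read_json(path, lines=True)

# Excel export (requires openpyxl; uncomment to write the file):
# df.to_excel("crawl_output.xlsx", index=False)
```

This is why the column count differs site by site: a server that sets more custom response headers simply produces more columns.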

Happy to get any suggestions, bug reports, or ideas.

Thanks!
