- Paste a list of documents (keywords, page titles, URLs, etc.)
- Select one of them
- Find, filter, and narrow down the other documents based on the words they share with it
- Export to CSV
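The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual code. It assumes a simple tokenizer (lowercase, split on anything that is not a letter or digit) and measures similarity as the number of words a document shares with the selected one. The function names `tokenize`, `similar_documents`, and `to_csv` are hypothetical.

```python
import csv
import io
import re


def tokenize(text: str) -> set[str]:
    """Assumed tokenizer: lowercase, then split on anything that is not a letter or digit."""
    return {tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok}


def similar_documents(documents: list[str], selected: str) -> list[tuple[str, int]]:
    """Return (document, shared_word_count) for every other document sharing at least one word."""
    target = tokenize(selected)
    rows = []
    for doc in documents:
        if doc == selected:
            continue  # skip the selected document itself
        shared = len(target & tokenize(doc))
        if shared > 0:
            rows.append((doc, shared))
    # Most similar first; ties keep their input order because Python's sort is stable.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def to_csv(rows: list[tuple[str, int]]) -> str:
    """Serialize the filtered results as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["document", "shared_words"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    docs = ["running shoes for men", "trail running shoes", "leather office shoes", "coffee mugs"]
    print(to_csv(similar_documents(docs, "trail running shoes")))
```

With the sample list, "coffee mugs" shares no words with "trail running shoes" and is dropped, while "running shoes for men" (two shared words) ranks above "leather office shoes" (one).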
It is based on a simple document-term matrix, which is represented efficiently and scales well, up to a point. You can then filter the results and decide how narrow you want the similarity to be.
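One way to keep a document-term matrix compact is to store only its non-zero cells. The sketch below assumes a binary matrix (a word is either present in a document or not) stored column by column, so each term maps to the set of document indices containing it. Finding documents similar to a selected one then only touches the columns for that document's own words. The `min_shared` threshold is a hypothetical stand-in for "how narrow you want the similarity to be"; the app itself may build and filter its matrix differently.

```python
import re
from collections import defaultdict


def terms(text: str) -> set[str]:
    """Assumed tokenizer: lowercase alphanumeric runs, deduplicated (binary presence)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_term_index(documents: list[str]) -> dict[str, set[int]]:
    """Store only the non-zero cells of a binary document-term matrix, column-wise:
    each term maps to the indices of the documents that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for term in terms(text):
            index[term].add(doc_id)
    return dict(index)


def similar_to(documents: list[str], index: dict[str, set[int]],
               selected_id: int, min_shared: int = 1) -> list[tuple[int, int]]:
    """Return (doc_id, shared_term_count) pairs, most similar first.
    Only the columns for the selected document's terms are visited.
    Raising min_shared narrows the results."""
    shared_counts: dict[int, int] = defaultdict(int)
    for term in terms(documents[selected_id]):
        for doc_id in index.get(term, ()):
            if doc_id != selected_id:
                shared_counts[doc_id] += 1
    matches = [(doc_id, n) for doc_id, n in shared_counts.items() if n >= min_shared]
    # Sort by descending shared count, then by document index for a stable order.
    return sorted(matches, key=lambda m: (-m[1], m[0]))


if __name__ == "__main__":
    docs = ["running shoes for men", "trail running shoes", "leather office shoes", "coffee mugs"]
    index = build_term_index(docs)
    print(similar_to(docs, index, selected_id=1))                # [(0, 2), (2, 1)]
    print(similar_to(docs, index, selected_id=1, min_shared=2))  # [(0, 2)]
```

Raising `min_shared` from 1 to 2 drops "leather office shoes", which shares only "shoes" with the selected document. Because empty cells are never stored, memory grows with the total number of word occurrences rather than with documents × vocabulary.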
You can try the live content similarity app, or look at the code on GitHub (which includes a video overview of how it might be used).
Many improvements and enhancements are possible, and any feedback is welcome.