Building an n8n web crawler for RAG
This week, I’m introducing a new project at ScrapeNinja: a recursive web crawler, packaged as an n8n community node. It isn’t just another scraper - it’s a powerful open-source tool that runs in your local n8n instance and can harvest large amounts of data. For example, I use it to consolidate technical documentation (many web pages) into a clean Markdown file that I can feed into a large language model (LLM) for retrieval-augmented generation (RAG) and other advanced use