n8n Web Scraping Workflow Builder

Client: AI | Published: 10.03.2026

I need a rock-solid n8n workflow that, whenever I trigger it, navigates selected e-commerce sites and public business directories, captures every piece of publicly available business information, and stores it in a clean, query-ready format. The data I care about includes the business name, category or type, “about” text, founders’ names, any additional corporate details the site reveals, plus all images properly downloaded and tagged. I will be running various data-analysis models on the output, so accuracy, consistency, and tidy structuring are non-negotiable.

The flow must:

• Accept a list of target URLs and run on demand (no fixed schedule).
• Respect robots.txt and site rate limits while still remaining efficient.
• Handle pagination, lazy-loaded sections, and common anti-bot measures.
• Output to a well-structured destination of your choice (PostgreSQL, Airtable, or a neatly formatted CSV/JSON in an S3 bucket all work for me), as long as the schema is documented and repeatable.
• Deliver meaningful logging and alerts so I immediately know if a job fails or a selector breaks.

Please build the solution entirely inside n8n, using built-in nodes or custom JavaScript/TypeScript functions where necessary, and keep it modular enough that I can add new sites later without rewriting the whole flow.

I’ll consider the job complete once you provide:

1. The exportable n8n workflow file.
2. A short README explaining how to supply new URLs, start a run, and locate the results.
3. A quick screenshare or video walkthrough demonstrating one successful scrape end-to-end.

If you’ve tackled similar large-scale extractions before, I’d love to see an example.
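As one illustration of the rate-limit requirement, the per-site throttling could be done in a plain JavaScript helper inside an n8n Code node. This is a minimal sketch under stated assumptions: `makeThrottle` is a hypothetical helper name, the 2000 ms minimum delay is an illustrative value (not from this posting), and in a real workflow the returned wait time would feed an n8n Wait node or an awaited sleep before the HTTP request fires.

```javascript
// Sketch of a per-host throttle for use in an n8n Code node.
// It tracks the last reserved request slot per hostname and reports
// how long the caller should wait before hitting that host again.
// MIN_DELAY_MS = 2000 is an illustrative default, not a spec from the job.
const MIN_DELAY_MS = 2000;

function makeThrottle(minDelayMs = MIN_DELAY_MS, now = Date.now) {
  const lastSlot = new Map(); // hostname -> timestamp of last reserved slot

  // Returns the number of milliseconds to sleep before fetching `url`.
  return function waitMsFor(url) {
    const host = new URL(url).hostname;
    const t = now();
    const prev = lastSlot.get(host);
    const wait = prev === undefined ? 0 : Math.max(0, prev + minDelayMs - t);
    lastSlot.set(host, t + wait); // reserve the next slot for this host
    return wait;
  };
}
```

Different hosts are throttled independently, so a batch of mixed URLs stays efficient while each individual site still sees at most one request per delay window.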