Daily car.gr Data Scraper

Client: AI | Published: 22.01.2026

I need a reliable, fully automated pipeline that pulls fresh information from every listing on car.gr once per day, stores it in a query-friendly repository, and highlights what changed since the previous run.

Core requirements
• Daily crawl of the entire marketplace, respecting robots.txt and avoiding rate-limit blocks.
• Change tracking that flags brand-new ads as well as listings that disappear (sold or removed), as sketched below.
• On-the-fly statistics (total ads, average price per brand, mileage ranges, etc.) saved alongside the raw data for quick dashboards, as sketched below.
• Clean, well-commented code that can scale beyond the Greek market if I later add more portals.

Data points to capture
- Price and mileage
- Brand and model
- Year and condition
- Any additional fields your scraper can reliably expose (seller type, fuel, transmission, photos, location, description text, listing URL, timestamp).

Deliverables
1. Source code (Python, Node.js, or another language you propose) plus a requirements file or Docker image.
2. Schema-controlled storage layer (PostgreSQL, MongoDB, or flat files; convince me of the best fit).
3. One-click or cron-ready execution script and a README.
4. Sample CSV/JSON export demonstrating daily deltas and summary stats.

Acceptance criteria
• A 24-hour test run proves 100% listing coverage with no duplicate rows.
• A second run correctly labels additions/removals and updates the analytic tables.
• Installation on a clean server takes under 15 minutes using only the supplied documentation.

If you've built Scrapy spiders, headless-browser collectors, or data pipelines on AWS Lambda or GCP before, I'm keen to see your approach and a quick timeline to MVP.
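
To make the change-tracking requirement concrete, here is a minimal Python sketch of the daily diff I have in mind. It assumes each ad has a stable car.gr listing ID and that every crawl is saved as a JSON snapshot keyed by that ID; the file paths and field names are placeholders, not a prescribed design.

    import json
    from datetime import date
    from pathlib import Path

    def load_snapshot(path: Path) -> dict:
        """Load one day's crawl as {listing_id: record}; empty dict if the file is missing."""
        if not path.exists():
            return {}
        with path.open(encoding="utf-8") as f:
            return json.load(f)

    def diff_snapshots(previous: dict, current: dict) -> dict:
        """Label listings as added, removed, or still present since the previous run."""
        prev_ids, curr_ids = set(previous), set(current)
        return {
            "added": sorted(curr_ids - prev_ids),      # brand-new ads
            "removed": sorted(prev_ids - curr_ids),    # sold or withdrawn ads
            "unchanged": sorted(prev_ids & curr_ids),  # candidates for field-level diffs
        }

    if __name__ == "__main__":
        today = date.today().isoformat()
        current = load_snapshot(Path(f"snapshots/{today}.json"))      # placeholder paths
        previous = load_snapshot(Path("snapshots/latest.json"))
        delta = diff_snapshots(previous, current)
        Path("deltas").mkdir(exist_ok=True)
        Path(f"deltas/{today}.json").write_text(json.dumps(delta, indent=2), encoding="utf-8")

How you actually persist snapshots (database table, JSON files, or something else) is up to you; what matters is that each run can be compared cleanly against the previous one.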
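
Along the same lines, this is roughly the level of daily summary statistics I expect to be written alongside the raw data. The field names (brand, price, mileage) are assumptions for illustration; use whatever schema you propose.

    from collections import defaultdict
    from statistics import mean

    def summarize(listings: list[dict]) -> dict:
        """Compute headline numbers: total ads, average price per brand, mileage range."""
        prices_by_brand = defaultdict(list)
        mileages = []
        for ad in listings:
            if ad.get("price") is not None:
                prices_by_brand[ad["brand"]].append(ad["price"])
            if ad.get("mileage") is not None:
                mileages.append(ad["mileage"])
        return {
            "total_ads": len(listings),
            "avg_price_per_brand": {b: round(mean(p), 2) for b, p in prices_by_brand.items()},
            "mileage_range": (min(mileages), max(mileages)) if mileages else None,
        }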
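
Finally, if you do go the Scrapy route, the following project settings illustrate what I mean by respecting robots.txt and avoiding rate-limit blocks; the specific numbers are illustrative, not requirements.

    # settings.py (illustrative values only)
    ROBOTSTXT_OBEY = True                  # honour car.gr's robots.txt
    DOWNLOAD_DELAY = 1.0                   # pause between requests to the same domain
    CONCURRENT_REQUESTS_PER_DOMAIN = 2
    AUTOTHROTTLE_ENABLED = True            # back off automatically when responses slow down
    AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
    RETRY_TIMES = 3                        # retry transient failures instead of dropping listings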