Python Scraper & API Enrichment

I need a Python-based solution that automatically gathers company and shareholder data, pulls supplementary details via external APIs, and outputs a clean, unified dataset I can query at any time.

Scope of the scrape
• Sources: company websites, financial databases and relevant public records.
• Website focus: company profiles, turnover figures and any available Demat / share-holding particulars.

What the tool should do
1. Crawl or call the above sources, respecting robots.txt and rate limits.
2. Parse the required fields, normalise names and IDs, then enrich each record through one or more APIs (for example OpenCorporates, Clearbit or any better suggestion you have).
3. Store results in a structured format (CSV plus an SQLite or Postgres option).
4. Offer a simple command-line trigger as well as a callable function so I can integrate it into larger workflows later.
5. Log activity and errors clearly.

Tech stack
Python 3.x with common libraries such as Requests, BeautifulSoup or Scrapy, Pandas and an ORM (SQLAlchemy is fine). If Selenium or Playwright is unavoidable for dynamic pages, please factor that in.

Acceptance criteria
• Full source code with a requirements file for a virtual environment.
• Sample run that fetches at least 30 real company records, shows enrichment working and saves the combined dataset.
• README explaining setup, usage and how to swap in new API keys or data sources.

Let me know your approach, estimated timeline and any previous work scraping financial/company data so I can move forward quickly.
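
For illustration only, here is a minimal sketch of how steps 1 to 3 (robots.txt check, name normalisation, CSV plus SQLite storage) might look using just the standard library. The function names, the `LEGAL_SUFFIXES` list, the `companies` table and the three-column schema are all assumptions made for this example, not a required design; fetching, rate limiting and the API enrichment step are deliberately left out.

```python
import csv
import re
import sqlite3
import unicodedata
from contextlib import closing
from urllib import robotparser

# Assumed list of legal-form suffixes to strip; extend per jurisdiction.
LEGAL_SUFFIXES = {"ltd", "limited", "pvt", "private", "inc", "llc", "plc", "corp"}
# Hypothetical column set for the unified dataset.
FIELDS = ["company_id", "name", "turnover"]


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def normalise_name(raw: str) -> str:
    """Lower-case, strip accents and punctuation, drop trailing legal suffixes."""
    ascii_text = unicodedata.normalize("NFKD", raw).encode("ascii", "ignore").decode("ascii")
    tokens = re.sub(r"[^a-z0-9 ]+", " ", ascii_text.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def save_records(records, db_path: str, csv_path: str) -> int:
    """Normalise names, upsert rows into SQLite and mirror them to a CSV file."""
    rows = [{f: r.get(f) for f in FIELDS} for r in records]
    for row in rows:
        row["name"] = normalise_name(row["name"])
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS companies "
                "(company_id TEXT PRIMARY KEY, name TEXT, turnover REAL)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO companies (company_id, name, turnover) "
                "VALUES (:company_id, :name, :turnover)",
                rows,
            )
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Keeping the robots.txt check and the normaliser as plain functions means the same code can back both the command-line trigger and the callable entry point; an enrichment step could be slotted in between parsing and `save_records` once the API choice is settled.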
