Legal Data Web Scraping & Collection Developer

Customer: AI | Published: 08.03.2026
Budget: $30

PROJECT TITLE
Web Scraping Developer for Global Legal & Regulatory Data Collection

PROJECT OVERVIEW
We are looking for a developer who can build an automated system to collect legal and regulatory documents from multiple global sources. The goal is to create a scalable automated pipeline that can gather legal data across multiple jurisdictions and regulatory domains.

DATA COLLECTION SCOPE
The system will collect information related to:
- Medical law and healthcare regulation
- Medical advertising regulation
- Corporate formation and company governance laws
- Investment regulation (stocks, cryptocurrency, real estate)
- Tax law and administrative tax rulings
- Beauty and cosmetic regulation
- Medical and cosmetic manufacturing compliance
- Import and export law
- Customs and tariff regulation
- International trade compliance frameworks

RESPONSIBILITIES
The developer will be responsible for:
- Analyzing government legal databases and regulatory websites
- Building web scraping systems and crawlers
- Automatically downloading legal documents and PDF files
- Extracting metadata and source URLs
- Organizing collected data into structured datasets
- Creating an automated data collection pipeline

TECHNICAL SKILLS (PREFERRED)
- Python
- Web Scraping
- Selenium
- BeautifulSoup
- Scrapy
- API integration
- Data extraction automation
- Data pipeline development

DELIVERABLES
The final deliverables should include:
1. Web scraping scripts or crawler system
2. Automated legal data collection pipeline
3. Downloaded legal document datasets (PDF files and documents)
4. Structured dataset including metadata and source URLs
5. Organized storage structure for collected files
6. A compiled dataset or master reference document combining collected materials

IMPORTANT REQUIREMENTS
- Every collected document must include its original source URL
- Data must come from official government websites or trusted legal databases
- The system should support structured storage for large-scale legal datasets
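The three requirements (every document keeps its source URL, data comes only from official or trusted sources, and files land in structured storage) can be illustrated with a minimal Python sketch of the storage step that runs after a document has been fetched. Everything in it is an assumption for illustration, not part of the brief: the host allowlist (`legislation.example.gov` and `regulations.example.gov` are placeholders), the `<jurisdiction>/<domain>/<sha256>.pdf` folder layout, the `manifest.jsonl` file, and the hypothetical `DocumentRecord` fields. Crawling itself (Scrapy, Selenium, etc.) is out of scope here.

```python
import hashlib
import json
import tempfile
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical allowlist: the real list of official government sites and
# trusted legal databases would be curated per jurisdiction.
ALLOWED_HOSTS = {"legislation.example.gov", "regulations.example.gov"}


@dataclass
class DocumentRecord:
    """One row of the structured dataset; field names are illustrative."""
    source_url: str
    jurisdiction: str
    domain: str
    title: str
    sha256: str
    stored_path: str
    retrieved_at: str


def is_allowed_source(url: str) -> bool:
    """Accept only HTTPS URLs whose host is on the (hypothetical) allowlist."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS


def store_document(root: Path, content: bytes, source_url: str,
                   jurisdiction: str, domain: str, title: str,
                   suffix: str = ".pdf") -> DocumentRecord:
    """Save raw bytes under root/<jurisdiction>/<domain>/ and log metadata.

    The file name is the SHA-256 of the content, so fetching the same
    document twice does not create a second copy of the file.
    """
    if not source_url:
        raise ValueError("every document must carry its original source URL")
    if not is_allowed_source(source_url):
        raise ValueError(f"source not on allowlist: {source_url}")

    digest = hashlib.sha256(content).hexdigest()
    folder = root / jurisdiction / domain
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{digest}{suffix}"
    if not path.exists():
        path.write_bytes(content)

    record = DocumentRecord(
        source_url=source_url,
        jurisdiction=jurisdiction,
        domain=domain,
        title=title,
        sha256=digest,
        stored_path=str(path.relative_to(root)),
        retrieved_at=datetime.now(timezone.utc).isoformat(),
    )
    # Append-only JSON Lines manifest: one metadata record per call.
    with open(root / "manifest.jsonl", "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record), ensure_ascii=False) + "\n")
    return record


if __name__ == "__main__":
    # Demo with placeholder bytes in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        rec = store_document(Path(tmp), b"%PDF-1.4 placeholder",
                             "https://legislation.example.gov/acts/123",
                             jurisdiction="xx", domain="tax",
                             title="Placeholder Act")
        print(rec.stored_path)
```

Naming files by content hash keeps the storage tree free of duplicate files when the same PDF is linked from several pages, while the manifest still records every retrieval together with its source URL, which is what the final compiled dataset would be built from.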