Build a Web Scraper for a Login-Based Drug Repository Website (Large Dataset)

Client: AI | Published: 11.03.2026

We are looking for an experienced developer to build a robust web scraping solution capable of extracting structured data from a login-protected medical/drug repository website. The platform contains a large database of drug information (potentially hundreds of thousands to over a million pages). The scraper should log in, navigate the site systematically, extract the relevant drug data, and store it in a structured format.

Scope of Work:
- Develop a scraper that can log into a protected website.
- Navigate through the drug repository pages.
- Extract structured information from each drug page.
- Handle pagination and large-scale crawling.
- Implement mechanisms to prevent crashes or interruptions during long scraping runs.
- Store extracted data in a structured format such as JSON, CSV, or a database.

Data to Extract (Example Fields):
- Drug name
- Active ingredients
- Indications
- Dosage information
- Contraindications
- Side effects
- Drug interactions
- Pharmacology details
- Any other structured medical information available on the page

Technical Requirements:
- Experience with large-scale web scraping.
- Ability to handle login/session-based websites.
- Familiarity with tools such as Selenium, Playwright, Puppeteer, Scrapy, or similar frameworks.
- Knowledge of handling dynamic JavaScript-rendered pages.
- Experience with data parsing and structured data storage.
- Ability to implement error handling and logging.

Deliverables:
- Fully functional scraping script or application.
- Clean, well-structured dataset.
- Documentation explaining how to run and maintain the scraper.
- Optional: automated scheduling or update mechanism.

Preferred Skills:
- Python (Scrapy, Selenium, BeautifulSoup) or Node.js (Playwright, Puppeteer).
- Experience scraping large datasets.
- Experience with MongoDB or similar databases is a plus.

Project Size: Medium to large.

Please Include in Your Proposal:
- Your experience with similar scraping projects.
- Technologies you would use.
- Estimated timeline.
- Examples of previous work.

We are looking for someone reliable who can build a scalable solution capable of handling large volumes of data efficiently.
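To make the requirements concrete, the core pieces (session-based login, per-page extraction, retries, and a resumable checkpoint so long runs survive interruptions) can be sketched as below. This is a minimal standard-library sketch, not a deliverable: the URLs, login form field names, and CSS class names are all hypothetical placeholders that would have to be adapted to the actual site, and a real JavaScript-heavy target would call for Playwright or Selenium instead of plain HTTP.

```python
# Minimal sketch of a login-based, resumable scraper.
# All URLs, form-field names, and class names are hypothetical placeholders.
import json
import time
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar
from html.parser import HTMLParser

BASE_URL = "https://example-drug-repo.test"   # placeholder, not a real site
LOGIN_URL = BASE_URL + "/login"               # placeholder
CHECKPOINT_FILE = "checkpoint.json"

def make_session():
    """Build an opener that keeps session cookies across requests."""
    jar = CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

def login(opener, username, password):
    """POST credentials; the form field names are assumptions."""
    data = urllib.parse.urlencode(
        {"username": username, "password": password}).encode()
    opener.open(LOGIN_URL, data=data, timeout=30)

class DrugPageParser(HTMLParser):
    """Collect text from elements whose class attribute names a known field."""
    FIELDS = {"drug-name", "active-ingredients", "indications", "dosage",
              "contraindications", "side-effects", "interactions",
              "pharmacology"}

    def __init__(self):
        super().__init__()
        self.record = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if cls in self.FIELDS:
            self._current = cls

    def handle_data(self, data):
        if self._current and data.strip():
            self.record[self._current] = data.strip()
            self._current = None

def parse_drug_page(html):
    parser = DrugPageParser()
    parser.feed(html)
    return parser.record

def crawl(opener, page_urls, out_path="drugs.jsonl"):
    """Fetch each page with retries; checkpoint progress so runs can resume."""
    try:
        with open(CHECKPOINT_FILE) as f:
            done = set(json.load(f))
    except FileNotFoundError:
        done = set()
    with open(out_path, "a", encoding="utf-8") as out:
        for url in page_urls:
            if url in done:
                continue                          # already scraped: skip
            for attempt in range(3):              # simple retry loop
                try:
                    html = opener.open(url, timeout=30).read().decode("utf-8")
                    out.write(json.dumps(parse_drug_page(html)) + "\n")
                    done.add(url)
                    break
                except Exception:
                    time.sleep(2 ** attempt)      # exponential backoff
            with open(CHECKPOINT_FILE, "w") as f:
                json.dump(sorted(done), f)        # persist progress
            time.sleep(0.5)                       # polite crawl delay
```

Writing one JSON object per line (JSON Lines) and checkpointing after every page keeps memory flat and makes a million-page run restartable from where it stopped, which is exactly the crash-resilience the scope of work asks for.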