I need a robust, web-based application that gathers publicly available GitHub user data (full name, email when visible, personal site, and profile URL) without violating GitHub's Terms of Service. The engine must be written in Python and driven by Selenium so it can handle dynamic content and future UI changes on GitHub pages.

Core functionality
• Scrape and store the data above with solid error handling and rate-limiting strategies.
• Present the results through a clean interface that an Admin can log into.
• Let the Admin refine queries via keyword and location filters, then watch real-time progress and basic stats on a dashboard.
• Provide built-in export to CSV/JSON so the Admin can download filtered datasets instantly.

UI expectations
The front end should be intuitive enough for non-technical stakeholders yet detailed enough for power users: a statistics dashboard, granular search filters, and one-click data export, all surfaced logically in a single-page layout.

Technical notes
• Python + Selenium are mandatory. Feel free to suggest supporting libraries (e.g., pandas, Flask/FastAPI, or a lightweight JS framework for the front end) as long as they keep performance high and the stack maintainable.
• The solution must be scalable (ideally container-ready so we can deploy to cloud infrastructure later) and should include safeguards that obey GitHub's request limits.
• Code quality, modular design, and clear documentation are non-negotiable so future devs can extend the tool easily.

Deliverables
1. Clean, well-commented source code in a Git repo
2. Deployment guide and environment file(s)
3. Admin-only web interface with the three features listed above fully functional
4. Brief video or live demo showing the tool scraping and exporting data successfully

Acceptance criteria
• Scraper collects at least the four specified fields for any public GitHub profile without manual intervention.
• Keyword and location filters narrow results accurately in both scrape and display.
• Dashboard updates in real time and matches back-end counts.
• CSV/JSON exports mirror on-screen filters exactly.
• No API rate-limit violations occur in a 24-hour test run.

If this aligns with your skills, let's talk timelines and milestones so we can push to production quickly.
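To make the rate-limiting expectation concrete, here is a minimal sketch of the kind of safeguard I have in mind: a throttle that enforces a minimum delay between page requests. The `RequestThrottle` name and the 2-second default are illustrative assumptions, not GitHub's documented limits.

```python
import time


class RequestThrottle:
    """Enforce a minimum delay between successive requests.

    Illustrative sketch only: the interval is a placeholder, not
    GitHub's actual rate limit. The scraper would call wait()
    before each page load.
    """

    def __init__(self, min_interval: float = 2.0):
        self.min_interval = min_interval
        self._last = 0.0  # monotonic timestamp of the previous request

    def wait(self) -> float:
        """Sleep just long enough to honor min_interval; return seconds slept."""
        now = time.monotonic()
        elapsed = now - self._last
        slept = 0.0
        if elapsed < self.min_interval:
            slept = self.min_interval - elapsed
            time.sleep(slept)
        self._last = time.monotonic()
        return slept
```

In the Selenium driver loop, a single shared throttle instance would sit in front of every `driver.get(...)` call, which also gives one place to add jitter or exponential backoff later.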
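Likewise, the acceptance criterion that CSV/JSON exports mirror the on-screen filters could be sketched as below. The field names, the `location` key, and the case-insensitive substring-match filter semantics are assumptions for illustration, not a fixed spec.

```python
import csv
import io
import json

# The four fields named in the spec; "location" is an assumed extra
# column used by the location filter.
FIELDS = ["full_name", "email", "website", "profile_url"]


def filter_profiles(profiles, keyword=None, location=None):
    """Return profiles matching an optional keyword and location.

    Assumed semantics: case-insensitive substring match across the
    four spec fields (keyword) and the location field (location).
    """
    out = []
    for p in profiles:
        text = " ".join(str(p.get(f, "")) for f in FIELDS).lower()
        if keyword and keyword.lower() not in text:
            continue
        if location and location.lower() not in str(p.get("location", "")).lower():
            continue
        out.append(p)
    return out


def export_json(profiles) -> str:
    """Serialize the filtered profiles exactly as displayed."""
    return json.dumps(profiles, indent=2)


def export_csv(profiles) -> str:
    """Write the filtered profiles to CSV with a fixed header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=FIELDS + ["location"], extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(profiles)
    return buf.getvalue()
```

Because both exporters take the already-filtered list, the "exports mirror on-screen filters exactly" criterion falls out by construction: the dashboard view and the download share one `filter_profiles` call.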