Faculty Data Web Scraping

I need a clean, automated scrape of every faculty profile on the target university website. The final dataset must capture each professor’s:

• full name
• verified contact details (email and phone, where shown)
• publication list or research output
• stated research interests
• department affiliation
• complete education history, with emphasis on PhD information

Please extract these fields into a SQLite3 file and supply the script (Python preferred; BeautifulSoup, Scrapy, or Selenium are fine) so I can rerun it later.

Accuracy matters more than speed: profiles with missing or mismatched fields should be flagged, and duplicates must be removed. When the site presents paginated results or profile sub-pages, the crawler should follow those links automatically and keep to polite request rates to avoid being blocked.

I’ll consider the job done once I can run the script on my end and obtain a file that matches the live site for:

• every listed faculty member
• the six data points noted above, consistently formatted

Feel free to suggest improvements, but keep the output schema intact.
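For illustration, here is a minimal sketch of how the SQLite output, duplicate removal, and missing-field flagging could fit together. It assumes the scraper has already parsed each profile into plain Python values. The table name `faculty`, the `FacultyRecord` class, the flag labels, and the use of a lowercased email (falling back to a whitespace-normalized name) as the deduplication key are all hypothetical choices, not part of the brief. List-valued fields are stored as JSON text.

```python
import json
import re
import sqlite3
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical schema: one row per faculty member, deduplicated on dedup_key.
# List-valued fields are stored as JSON text; flags is '' when the row is clean.
SCHEMA = """
CREATE TABLE IF NOT EXISTS faculty (
    dedup_key          TEXT PRIMARY KEY,
    full_name          TEXT NOT NULL,
    email              TEXT,
    phone              TEXT,
    department         TEXT,
    research_interests TEXT NOT NULL,
    publications       TEXT NOT NULL,
    education          TEXT NOT NULL,
    phd                TEXT,
    profile_url        TEXT,
    flags              TEXT NOT NULL
);
"""

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHD_RE = re.compile(r"\bph\.?\s?d\b|doctor of philosophy", re.IGNORECASE)


@dataclass
class FacultyRecord:
    # Hypothetical container for the values parsed from one profile page.
    full_name: str
    profile_url: str
    email: Optional[str] = None
    phone: Optional[str] = None
    department: Optional[str] = None
    research_interests: List[str] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)
    education: List[str] = field(default_factory=list)


def normalize_name(name: str) -> str:
    # Collapse runs of whitespace so "Jane  Doe" and "Jane Doe" compare equal.
    return " ".join(name.split())


def dedup_key(rec: FacultyRecord) -> str:
    # Assumption: lowercased email identifies a person; otherwise use the name.
    if rec.email:
        return rec.email.strip().lower()
    return "name:" + normalize_name(rec.full_name).casefold()


def find_phd(education: List[str]) -> Optional[str]:
    # First education entry that looks like a doctorate, or None.
    return next((e for e in education if PHD_RE.search(e)), None)


def quality_flags(rec: FacultyRecord) -> List[str]:
    # One label per missing or suspicious field, in a fixed order.
    flags = []
    if not rec.email and not rec.phone:
        flags.append("missing_contact")
    if rec.email and not EMAIL_RE.match(rec.email.strip()):
        flags.append("invalid_email")
    if not rec.department:
        flags.append("missing_department")
    if not rec.research_interests:
        flags.append("missing_research_interests")
    if not rec.publications:
        flags.append("missing_publications")
    if not rec.education:
        flags.append("missing_education")
    elif find_phd(rec.education) is None:
        flags.append("missing_phd")
    return flags


def save(conn: sqlite3.Connection, rec: FacultyRecord) -> bool:
    # INSERT OR IGNORE skips duplicates; total_changes tells us whether a row was added.
    before = conn.total_changes
    conn.execute(
        "INSERT OR IGNORE INTO faculty (dedup_key, full_name, email, phone, department,"
        " research_interests, publications, education, phd, profile_url, flags)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (
            dedup_key(rec), normalize_name(rec.full_name),
            rec.email.strip() if rec.email else None, rec.phone, rec.department,
            json.dumps(rec.research_interests), json.dumps(rec.publications),
            json.dumps(rec.education), find_phd(rec.education),
            rec.profile_url, ",".join(quality_flags(rec)),
        ),
    )
    return conn.total_changes > before
```

When no email is shown, keying on the name alone can merge two different people who share a name. A real run would probably want a stronger fallback key, such as department plus name, along with a manual review of every row whose `flags` column is non-empty.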

Python
