Automate Accurate Resume Data Extraction

Client: AI | Published: 27.02.2026

I have roughly 100,000 resumes that must be mined for only four fields: candidate name, phone number, email address, and certificate name. The files arrive in mixed formats (PDF, Word, and a few odd ones), and the layout varies from resume to resume, so a one-size-fits-all template parser will not work.

Python will be the core language, and you are free to pair it with whatever complementary libraries or OCR/NLP utilities you feel will guarantee precision. However, every extracted field must be 100% correct: no guessing, no partial matches, and no skipped records.

Deliverables
• A single CSV (or Excel) file with one row per resume and four fully verified columns: Name, Phone, Email, Certificate.
• Well-commented Python code plus any auxiliary scripts or configuration needed to reproduce the extraction on my end.
• A brief README outlining dependencies and run steps.

I will validate the output against spot-checks and automated tests, so please build in your own verification layer before submission.
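As an illustration of what such a verification layer might look like, here is a minimal sketch for the two pattern-based fields (email and phone), assuming the resume text has already been extracted from the PDF or Word file by a separate step. The regular expressions, the 10 to 15 digit phone rule, the function names, and the extra `resume_id` and `needs_review` columns are all assumptions made for the example, not part of the brief. Name and certificate extraction need a different approach and are not covered here.

```python
import csv
import re

# Assumed patterns: deliberately conservative, not a full RFC 5322 / E.164 parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d ().-]{7,}\d")  # stays on one line of text


def extract_contacts(text):
    """Return the distinct email and phone candidates found in the text."""
    emails = sorted({m.lower() for m in EMAIL_RE.findall(text)})
    digits = {re.sub(r"\D", "", m) for m in PHONE_RE.findall(text)}
    # Assumption: a real phone number has 10 to 15 digits; this drops
    # look-alikes such as date ranges ("2015 - 2019").
    phones = sorted(d for d in digits if 10 <= len(d) <= 15)
    return emails, phones


def build_row(resume_id, text):
    """Accept a field only when exactly one candidate exists; otherwise flag it."""
    emails, phones = extract_contacts(text)
    issues = []
    if len(emails) != 1:
        issues.append(f"email_candidates={len(emails)}")
    if len(phones) != 1:
        issues.append(f"phone_candidates={len(phones)}")
    return {
        "resume_id": resume_id,
        "email": emails[0] if len(emails) == 1 else "",
        "phone": phones[0] if len(phones) == 1 else "",
        "needs_review": "; ".join(issues),  # empty string means no issue found
    }


def write_rows(rows, handle):
    """Write rows as CSV to an open text file or any file-like object."""
    fields = ["resume_id", "email", "phone", "needs_review"]
    writer = csv.DictWriter(handle, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
```

The design choice is to never pick between competing candidates: a resume with zero or several matches keeps its row but is flagged for manual review, which avoids both guessing and silently skipping records.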