Engineer for Fraud Detection System

Hiring: Senior Backend / ML Engineer
Ongoing Contract: Fraud Detection & Document Intelligence Platform

Project: FraudX
Type: Long-term contract (with expansion potential)
Focus: Document fraud detection, OCR intelligence, ML readiness
Stage: Live production system, in the final accuracy + expansion phase

⸻

OVERVIEW

I’m building FraudX, a production-grade document fraud detection platform used to analyze:
• Paystubs
• Proof of income
• (Next phase) Proof of residence
• (Next phase) Driver licenses & IDs
• (Next phase) Bank statements & financial docs

The system is already live, handling real documents, with a working backend, UI, OCR pipeline, and fraud rules engine.

I am looking for a senior engineer to help:
• Improve extraction accuracy
• Harden document parsing
• Expand to new document types
• Prepare the system for long-term ML training
• Continue building new fraud detection modules over time

This is not a short-term gig. If we work well together, this will become ongoing work.

WHAT IS ALREADY BUILT (LIVE IN PRODUCTION)

Infrastructure
• Ubuntu VPS (DigitalOcean)
• Gunicorn + Uvicorn
• systemd
• Nginx + SSL
• PostgreSQL (asyncpg)
• Fully deployed backend

Core Platform
• FastAPI backend
• Role-based access (Admin / Dealer / Guest)
• Immutable scan storage
• File hashing
• OCR pipeline
• Admin dashboard
• Scan audit trail

OCR & Extraction
• Google Vision OCR
• Google Document AI
• AWS Textract
• OCR normalization layer
• OCR fallback handling
• GPT used as non-authoritative assist
• Multi-provider OCR support

Fraud System
• Deterministic fraud rules engine
• PASS / CAUTION / FAIL output
• Explainable flags
• Non-ML decisioning (ML is advisory only)
• Admin labeling + review system

ML Readiness
• Training labels stored
• Feature storage in JSONB
• Shadow ML scoring
• Future LightGBM / XGBoost pipeline planned

1. Paystub Extraction (Top Priority)

Current state:
• OCR works
• Text exists
• Some extraction works
• But line-item parsing is incomplete

Needs:
• Earnings table extraction
• Deductions parsing
• Multi-line pay components
• Hourly vs salary normalization
• YTD vs current matching
• Employer/employee cleanup
• Reliable structured output

This is the core issue affecting accuracy.

⸻

2. OCR Consensus & Normalization

We already run:
• Google Vision
• Google DocAI
• AWS Textract
• GPT assist

We need:
• Cross-engine reconciliation
• Confidence scoring
• Conflict resolution
• Smart merging of results

⸻

3. Future Document Types (Next Phase)

Planned additions:
• Proof of residence
• Driver licenses
• Bank statements
• Employment letters

Same pipeline, new extractors.

⸻

4. ML Readiness (Ongoing)

Already have:
• Real + fake documents
• Labeled fraud examples
• Admin review system

Need:
• Feature engineering
• Consistent schema
• Model-friendly extraction
• Training pipeline readiness

⸻

IDEAL CANDIDATE

You should be comfortable with:
• Python (FastAPI preferred)
• OCR pipelines
• Document parsing
• PDF / image processing
• Structured data extraction
• PostgreSQL / JSONB
• Async systems
• Production debugging

Big Plus If You Have:
• Fraud detection experience
• Fintech / underwriting systems
• ML feature engineering
• OCR accuracy optimization
• PDF forensics
• Document templating systems

⸻

WHAT YOU WILL NOT DO

• Rebuild frontend
• Rewrite the system
• Replace architecture
• Work on UI styling

This is backend intelligence work, not UI.

⸻

ENGAGEMENT TYPE

• Long-term contract
• Ongoing feature development
• Multiple document types coming
• Flexible hours
• Paid hourly or milestone-based

Rate: Open to discussion (expecting senior-level rates)
$60–$120/hr depending on experience, or fixed-price per module

⸻

TO APPLY, PLEASE ANSWER:

1. Have you worked with OCR or document parsing before?
2. Have you built systems that extract structured data from PDFs?
3. Have you worked with fraud, fintech, or identity verification?
4. How would you approach paystub parsing?
5. What tools/libraries would you use?
6. Are you available for ongoing work?
7. Your hourly rate or project estimate
