JOB TITLE: AI Integration Specialist – Deep Web Research & Structured Data Extraction

PLEASE SEE THE NOTES AT THE BOTTOM BEFORE APPLYING.

PROJECT SUMMARY:
We’re seeking an experienced AI integration specialist to help solve a complex data-enrichment challenge that requires combining multiple AI and web-search APIs to produce structured research outputs. Our current system aggregates structured information from public sources and feeds it into a reporting platform. The existing scraper and database work correctly, but the AI enrichment layer needs improvement. The goal is to have models such as Perplexity, GPT, Claude or Grok retrieve deeper, more accurate information from the internet and map it directly into defined fields within each report (e.g., company details, contacts, background information, context summaries). We need someone to design a reliable reasoning and retrieval chain that can query, interpret and populate structured data with minimal human oversight.

KEY OBJECTIVES:
- Audit the current enrichment pipeline (Perplexity API, GPT or similar) and identify its technical limits.
- Implement or recommend a hybrid retrieval-augmented generation (RAG) approach to enable deep research beyond single-query API constraints.
- Combine multiple APIs (e.g., Perplexity Search, Exa.ai, Brave, SerpAPI, or Grok) with a reasoning model to extract and verify data from live web sources.
- Parse and normalise extracted information into predefined schema fields (JSON or database columns); see Example 1 at the end of this post for the kind of record shape we mean.
- Build retry and fallback logic for cases where one model or search source fails to return complete results; see Example 2 at the end of this post.
- Maintain accurate mappings for company names, emails, phone numbers, contact roles, and source links.

SKILLS AND EXPERIENCE REQUIRED:
- Proven experience integrating LLMs (GPT-4/5, Claude, Grok, Perplexity, etc.) into production data pipelines
- Strong understanding of RAG architecture and vector databases
- Skilled in Python (FastAPI or similar), JSON parsing and API orchestration
- Practical experience with LangChain or LlamaIndex for chaining reasoning and retrieval
- Knowledge of structured output formatting (pydantic, function calling or tool use)
- Familiarity with data cleaning, validation and error handling
- Ability to demonstrate past work where AI systems retrieved factual web data and populated structured fields

NICE TO HAVE:
- Experience with maritime, logistics, compliance, or other regulated data domains
- Exposure to large-scale scraping, ETL, or knowledge-graph building
- Comfort working with PostgreSQL or similar databases

DELIVERABLES:
- A working enrichment pipeline that connects one or more AI reasoning models with external search APIs
- Structured, verifiable output aligned with the existing report schema
- Documentation of the new data flow and configuration
- Demonstration of improved retrieval accuracy and depth

NOTES:
- This role is ideal for developers who have built or tuned multi-model retrieval systems, automated research tools, or structured intelligence reports using LLMs. We want to see proof that you have worked on this kind of project before we go ahead.
- You will need to sign an NDA.
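
EXAMPLE 1 – TARGET OUTPUT SHAPE (ILLUSTRATIVE ONLY):
To give a feel for what "structured, verifiable output" means here, below is a rough pydantic sketch of the kind of record each enrichment run should produce. The field names are placeholders, not our actual report schema.

# Rough sketch of the desired record shape (placeholder field names, not the real schema).
from typing import Optional
from pydantic import BaseModel, Field, HttpUrl

class Contact(BaseModel):
    name: str
    role: Optional[str] = None
    email: Optional[str] = None   # could be tightened to EmailStr (needs the pydantic[email] extra)
    phone: Optional[str] = None

class CompanyRecord(BaseModel):
    company_name: str
    background: Optional[str] = None                              # short context summary
    contacts: list[Contact] = Field(default_factory=list)
    source_links: list[HttpUrl] = Field(default_factory=list)     # where each fact was found

# A model's JSON output can then be validated before it touches the database:
# record = CompanyRecord.model_validate_json(raw_llm_output)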
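
EXAMPLE 2 – RETRY AND FALLBACK ACROSS SEARCH SOURCES (ILLUSTRATIVE ONLY):
This is the sort of behaviour we mean by retry and fallback logic: try one search provider, retry transient failures with a small backoff, then fall back to the next provider. The provider wrappers named in the usage comment (perplexity_search, exa_search, brave_search) are hypothetical; we expect you to propose the actual orchestration.

# Generic fallback chain; provider callables are placeholders to be wired to real APIs.
import time
from typing import Callable, Optional

def search_with_fallback(
    query: str,
    providers: list[Callable[[str], Optional[dict]]],
    retries_per_provider: int = 2,
    backoff_seconds: float = 1.5,
) -> Optional[dict]:
    """Try each provider in order; retry transient errors, then fall back to the next."""
    for provider in providers:
        for attempt in range(retries_per_provider):
            try:
                result = provider(query)
                if result:                     # treat an empty result as incomplete and move on
                    return result
            except Exception:
                time.sleep(backoff_seconds * (attempt + 1))
        # provider exhausted -> continue with the next search source
    return None

# Usage (hypothetical provider wrappers):
# result = search_with_fallback(query, [perplexity_search, exa_search, brave_search])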