Local AI Contract Analyzer

Customer: AI | Published: 03.12.2025

Overview

I need a lightweight Python utility that runs locally on my Windows laptop. The tool should monitor a predefined folder and automatically process football-transfer contracts (PDF or .txt) as soon as they appear. Each file should be sent to the OpenAI API (GPT-4o / GPT-5.1) to extract appearance-related clauses only.

After extraction, the tool must output:
1. A CSV file (ready for Power BI ingestion)
2. A pandas DataFrame object saved locally (.pkl or .feather)

Finally, the same tool should allow a Q&A mode over two CSVs:
• Minutes Table (my exact minutes data)
• Newly extracted Clause Table

It must answer natural-language questions such as:
• “How many appearances has the player made?”
• “How many appearances until the clause triggers?”
• “What is the total contingent liability this season?”

The Q&A must compute and return actual numeric results based on the data in the CSVs.

⸻

Functional Requirements

1. Local Operation
• Must run fully locally on Windows.
• Acceptable forms:
  • simple setup script (install.bat)
  • OR standalone .exe built with PyInstaller
  • OR a clean Python CLI (python main.py)

2. Folder Watcher
• Continuously monitor a folder, e.g. C:/ClauseReader/contracts_in/
• When a new PDF or .txt appears:
  • read the text
  • call OpenAI API
  • extract all appearance-related clauses
• Save outputs to:
  • contracts_out/clauses.csv
  • contracts_out/clauses.pkl or .feather

3. AI Clause Extraction
For each contract:
• Extract only appearance / minutes / starts / subs / threshold / trigger-based clauses.
• Output a structured table with columns:
  • ClauseType
  • Trigger
  • Threshold
  • Amount
  • Notes
  • OriginalText (to verify accuracy)

4. Q&A Engine
User points the program to:
• Minutes CSV
• Clause CSV

The tool should enter a Q&A loop:
Input: natural-language question
Output: computed numeric result + short explanation

Examples:
• “How many appearances has the player made this season?”
• “How many appearance points remain until the next trigger?”
• “What is the total contingent liability in 24/25?”
• “Which clauses are most likely to trigger?”

Must support:
• start/sub logic
• appearance points
• threshold comparisons
• season filtering

5. Output Formats
Every contract processed must generate:
• clauses.csv
• clauses.pkl (DataFrame) OR .feather

Both must contain the same information.

⸻

Acceptance Criteria

1. Tool runs locally on Windows via one command or .exe.
2. Automatically monitors the folder without drag-and-drop.
3. Accurate extraction of clauses from PDF/text using GPT-4o/5.1.
4. Outputs both CSV + DataFrame for each contract.
5. Q&A returns correct calculated outputs using sample Minutes + Clause Tables provided.
6. Clean README including:
  • environment setup
  • API key placement
  • folder structure
  • example questions
  • example clause prompts

Optional (nice to have):
• A tiny Tkinter GUI for Q&A
• Logging of processed contracts
• More polished formatting of DataFrame output

⸻

Tech Stack Requirements

• Python 3.10+
• pandas
• watchdog (or similar)
• PyPDF2 / pdfplumber for PDF extraction
• OpenAI Python SDK
• Optional: Streamlit or Tkinter
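
⸻

Illustrative sketch: Folder Watcher

To make the Folder Watcher requirement concrete, here is a minimal sketch of the "notice new PDF or .txt files" step. It is a standard-library polling stand-in for the "or similar" option, not the watchdog API. The names find_new_contracts, watch and handle are hypothetical and only illustrate the idea.

```python
import time
from pathlib import Path

# Extensions the posting names as valid contract inputs.
CONTRACT_SUFFIXES = {".pdf", ".txt"}


def find_new_contracts(folder: Path, seen: set[str]) -> list[Path]:
    """Return contract files in `folder` not reported before, and mark them as seen."""
    new_files = []
    for path in sorted(folder.iterdir()):
        is_contract = path.is_file() and path.suffix.lower() in CONTRACT_SUFFIXES
        if is_contract and path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files


def watch(folder: Path, handle, interval_seconds: float = 2.0) -> None:
    """Poll `folder` forever and pass each newly seen contract to `handle`.

    `handle` is a hypothetical callback that would read the text,
    call the OpenAI API and write the clause outputs.
    """
    seen: set[str] = set()
    while True:
        for path in find_new_contracts(folder, seen):
            handle(path)
        time.sleep(interval_seconds)
```

A call such as watch(Path("C:/ClauseReader/contracts_in/"), handle) would run until interrupted. A real implementation would probably also need to wait until a file has finished copying before reading it.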
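
⸻

Illustrative sketch: Output Formats

The requirement that the CSV and the saved DataFrame contain the same information could be met by writing both files from one DataFrame. This is a minimal sketch assuming the six clause columns listed under AI Clause Extraction. The function name save_clause_table is hypothetical, and how rows from several contracts are combined is left open.

```python
from pathlib import Path

import pandas as pd

# Column names taken from the "AI Clause Extraction" requirement.
CLAUSE_COLUMNS = ["ClauseType", "Trigger", "Threshold", "Amount", "Notes", "OriginalText"]


def save_clause_table(rows: list[dict], out_dir: Path) -> pd.DataFrame:
    """Write extracted clause rows to clauses.csv and clauses.pkl in `out_dir`."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Fixing the column list keeps the column order stable; missing keys become NaN.
    df = pd.DataFrame(rows, columns=CLAUSE_COLUMNS)
    df.to_csv(out_dir / "clauses.csv", index=False, encoding="utf-8")
    df.to_pickle(out_dir / "clauses.pkl")
    return df
```

Because both files come from the same DataFrame, they cannot drift apart. For the .feather option, the to_pickle call could be swapped for df.to_feather, which additionally requires pyarrow to be installed.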
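
⸻

Illustrative sketch: Q&A calculations

The Q&A Engine has to return computed numbers, so the season filtering and threshold comparison could be done in pandas rather than left to the language model. This sketch rests on assumptions the posting does not confirm: the Minutes Table columns Season and MinutesPlayed are hypothetical, an appearance is taken to mean any match with more than zero minutes, and every Threshold is taken to be an appearance count. The sample rows are invented for illustration.

```python
import pandas as pd


def appearances_in_season(minutes: pd.DataFrame, season: str) -> int:
    """Count matches in `season` where the player played (assumed: minutes > 0)."""
    played = (minutes["Season"] == season) & (minutes["MinutesPlayed"] > 0)
    return int(played.sum())


def remaining_until_trigger(clauses: pd.DataFrame, appearances: int) -> pd.DataFrame:
    """Add Remaining (appearances still needed) and Triggered columns per clause."""
    result = clauses[["ClauseType", "Threshold", "Amount"]].copy()
    result["Remaining"] = (result["Threshold"] - appearances).clip(lower=0)
    result["Triggered"] = result["Remaining"] == 0
    return result


# Invented sample rows, for illustration only.
minutes = pd.DataFrame(
    {"Season": ["24/25", "24/25", "24/25", "23/24"], "MinutesPlayed": [90, 0, 30, 90]}
)
clauses = pd.DataFrame(
    {"ClauseType": ["Appearance bonus", "Debut bonus"], "Threshold": [5, 1], "Amount": [100000, 25000]}
)

made = appearances_in_season(minutes, "24/25")
status = remaining_until_trigger(clauses, made)
```

With these rows, made is 2 and the Remaining column is 3 and 0, so only the second clause counts as triggered. If Threshold arrives from clauses.csv as text, it would need converting with pd.to_numeric first.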