Invoice OCR Automation

I receive batches of pdf or image invoices and need every piece of information they contain—dates, times, invoice numbers, totals, supplier details, line items, taxes—pulled out accurately and dropped into my existing JSON template. The field names in the template match the text on the invoices, so your job is to locate each value with OCR and map it straight into the corresponding key; if a value is missing, leave the field blank rather than guessing. Any reliable stack is acceptable. Python with Tesseract, EasyOCR, Google Vision, or a comparable engine will work as long as the final script is easy for me to install and modify. Reliability and clean code matter more than raw speed. Deliverables • A runnable script or notebook that ingests JPEG invoices and produces JSON output in my template’s structure • A concise README covering setup, dependencies, and how to add or tweak mapping rules • Sample output generated from the test images I will provide Acceptance criteria: when I run the script on the supplied sample invoices, the resulting JSON must populate every field that appears on those documents with correct values and leave non-present fields blank. I will share the template file and a representative image set once we begin. Hand over your work via a Git repo or zipped archive—whichever you prefer.

Python

Реєстрація