Insights
EngineeringMay 20, 2026·5 min read

Getting 99% accuracy on email receipts

By Ekaba Bisong, SiliconBlast

Email receipts arrive in every format, vendor, and language. Qoobeau turns them into clean financial data at 99% accuracy. Here is the approach.

The problem

A receipt from a large retailer looks nothing like one from a local cafe. Some sit in the email body. Some hide inside a PDF. Dates use a dozen formats. Amounts mix tax and tip. A rule-based parser breaks on the first new vendor.

Structured extraction with Gemini

We send the email and any attachment to Google Gemini 2.5 Flash at temperature 0.1 for deterministic output. We ask for strict JSON: vendor, amount, tax, date, category, line items, and a confidence score. The schema does the heavy lifting.

The edge cases decide quality

The happy path is easy. Your accuracy lives in the edge cases.

  • Free trials show a zero amount. We keep them, not drop them.
  • Future-dated orders are errors, not expenses.
  • Refunds flip the sign from expense to income.
  • Ambiguous vendors fall back to the email sender.
  • Email language tells income from expense, you paid versus you received.

Confidence and review

Every extraction returns a confidence score. Low scores get flagged for your review, not pushed into your books silently. You stay in control of your numbers.

What we learned

  • Strict JSON schemas cut hallucination.
  • Low temperature makes financial data repeatable.
  • Edge cases, not the happy path, set your real accuracy.
  • A confidence score turns a model into a tool you trust.

Bring us a document or data extraction problem. We will build you a pipeline with the same discipline.