ContractLens — AI review for Australian property contracts

Problem

Property contracts in Victoria, Australia routinely run to one or two hundred pages, often including scanned documents. A buyer cannot realistically read one in full, and a lawyer's review is expensive and can still miss details. The legal domain is also unusually demanding of AI: every conclusion has to be verifiable, and the output must never cross the line into giving legal advice.

Approach

A 10-stage pipeline: PDF upload (PyMuPDF + Tesseract OCR) → a rule engine segments the contract into sections (Particulars / Special Conditions / Section 32 / Title & Plan / OC certificates / Council·Water / Lease, among others) → 7 specialist AI analysts review in parallel (orchestrated with LangGraph, with tiered Claude Opus / Sonnet / Haiku calls to control cost) → a one-page report. Hallucination is countered by three mechanisms: mandatory verbatim citations, rapidfuzz fuzzy-match validation, and targeted retries scoped only to the citations that failed. The output then passes two compliance gates (a regex check and an AI semantic review) to rule out any "AI lawyer"-style overreach.
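
To make the first compliance gate concrete, here is a minimal sketch of a regex pass over the report text. The pattern list and the function name regex_gate are hypothetical; the project's actual rules are not shown in this write-up, and a real gate would likely need a longer, lawyer-reviewed list.

```python
import re

# Hypothetical advice-like phrasings (assumption, not the project's real list).
ADVICE_PATTERNS = [
    re.compile(r"\byou (should|must|ought to)\b", re.I),
    re.compile(r"\b(we|i) (recommend|advise)\b", re.I),
    re.compile(r"\b(do not|don't) (sign|proceed|buy)\b", re.I),
    re.compile(r"\bin (my|our) legal opinion\b", re.I),
]


def regex_gate(report_lines: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, line) for every line that reads like legal advice."""
    return [
        (number, line)
        for number, line in enumerate(report_lines, start=1)
        if any(pattern.search(line) for pattern in ADVICE_PATTERNS)
    ]
```

A regex pass like this is cheap and deterministic, but it only catches known phrasings, which is presumably why the pipeline follows it with a semantic review.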

Why I built it

Property contracts in Victoria routinely run to one or two hundred pages, often as scanned documents. Buyers can't read them all; a lawyer's review is expensive and still misses things. This project was built with a practicing lawyer, with one goal: turn "a contract" into "a one-page report where every conclusion is verifiable."

Key design

A 10-stage pipeline. PDF upload (PyMuPDF + Tesseract OCR for scanned documents) → a rule engine segments the sections (Particulars / Special Conditions / Section 32 / Title & Plan / OC certificates / Council·Water / Lease) → 7 specialist AI analysts review in parallel (orchestrated with LangGraph) → a one-page report. Tiered model calls (Opus / Sonnet / Haiku) push the cost per contract down to about $1.
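
The segmentation stage could look roughly like the sketch below: heading patterns are matched against the first few lines of each page, and a new segment opens whenever a different section heading appears. The patterns, the Segment type, and the three-line heading window are all assumptions made for illustration; the project's real rule engine is not shown here.

```python
import re
from dataclasses import dataclass

# Hypothetical heading rules (assumption); real contracts need many more variants.
SECTION_RULES = [
    ("particulars", re.compile(r"^\s*particulars of sale\b", re.I)),
    ("special_conditions", re.compile(r"^\s*special conditions?\b", re.I)),
    ("section_32", re.compile(r"^\s*(section 32|vendor'?s? statement)\b", re.I)),
    ("title_plan", re.compile(r"^\s*(register search statement|plan of subdivision)\b", re.I)),
    ("oc_certificate", re.compile(r"^\s*owners corporation certificate\b", re.I)),
]


@dataclass
class Segment:
    label: str
    start_page: int  # 1-based, inclusive
    end_page: int    # 1-based, inclusive


def segment_pages(pages: list[str]) -> list[Segment]:
    """Group consecutive pages into labelled sections using heading rules."""
    segments: list[Segment] = []
    for page_no, text in enumerate(pages, start=1):
        first_lines = text.strip().splitlines()[:3]
        label = next(
            (name for name, pattern in SECTION_RULES
             if any(pattern.match(line) for line in first_lines)),
            None,
        )
        if label is not None and (not segments or segments[-1].label != label):
            # A new section heading starts a new segment.
            segments.append(Segment(label, page_no, page_no))
        elif segments:
            # No new heading: the page continues the current segment.
            segments[-1].end_page = page_no
        # Pages before the first recognised heading stay unassigned.
    return segments
```

Keeping this step rule-based rather than model-based means it costs nothing per contract and produces the same split on every run, which fits the stated goal of keeping cost per contract low.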

Anti-hallucination is the product's lifeline. The trio: mandatory verbatim citations, rapidfuzz fuzzy-match validation that the citation really exists, and targeted retries scoped only to failed citations; the output then passes two compliance gates (regex + AI semantic review) to make sure it never emits out-of-bounds "legal advice."
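
The citation check and the targeted retry can be sketched as follows. This is not the project's code: to stay dependency-free it swaps rapidfuzz for the standard library's difflib.SequenceMatcher, and the Finding type, the 90-point threshold, the retry limit, and the reask callback (standing in for a call back to the analyst model) are all hypothetical.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable


@dataclass
class Finding:
    claim: str
    quote: str  # verbatim citation the analyst is required to supply


def _normalize(text: str) -> str:
    # Collapse whitespace and case so OCR line breaks do not fail a match.
    return re.sub(r"\s+", " ", text).strip().lower()


def citation_score(quote: str, source: str) -> float:
    """Best 0-100 similarity between the quote and any same-length window of source."""
    q, s = _normalize(quote), _normalize(source)
    if not q or len(q) > len(s):
        return 0.0
    if q in s:
        return 100.0
    # Brute-force sliding window: fine for a sketch, too slow for 300-page inputs.
    return max(
        SequenceMatcher(None, q, s[i:i + len(q)], autojunk=False).ratio() * 100
        for i in range(len(s) - len(q) + 1)
    )


def validate_with_retries(
    findings: list[Finding],
    source: str,
    reask: Callable[[Finding], Finding],
    threshold: float = 90.0,  # assumed cut-off
    max_retries: int = 2,     # assumed retry budget
) -> tuple[list[Finding], list[Finding]]:
    """Return (verified, still_failed); only failed findings are re-asked."""
    verified: list[Finding] = []
    pending = list(findings)
    retries_left = max_retries
    while True:
        failed = []
        for finding in pending:
            if citation_score(finding.quote, source) >= threshold:
                verified.append(finding)
            else:
                failed.append(finding)
        if not failed or retries_left == 0:
            return verified, failed
        retries_left -= 1
        # Findings that already passed are never sent back to the model.
        pending = [reask(finding) for finding in failed]
```

Scoping the retry to failed citations keeps verified findings stable between attempts and limits the extra model calls to the items that actually need them. Anything still failing after the retry budget is returned separately so it can be dropped or flagged rather than reported as fact.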

Calibrated against a real lawyer's report. Four real contracts (104–348 pages, including one mixed-title bundle of 5 addresses) were run end-to-end, and the output was reviewed line by line against a practicing lawyer's report. About 30 findings matched items in the lawyer's report. The pipeline also surfaced 2 ACN (Australian Company Number) inconsistencies that the lawyer confirmed in a line-by-line check and that had not been itemized in their report. The point is not "AI beats the lawyer" but a second, tireless pair of eyes that cross-checks to catch what slips through.

In one line

The legal domain pushes "AI output must be verifiable" to the limit: every conclusion has to trace back to the source text, which makes it the toughest training ground there is for anti-hallucination engineering.