Learn

What is verified software?

Software that has been mathematically proven correct — not just tested. This guide explains what that means, how it works, and why it's becoming practical.

Testing vs. proof

Testing checks that software works for specific inputs. You write a test that says add(2, 3) == 5, run it, and confirm it passes. But you haven't checked add(2147483647, 1), or any of the other billions of possible inputs. Testing shows the presence of bugs, never their absence.

Formal verification takes a different approach. Instead of checking examples, you write a mathematical statement — a theorem — that describes what the software should do for all inputs. Then a computer checks that the proof holds. If it does, you have a guarantee that covers every possible case.

Consider Heartbleed, the OpenSSL vulnerability that exposed sensitive data across the internet for over two years. The bug was a missing bounds check — the code read past the end of a buffer. Standard testing missed it because the bug only triggered under specific conditions. A formal specification of the buffer's access bounds would have made this class of bug impossible.

How formal verification works

The process has three parts:

Specification — You write a precise mathematical description of what the software should do. For a sorting function, the specification says: the output is a permutation of the input, and every element is less than or equal to the next.
Implementation — You write the actual code, often in the same language as the specification (like Lean 4) or in a target language (like C) that gets modeled formally.
Proof — You construct a proof that the implementation satisfies the specification. A proof assistant mechanically checks every step. No human judgment is required for the checking — either the proof is valid or it isn't.

Think of it as type systems taken to their logical extreme. A type system proves simple properties (this variable is an integer, not a string). Formal verification proves arbitrary properties (this function never returns a negative number, this protocol never deadlocks, this compiler preserves program semantics).

Two approaches

Formal verification splits into two major families:

Model checking exhaustively explores every possible state a system can reach. It's automated — you describe the system and the property, and the tool checks all states. The downside: it can only handle finite state spaces, so it works best for hardware and protocols with bounded behavior.

Theorem proving constructs a mathematical proof that a property holds for all inputs. It can handle infinite state spaces and arbitrary complexity, but it requires human guidance (or increasingly, AI guidance) to build the proof. This is the approach used by Lean, Coq, and Isabelle.

Proof assistants: the tools

Lean 4

General-purpose language and proof assistant. Dominant in practice with 210K+ formalized theorems in Mathlib. AI tools like Leanstral and DeepSeek-Prover target it directly.

Coq / Rocq

Mature proof assistant used to verify the CompCert C compiler. Recently rebranded to Rocq. Strong in program verification and extraction.

Isabelle

Used for the landmark seL4 microkernel verification. Powerful automation via sledgehammer tactic. Widely used in academic research.

Agda

Dependently typed language closely related to Haskell. Focuses on constructive mathematics and type-driven development.

Why Lean 4 is winning: Unlike older proof assistants designed primarily for mathematicians, Lean 4 is a real programming language — you can write executable code and prove properties about it in the same file. Its Mathlib library contains over 210,000 formalized theorems, making it the largest single repository of machine-checked mathematics. Crucially, AI models like Leanstral and DeepSeek-Prover can read and write Lean, which means the cost of building proofs is dropping fast.

Key terms

Theorem: A mathematical statement about a program's behavior that can be mechanically checked. Example: "for all inputs x and y, sort(x ++ y) equals sort(y ++ x)."
Proof: A formal argument that a theorem holds. In proof assistants, proofs are programs — if they compile, they're correct.
Specification: A precise description of what a program should do, written in formal logic. The "contract" that the implementation must satisfy.
Invariant: A property that always holds at a certain point in a program. Example: "the balance is never negative" or "the list is always sorted."
Tactic: A proof step in an interactive proof assistant. Tactics transform the current proof goal into simpler sub-goals until all are resolved.
Dependent types: Types that depend on values. A vector of length n is a dependent type — the type itself encodes that the length is exactly n, not just "some number."
Soundness: A proof system is sound if it only proves true statements. If the proof checker accepts your proof, the theorem is actually true.
Model checking: Automated exhaustive exploration of every reachable state. Works well for finite systems like hardware or bounded protocols.
Theorem proving: Constructing a mathematical proof that a property holds universally. More powerful than model checking but requires guidance to build the proof.

Why now: the AI catalyst

Formal verification has existed for decades. What's changed is the economics.

Verifying the seL4 microkernel — 8,700 lines of C — required 20 person-years of effort and 200,000 lines of proof code. That's roughly 23 lines of proof for every line of implementation. At that cost, verification was reserved for microkernels, flight controllers, and cryptographic libraries.

AI is changing two things simultaneously. First, it's generating code faster than humans can review it — 41% of all code is now AI-generated, rising to 70–90% at leading AI labs. This makes formal verification more necessary. Second, AI can now generate proofs. Models like Leanstral and DeepSeek-Prover-V2 write Lean 4 proofs at a fraction of the cost of human proof engineers.

Proof checkers are uniquely suited for AI output. Unlike code review, where a human must judge correctness, a proof checker automatically rejects invalid proofs. It doesn't matter if the AI "hallucinated" — the proof either checks or it doesn't.

The cost of verification is dropping from person-years to dollars. The question is no longer can we verify software, but what gets verified first.

Keep going

See verified software in production → Browse learning resources →