Leanstral and the Economics of Proof

In December 2025, Martin Kleppmann made a prediction: AI will make formal verification mainstream. His argument was simple — the cost model that kept verification niche is about to break. Three months later, the evidence is piling up.

The cost problem

Formal verification has always been expensive. The canonical example: verifying the seL4 microkernel — 8,700 lines of C — required 20 person-years of effort and 200,000 lines of Isabelle proof code. That’s roughly 23 lines of proof and half a person-day for every single line of implementation.

At that ratio, verification is reserved for the highest-stakes software: microkernels, cryptographic libraries, flight controllers. Everything else ships with tests and hope.

Leanstral: 92x cheaper

On March 16, 2026, Mistral released Leanstral — the first open-source code agent designed specifically for Lean 4. The numbers are striking:

Architecture: 120B total parameters, but only 6B active (highly sparse). Optimized for proof engineering, not general chat.
Performance: Outperforms models 6x its size (GLM5-744B, Kimi-K2.5-1T) on FLTEval, the benchmark for formal Lean theorem proving.
Cost: Pass@2 score of 26.3 for $36, versus Claude Sonnet’s 23.7 at $549. At pass@16, Leanstral reaches 31.9 versus Claude Opus’s 39.6 at $1,650 — getting 80% of frontier performance at 92x lower cost.
License: Apache 2.0. Weights available for download. Integrated into Mistral Vibe with MCP support for lean-lsp-mcp.

This isn’t a research demo. It’s a shipping tool with an API endpoint (labs-leanstral-2603) that anyone can call today.

DeepSeek-Prover-V2: the trend is not a single vendor

Leanstral isn’t an isolated event. In April 2025, DeepSeek released Prover-V2, an open-weight model for formal theorem proving in Lean 4:

88.9% pass ratio on MiniF2F-test
Solved 49 out of 658 PutnamBench problems
Uses reinforcement learning with subgoal decomposition to break complex proofs into tractable pieces
Available in two sizes: 7B and 671B parameters

When multiple independent teams — Mistral, DeepSeek, the LeanDojo group at Caltech — are converging on AI-powered formal verification, it’s not a trend. It’s infrastructure being built.

What changes

Kleppmann’s framing remains the clearest: AI creates the problem (code generated faster than humans can review) and the solution (proofs generated faster than humans could write). Proof checkers are uniquely suited for AI outputs because they reject invalid proofs automatically — no human judgment required.

The question is no longer can AI do formal verification? It’s how cheap can AI make formal verification? And the answer is dropping fast enough to change what gets verified.

At $36 per proof attempt, the calculus shifts. Libraries that were never worth verifying at 20-person-year prices become tractable. The verification stack — currently limited to cryptography, core libraries, and authorization engines — starts expanding downward into parsers, protocol implementations, and storage engines.

We’re watching the cost curve that kept formal verification elite collapse in real time.