About LatentEval

LatentEval is an independent research property on the reliability of AI-agent systems. We publish meta-analyses of what the field has actually shown, original benchmarks and experiments on real-world multi-agent systems, and plain-language write-ups that make that evidence usable for the people building and relying on these systems. We show our methods, cite our sources, and report uncertainty honestly without false precision.

What we publish

Each piece answers a single question, for example: how far a fault propagates across a multi-agent topology, whether a reported eval difference is statistically real, or how much variance a benchmark actually has. The work takes three forms. Meta-analyses pool what published studies have shown, with the inclusion criteria and effect sizes stated. Original benchmarks and experiments measure real-world multi-agent systems under controlled, repeatable conditions with statistical rigor. Plain-language write-ups, including the glossary and the Everyday AI and AI for Builders collections, make that literature readable for engineers, leads, and practitioners as well as researchers.

Our honesty contract

We hold ourselves to four standing promises, and we won't publish work that breaks any of them.

  • The finding first. Every piece leads with its finding. The method, the assumptions, and the fine print sit directly below it.
  • The work up front. We do not place ads or advertising trackers between you and the work.
  • We show our work. Every claim names its method and cites its sources. Where we report a reliability number, we show the uncertainty behind it. For work that ages (models, harnesses, baselines), we note when we last reviewed it.
  • Nothing invented. We never fabricate results, citations, or credentials. Where the evidence is thin, we say so plainly.

Independence

Independence is the point. LatentEval takes no advertising and accepts no arrangement that would give us a reason to make one system's reliability look better or worse than the numbers say. Reliability findings are only worth anything if the people publishing them have nothing riding on the result. If our funding model ever changes, we'll say so plainly on this page before it happens.

Privacy

We keep all data usage transparent and upfront. The only measurement we run today is aggregate analytics, described plainly on our privacy page. If what we collect ever changes, we update that page first.

How we show our work

We believe a result you can't check isn't worth trusting. Our methodology page explains how we choose methods for each analysis, how sources are pinned to a publisher, a direct link, and a retrieval date, and how often we review published numbers as models and harnesses change. Our benchmarks and experiments are built to measure what ordinary testing misses: how failures propagate across agents, and whether a difference is statistically real. See our disclaimer for how to read a finding and the uncertainty around it.

Contact

Found a mistake or an outdated number? Email us at contact@latenteval.ai and we'll take a look.

LatentEval is independently built and maintained. Agent reliability is a fast-moving field, and we hold our own numbers to the standard we hold everyone else's: versioned methods, cited sources, and uncertainty reported honestly. When a call is close, our disclaimer explains how to read it.