Propagation & containment

Error propagation and cascade containment in agent systems

OWASP's ASI08 files cascading failures under security. Reframed as error-propagation engineering, one fault becomes two measurable quantities, blast radius and containment rate, mapped to topologies.

By LatentEval · Published 2026-04-01 · Updated 2026-05-25

Cascading failures are filed as a security problem. They are a reliability problem. OWASP’s ASI08 lands them in a security taxonomy, and that shelf hands teams a toolkit built for an attacker: a checklist, a red-team pass, a hardened perimeter. Most cascades in a working agent system have no attacker behind them, just one wrong-but-plausible output that the next agent trusts. Reliability engineering measures it instead: how far a single fault travels across your agents, its blast radius, and what fraction of faults the architecture holds to one hop, its containment rate. This page reframes ASI08 as error propagation and gives you those two numbers, and the levers, that a security checklist leaves out.

Cascading failure is error propagation the security frame mis-files

OWASP’s Top 10 for Agentic Applications names cascading failures as a distinct risk class, ASI08, and illustrates it in one line: “False signals cascaded through automated pipelines with escalating impact (ASI08 – Cascading Failures)” (OWASP Top 10 for Agentic Applications, as of 2025-12). Naming the class is the right move. The classification is where the engineering response drifts. Sitting inside a security taxonomy, ASI08 inherits security’s default toolkit: authentication, input sanitization, perimeter hardening, an adversary to red-team against. That toolkit assumes a malicious trigger, and most production cascades have none: an agent emits a plausible, well-formed, wrong output; the next agent trusts it; the error rides downstream with escalating impact. A cascade is not a breach; it is error propagation, a reliability property of how the agents are wired.

Each row takes a facet of ASI08, reads it the way a security control would, then re-reads it as error propagation, and states which reading hands you a lever you can actually pull.

Facet	Security-control reading	Error-propagation reading	Judgment: where the lever is
Trigger	An adversary crafts input to set off the cascade	Any single fault, accidental or adversarial, that a downstream agent trusts	The accidental fault is the common case, so a trigger model built only for attackers under-covers the risk
What gets through	A malformed or malicious payload evades a filter	A well-formed, correctly typed, confidently wrong output satisfies validation	Structural validation cannot see a semantic error, so perimeter filtering misses the dominant path
Where you intervene	Authentication, sanitization, boundary hardening	Containment levers at every hop: circuit-break, isolate, bound the correction	The propagation frame gives you a control at each internal edge, not only at the perimeter
Success criterion	The checklist is complete and the red-team found nothing	Fault injection reports a containment rate with a confidence interval	“No finding” is an absence of evidence; a containment rate is a measurement
The metric	Typically none; ASI08 is a control to satisfy	Blast radius and containment rate, read per topology	Two numbers you can hold an architecture to across versions

Read the judgment column top to bottom. The security reading holds on its own terms, and you should keep it; an adversary who can deliberately poison a shared store is a real threat. It is only partial, and the part it omits is the part that fires most often. The dominant production cascade starts with a capable agent returning something fluent and wrong, and a downstream agent with no reason to doubt it. Everything below quantifies that path and bounds it.

How one fault becomes many: the amplification arithmetic

Cascading is amplification, and amplification is what turns a fault into a failure. A fault that reaches one hop and stops is an incident you log. The same fault that reaches ten agents is an outage. What decides which one you get is rarely the fault’s severity. It is the containment between hops, and you can model it directly.

Treat propagation as a race between a fault moving forward and a gate catching it. Suppose every hop has an independent probability c of containing a fault that arrives at it: a validation gate, a type check, an idempotent step that absorbs a repeat, a circuit breaker that trips. Then the fraction of faults held to a single hop, the containment rate, is c. The number of hops a fault survives before a gate stops it follows a geometric distribution, so its expected reach is 1/c hops, capped by the length of the path. The table below is computed from those stated assumptions, not measured on any system.

Per-hop containment c	Containment rate (held to one hop)	Expected hops reached (1/c)	Expected reach across a 10-hop chain (capped)
0.9	0.90	1.1	about 1.1 of 10
0.7	0.70	1.4	about 1.4 of 10
0.5	0.50	2.0	about 2.0 of 10
0.3	0.30	3.3	about 3.2 of 10
0.1	0.10	10	about 6.5 of 10

The curve is steep where it matters. At c = 0.9 a fault touches barely more than the hop it started on. Let containment slip to 0.5 and the average fault reaches two agents. Drop it to 0.1 and the uncapped expectation is ten hops; on a 10-hop chain the cap trims the mean to about 6.5, because the capped expectation is (1 - (1 - c)^10)/c, and roughly 39% of faults still run the full length of the chain. Containment is where the return on effort concentrates. Raising each agent’s task accuracy shrinks how often a fault starts; raising c shrinks how far each fault travels once it does, which is the quantity ASI08 is actually about. That is why reliability effort should go into the gates between agents before it goes into another point of per-agent accuracy, and why measuring how far a single fault reaches is the prerequisite for either.
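The arithmetic behind the table can be checked with a short sketch. It assumes exactly the model stated above (an independent, constant per-hop containment probability c); the function names are illustrative and not taken from any library. It also shows how the path-length cap changes the mean on a 10-hop chain.

```python
from typing import Optional


def expected_reach(c: float, path_length: Optional[int] = None) -> float:
    """Expected number of hops a fault touches under the geometric model.

    Uncapped, the mean is 1/c. On a finite path the reach is
    min(X, path_length), whose mean is (1 - (1 - c)**path_length) / c.
    """
    if not 0.0 < c <= 1.0:
        raise ValueError("c must be in (0, 1]")
    if path_length is None:
        return 1.0 / c
    return (1.0 - (1.0 - c) ** path_length) / c


def prob_reaches_end(c: float, path_length: int) -> float:
    """Probability that a fault survives to the last hop of the path."""
    return (1.0 - c) ** (path_length - 1)


if __name__ == "__main__":
    for c in (0.9, 0.7, 0.5, 0.3, 0.1):
        print(
            f"c={c:.1f}  uncapped={expected_reach(c):5.2f}  "
            f"10-hop={expected_reach(c, 10):5.2f}  "
            f"P(reaches hop 10)={prob_reaches_end(c, 10):.3f}"
        )
```

Both functions inherit the model's assumptions, so they say nothing about a system where c varies by hop or by fault type.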

Blast radius has two dimensions, and the arithmetic above tracks only one. Depth is how many hops a fault survives along a path, the quantity 1/c estimates. Breadth is how many agents a fault touches at a single hop, which the chain model pins at one but a hub or a shared store does not. A poisoned return in an orchestrator can reach every worker the hub dispatches in the same round, so its breadth is the fan-out width while its depth stays shallow. The two dimensions ask for different levers, and collapsing blast radius into one scalar hides which of them is hurting you. Report it as depth and breadth separately, per injection point.
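One way to report the two dimensions separately is to walk the topology from an injection point and record how many agents sit at each hop. The sketch below is a worst-case reading that assumes no gate catches anything; the adjacency-map representation and the function name are assumptions made for illustration, and depth here counts hops downstream of the injection point.

```python
from collections import deque


def blast_radius(edges: dict[str, list[str]], injection_point: str) -> dict[str, int]:
    """Worst-case depth and breadth of a fault injected at one agent.

    edges maps each agent to the agents that consume its output.
    """
    hops = {injection_point: 0}
    queue = deque([injection_point])
    while queue:  # breadth-first walk over downstream consumers
        agent = queue.popleft()
        for consumer in edges.get(agent, []):
            if consumer not in hops:
                hops[consumer] = hops[agent] + 1
                queue.append(consumer)

    agents_per_hop: dict[int, int] = {}
    for agent, hop in hops.items():
        if hop > 0:
            agents_per_hop[hop] = agents_per_hop.get(hop, 0) + 1

    return {
        "depth": max(agents_per_hop, default=0),             # longest downstream path
        "breadth": max(agents_per_hop.values(), default=0),  # widest single hop
        "agents_reached": len(hops) - 1,
    }


chain = {"a": ["b"], "b": ["c"], "c": ["d"]}
hub = {"hub": [f"worker{i}" for i in range(8)]}
print(blast_radius(chain, "a"))
print(blast_radius(hub, "hub"))
```

The chain comes back deep and narrow and the hub shallow and wide, which is the distinction a single scalar would hide.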

The model leans on two assumptions, and dropping either one worsens the real picture. The first is independence between hops. Shared state breaks it: when several agents read the same surface, a single corrupt entry correlates their faults, so the geometric estimate understates reach in blackboard designs, where a fault lands across the whole topology in one hop no matter what c is. The second assumption is that c holds steady, and it does not. It varies by hop and, more sharply, by fault type. A crash meets a high effective c, because a structural gate catches it. A confidently wrong answer meets a much lower one, because it is well-formed enough to pass every structural gate untouched. That second point is the mechanism behind semantic propagation, and it earns its own section.

Why a wrong answer passes every gate

The reason semantic errors propagate is that the gates most systems ship were built to catch a different kind of fault. Three mechanics do the damage, and each explains a column of the reframe table.

First, structural gates cannot read meaning. Schema validation, type systems, retries, and output parsers all check the shape of an output while its truth goes unchecked. A JSON payload with the right keys and wrong values passes every structural check. The gate designed to stop a crash waves through a confident mistake with no complaint, because nothing about the mistake is malformed.

Second, confidence does not survive the hop. An upstream agent’s output arrives at the next agent stripped of its uncertainty. A hedged guess and a verified fact read identically once they cross the boundary, so the receiver treats both as ground truth. Provenance and calibrated confidence are the first casualties of an inter-agent hand-off, which is exactly why a corrupted message becomes a trusted premise one hop later.
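A hand-off that keeps its uncertainty could look like the sketch below. The `HandOff` type, its field names, and the 0.8 threshold are all hypothetical, chosen only to show a hedge crossing the boundary as a hedge; the sketch does not address how the confidence value gets calibrated.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class HandOff:
    """Message passed between agents, carrying more than the bare payload."""

    payload: Any
    confidence: float                  # producer's estimate in [0, 1]
    provenance: tuple[str, ...] = ()   # identifiers of the sources relied on
    verified: bool = False             # set only by an independent verifier

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


def accept_as_fact(msg: HandOff, min_confidence: float = 0.8) -> bool:
    """Treat a message as ground truth only if it was verified, or if it is
    both confident and traceable to a source."""
    return msg.verified or (msg.confidence >= min_confidence and bool(msg.provenance))


guess = HandOff(payload="Q3 revenue rose 4%", confidence=0.4)
sourced = HandOff(payload="Q3 revenue rose 4%", confidence=0.95, provenance=("doc-17",))
print(accept_as_fact(guess), accept_as_fact(sourced))
```

The receiver can still use a low-confidence message, but it has to do so knowingly, which is the property the bare payload loses.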

Third, aggregation can certify a shared error. In debate and voting topologies, several agents seeded from the same context and prompt are correlated samples that share one blind spot. When the answer they share is wrong, the tally over those samples reports agreement, and that agreement reads as correctness. The mechanism meant to raise reliability confirms the error.
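A toy calculation makes the effect of correlation concrete. It assumes a deliberately simple mixture model that is not from the source: with probability rho the whole panel shares one draw (the common blind spot), otherwise the voters err independently with probability p each. The function name is illustrative, and n is assumed odd.

```python
from math import comb


def majority_wrong(n: int, p: float, rho: float = 0.0) -> float:
    """Probability that a majority of n voters is wrong.

    p   : chance a single voter is wrong
    rho : chance the panel behaves as one correlated sample (assumed model)
    """
    independent = sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )
    # When the panel shares one draw, the majority is wrong whenever that draw is.
    return rho * p + (1 - rho) * independent


for rho in (0.0, 0.5, 1.0):
    print(f"rho={rho:.1f}  P(majority wrong)={majority_wrong(5, 0.2, rho):.4f}")
```

Under these assumptions five independent voters cut a 20% error rate to under 6%, while five fully correlated voters leave it at 20% and report unanimous agreement while doing so.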

This is not a hypothesis about how systems might fail. MAST, the first empirically grounded taxonomy of multi-agent LLM failures, derived its 14 failure modes in 3 categories from 150 expert-annotated traces (kappa 0.88), then scaled annotation to a 1,600+ trace dataset across 7 frameworks with an LLM-as-judge pipeline (Cemri et al., Why Do Multi-Agent LLM Systems Fail?, arXiv:2503.13657, preprint, as of 2026-07). Its second category, inter-agent misalignment, is defined as “[f]ailures arising from ineffective communication, poor collaboration, conflicting behaviors among agents, and gradual derailment from the initial task.” That category is error propagation under another name: the fault lives in the hand-off from one agent to the next, in what gets passed across the edge. MAST also reports that targeted interventions on the top failure modes did not fully resolve them, which points the fix at structural containment rather than patching one mode at a time.

Picture a planner agent that emits a schema-valid task specification with a wrong assumption baked into one field, say a date range or a unit of measure. Every downstream tool call validates the shape of that field and executes without objection, because nothing is malformed. The wrong assumption is now the workflow’s ground truth, and the final output is precise, fluent, and wrong. A security control tuned for malicious input never engages, because no input was malicious. Only a check on the meaning of that field, at the hop where it enters, would have caught it. The gate you need is semantic, and it belongs on the edge between agents.
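The planner scenario can be sketched as two gates applied to the same specification. The field names, the allowed-unit set, and the specific invariants are hypothetical; a real semantic gate would encode whatever the workflow actually knows about valid values, or delegate to an independent verifier.

```python
from datetime import date

REQUIRED_FIELDS = {"start": date, "end": date, "unit": str}


def structural_gate(spec: dict) -> bool:
    """Shape check only: are the expected keys present with the expected types?"""
    return all(isinstance(spec.get(name), kind) for name, kind in REQUIRED_FIELDS.items())


def semantic_gate(spec: dict, allowed_units: frozenset, today: date) -> list:
    """Meaning check: returns a list of problems, empty when the spec is acceptable."""
    problems = []
    if spec["start"] > spec["end"]:
        problems.append("date range is inverted")
    if spec["end"] > today:
        problems.append("date range ends in the future")
    if spec["unit"] not in allowed_units:
        problems.append(f"unit {spec['unit']!r} is not one the workflow expects")
    return problems


# Well-formed and correctly typed, but the unit is wrong for this workflow.
spec = {"start": date(2025, 1, 1), "end": date(2025, 3, 31), "unit": "lb"}
print(structural_gate(spec))
print(semantic_gate(spec, frozenset({"kg"}), today=date(2025, 6, 1)))
```

The structural gate accepts the specification and the semantic gate flags the unit, which is the gap the paragraph above describes.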

Recovery makes this class of fault worse, which is the reversal most teams miss. Retrying a step that crashed is safe. Retrying a step that returned a confident wrong answer re-runs the same mis-specified process, and if the step is not idempotent it produces a second wrong answer that differs from the first, so a later agent inherits two authoritative versions with no way to tell which is canonical. The instinct that hardens a service against transient faults, retry and move on, quietly multiplies semantic faults in an agent chain. Containment at the hop is the move that helps, because it stops the wrong answer from being read downstream at all.
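One way to encode that asymmetry is a runner that retries crashes but never retries a rejected answer. This is a minimal sketch under the assumption that some acceptance check exists; `SemanticRejection`, `run_step`, and the `accept` callback are hypothetical names.

```python
class SemanticRejection(Exception):
    """Raised when a step returned a well-formed output that failed acceptance."""


def run_step(step, accept, max_crash_retries: int = 2):
    """Run a step, retrying only on crashes.

    A crash is treated as transient and retried up to max_crash_retries times.
    An output that fails `accept` is not retried: re-running the same
    mis-specified step would only produce another authoritative-looking answer.
    """
    crashes = 0
    while True:
        try:
            result = step()
        except Exception:
            crashes += 1
            if crashes > max_crash_retries:
                raise
            continue
        if accept(result):
            return result
        raise SemanticRejection(f"rejected output: {result!r}")
```

A caller that catches `SemanticRejection` can escalate or halt, so the rejected output is never read downstream.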

The containment patterns that port from distributed systems

The levers that bound propagation are not new, and treating them as novel is how agent pipelines end up with none of them. Distributed systems have contained cascading failures across unreliable service graphs for years, and the patterns port to agent topologies with the nouns changed. The circuit breaker is the canonical case. You “wrap a protected function call in a circuit breaker object, which monitors for failures. Once the failures reach a certain threshold, the circuit breaker trips, and all further calls to the circuit breaker return with an error, without the protected call being made at all” (Fowler, CircuitBreaker, as of 2026-07). Swap the protected function for an agent or a tool and the pattern is intact.
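As a minimal sketch of the pattern Fowler describes, with an agent call standing in for the protected function: the class name, the threshold and timeout defaults, and the injectable clock are assumptions made for illustration, not an API from any framework.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the protected call is skipped."""


class CircuitBreaker:
    def __init__(self, call, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self._call = call                  # the agent or tool being protected
        self._threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None             # None means the circuit is closed

    def __call__(self, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after_s:
                raise CircuitOpenError("circuit open; protected call skipped")
            # Timeout elapsed: allow one trial call, re-opening on any failure.
            self._opened_at = None
            self._failures = self._threshold - 1
        try:
            result = self._call(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0                 # a success closes the circuit fully
        return result
```

As written, the breaker only counts exceptions. To trip on a low-confidence or semantically rejected output, the wrapped call would have to raise when its acceptance check fails.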

The table maps five standard resilience patterns from distributed-systems practice to their agent-system translation and to the propagation path each one caps.

Pattern (distributed-systems practice)	What it does in a service graph	Agent-system translation	Where it earns its place (judgment)
Circuit breaker	Trips a failing dependency so callers stop hitting it	Orchestrator stops routing work to an agent or tool returning low-confidence or malformed output	Hub-and-spoke fan-out, where one failing worker would otherwise reach every sibling
Bulkhead	Isolates resource pools so one saturated dependency cannot sink the rest	Scope shared state and tool access into failure domains so a poisoned write stays local	Shared-state designs, where one write is read by the whole topology
Idempotency	A repeated call has no additional effect	An agent step safe to re-run, so retries and loops do not fork state	Loops and retries, where re-entry manufactures divergent versions of a result
Backpressure	A slow consumer signals upstream to slow down	A low-confidence or overloaded agent halts commitment upstream instead of forwarding a guess	Chains, where an early low-confidence output would be consumed as fact
Bounded error-correction	Caps retries so recovery cannot amplify	Caps how many correction hops the system attempts before it escalates or halts	Self-repair and debate loops, where unbounded correction becomes its own cascade
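The last row of the table, bounded error-correction, can be sketched as a correction loop with a hard budget. The `critique` and `revise` callbacks and the default of two rounds are hypothetical; the point is only that the loop reports when it has run out of budget instead of continuing.

```python
def correct_with_bound(draft, critique, revise, max_rounds: int = 2):
    """Attempt at most max_rounds corrections.

    critique(x) returns a list of problems (empty when x is acceptable).
    revise(x, problems) returns a new candidate.
    Returns (result, escalate); escalate is True when the budget is spent
    and the result still fails critique.
    """
    current = draft
    for _ in range(max_rounds):
        problems = critique(current)
        if not problems:
            return current, False
        current = revise(current, problems)
    # Budget spent: check once more and hand off instead of looping further.
    return current, bool(critique(current))
```

When the returned flag is true, the caller escalates or halts, which keeps a self-repair loop from becoming its own cascade.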

A model reads as a competent collaborator rather than a flaky dependency, so the isolation and breakers a service team treats as mandatory get skipped as ceremony. That intuition is where the outages come from. A component that returns a confident, well-formed, wrong answer under load is exactly a flaky dependency, and it needs the same containment you would wrap around any other one. Every one of these patterns predates LLM agents by a decade or more. What is new is the fault they now have to contain, a semantic error that looks healthy to every structural check, which is why the agent translations add confidence signals and meaning checks on top of the classic liveness and timeout signals.

The five patterns share a limit worth stating plainly: each one contains a fault it can detect, and none of them detects a semantic error on its own. A circuit breaker trips on a signal, a bulkhead needs a boundary, backpressure needs a confidence or load reading. Supplying that signal is the sixth lever, the one both the security frame and the classic resilience patterns leave implicit: an independent acceptance gate that checks the meaning of an output before a downstream agent reads it. It can be a verifier the producing agent cannot overrule, a check of a claim against its cited source, or a typed contract that carries confidence and provenance across the hop so the receiver sees a hedge as a hedge. Without a semantic gate feeding them, the other five wait on a signal that a confident wrong answer never sends.

Which lever fits which amplification path

A pattern is only a lever when it matches the amplification path of the wiring it sits in. Buying a circuit breaker for a system whose real exposure is a shared blackboard spends effort on the wrong edge. The matrix keys on the four ways a single fault amplifies, then names the wiring where each one dominates, the blast-radius dimension it drives, and the containment lever that caps it.

Amplification path	Wiring where it dominates	Blast-radius dimension it drives	Lever that caps it	Judgment: spend here when
Depth reach	Sequential pipeline (chain)	Depth grows with path length; breadth holds at one	An acceptance gate between stages; idempotent stages so a re-run cannot fork the result; backpressure that lets a low-confidence stage stall commitment upstream	Your pipeline runs many stages deep with no check between them
Breadth reach	Orchestrator-worker (hub-and-spoke)	Breadth equals the fan-out width; depth stays shallow	A meaning check on a worker return before the hub relays it; a breaker that pulls a worker from rotation once its output reads low-confidence or malformed	Your hub forwards worker output to siblings without re-checking it
Correlated convergence	Debate / voting	Reach is the panel width, and correlation, not head count, sets it	Seed each voter independently; place the verifier outside the panel; score a vote by the evidence behind it	Your panel shares a prompt or context, so its votes are not independent draws
Global reach	Shared-state / blackboard	Reach is the whole topology in one hop, independent of c	Bulkhead the store into scoped failure domains; validate on write; version reads so a stale or poisoned entry shows as one	Several agents read from and write to one mutable store

Topology fixes the upper bound on how far a fault can reach, and the levers lower it from there; the per-topology rundown of which wiring amplifies which way, and where each one’s strongest lever sits, lives in the companion analysis on why multi-agent systems fail.
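For the shared-state row of the matrix, the three levers (scoped failure domains, validation on write, versioned reads) can be sketched in one small store. `ScopedStore` and its methods are hypothetical names, and the in-memory dictionary stands in for whatever store a real system uses.

```python
from typing import Any, Callable


class ScopedStore:
    """Shared store split into scopes; each scope is one failure domain."""

    def __init__(self) -> None:
        self._scopes: dict[str, tuple[frozenset, Callable[[Any], bool]]] = {}
        self._entries: dict[tuple[str, str], tuple[int, Any]] = {}

    def add_scope(self, name: str, agents, validate: Callable[[Any], bool]) -> None:
        self._scopes[name] = (frozenset(agents), validate)

    def _validator_for(self, agent: str, scope: str) -> Callable[[Any], bool]:
        agents, validate = self._scopes[scope]
        if agent not in agents:  # bulkhead: agents outside the scope cannot touch it
            raise PermissionError(f"{agent} has no access to scope {scope}")
        return validate

    def write(self, agent: str, scope: str, key: str, value: Any) -> int:
        validate = self._validator_for(agent, scope)
        if not validate(value):  # validate on write, before anyone can read it
            raise ValueError(f"rejected write to {scope}/{key}: {value!r}")
        version = self._entries.get((scope, key), (0, None))[0] + 1
        self._entries[(scope, key)] = (version, value)
        return version

    def read(self, agent: str, scope: str, key: str, min_version: int = 1):
        self._validator_for(agent, scope)
        version, value = self._entries[(scope, key)]
        if version < min_version:  # a stale entry shows up as stale
            raise LookupError(f"{scope}/{key} is at version {version}")
        return version, value
```

The validator passed to `add_scope` is where a meaning check would go. With only a type check there, a confident wrong value is still written and read by every agent in the scope.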

Real systems are hybrids, and the matrix is a decomposition tool. A production agent app is often an orchestrator whose workers are short chains, all writing to a shared memory the orchestrator also reads. Break it into those primitives, find the amplification path each one contributes, and the containment shopping list falls out: a hub verifier and a worker breaker for the fan-out, stage gates for the chains, a bulkhead and write validation for the shared store. Spend against the paths your wiring actually has, in the order of the blast radius each one exposes.

Putting a number on the cascade: blast radius and containment rate

A reframe is worth the change only if it produces a measurement, and ASI08-as-propagation produces two: blast radius and containment rate, the latter read with its interval. One architecture-level score, cascade resistance, sits above them.

The measurement method is fault injection: seed a known fault, watch which downstream agents it reaches, and rerun until the containment rate carries an interval. The number of injections is a sample-size question: bounding a rare escape to a narrow interval takes far more runs than confirming a common one. A single injection reports an anecdote, so a containment rate is that injection repeated until the interval is tight enough to act on.
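The sample-size point can be made concrete for the simplest case, where every injection is contained. The sketch assumes independent injections and uses the exact one-sided binomial bound; the function names are illustrative.

```python
from math import ceil, log


def zero_escape_upper_bound(n_runs: int, confidence: float = 0.95) -> float:
    """Upper bound on the true escape rate after n_runs injections with zero escapes.

    Solves (1 - rate)**n_runs = 1 - confidence for rate.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_runs)


def runs_needed(max_escape_rate: float, confidence: float = 0.95) -> int:
    """Clean injections needed before the escape rate can be bounded below
    max_escape_rate at the given confidence."""
    return ceil(log(1.0 - confidence) / log(1.0 - max_escape_rate))


for rate in (0.10, 0.01):
    print(f"bound escapes below {rate:.0%}: {runs_needed(rate)} clean injections")
```

Under these assumptions, bounding escapes below 10% takes 29 clean runs and bounding them below 1% takes 299, which is the gap between confirming a common outcome and ruling out a rare one.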

A worked illustration on schematic inputs, to show the arithmetic and nothing beyond it: an orchestrator fans one poisoned worker result out to 8 siblings, and suppose a hub verifier catches it before redistribution in 7 of 10 injections. The containment rate is 0.70. The 3 escapes each reach a breadth of 8 agents at a depth of one hop, and the 7 contained injections reach none, so the mean breadth across all 10 injections is (3 × 8) / 10, about 2.4 agents. The containment rate and the mean breadth here exist to demonstrate the computation, not to report a measurement. Both follow from the stated assumptions, and neither earns trust on a real system until it carries an interval. Measuring how a fault propagates and holding that measurement to statistical rigor are one discipline, because a containment number without an interval decides nothing.
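The same schematic numbers can be run through a short sketch that also attaches an interval. The interval used here is a Wilson score interval, swapped in because it is deterministic and easy to verify by hand; it is not the bootstrap interval this page names elsewhere. The outcome encoding (siblings reached per injection) is an assumption.

```python
from math import sqrt


def containment_rate(siblings_reached: list[int]) -> float:
    """Share of injections in which the fault reached no sibling."""
    return sum(1 for reached in siblings_reached if reached == 0) / len(siblings_reached)


def mean_breadth(siblings_reached: list[int]) -> float:
    """Mean number of siblings reached per injection, contained runs included."""
    return sum(siblings_reached) / len(siblings_reached)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Schematic inputs from the illustration: 7 contained, 3 escapes reaching 8 siblings.
outcomes = [0] * 7 + [8] * 3
low, high = wilson_interval(7, 10)
print(f"containment rate = {containment_rate(outcomes):.2f}")
print(f"mean breadth     = {mean_breadth(outcomes):.1f}")
print(f"Wilson 95% interval on containment = ({low:.2f}, {high:.2f})")
```

With only 10 injections the interval runs from roughly 0.40 to 0.89, too wide to choose between architectures.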

The reliability profiler this site points toward is a pre-launch instrument whose design intent is to place a fault, follow its propagation across a topology, and return a containment rate with a bootstrap interval. No such number has been measured, so this page shows no containment figure of its own.

From a risk-register line to a measured system

The change ASI08 asks of you is smaller than a rebuild and larger than a checklist. Keep the security controls; an adversary who can trigger a cascade on purpose is still a threat worth hardening against. Then add the propagation view on top of them, because most cascades begin with an ordinary fault and a trusting downstream agent, and no perimeter control engages when the input was never malicious. Once ASI08 reads as a measurement, the design question changes from “did we harden the perimeter” to “how far does a fault reach, and what fraction do we hold to one hop.”

A security checklist closes with a box ticked and nothing to measure. This reframe closes with two numbers and a way to produce them. The reliability glossary gives blast radius, containment rate, and cascade resistance each a canonical definition and spells out the injection recipe behind each. The failure-mode side of the same picture lives in the pillar on how multi-agent systems fail and how to contain it. Measured containment rates land in the research index as the runs complete.