Failure attribution (agents)

Failure attribution is the diagnostic that traces an observed multi-agent cascade back to the agent and step whose output first went wrong (the root cause), by reconstructing the run as a cascade tree and isolating its earliest corrupted node.

Given a failed run, attribution works backward from the observed symptom to the agent and step whose output first went wrong. Error propagation asks how far a fault reaches; attribution asks where it started.

The node where an error surfaces is almost never the node that produced it.

The operational move is a cascade tree: each node is an agent’s output, each edge records which output the next agent consumed, and the failed result sits at a leaf. Root-cause isolation walks that tree upstream to the earliest node already carrying the error, separating the originating fault from every hop that only relayed it. That earliest corrupted node is the attribution target.
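The upstream walk can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the `Node` fields, the `isolate_root_cause` helper, and the `is_corrupted` check are hypothetical names, and the sketch assumes that each output consumed exactly one upstream output and that some check can decide whether a given output already carries the error.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Node:
    """One agent output in the cascade tree (hypothetical schema)."""
    node_id: str
    agent: str
    step: int
    output: str
    consumed: Optional[str] = None  # node_id of the output this agent consumed


def isolate_root_cause(
    nodes: Dict[str, Node],
    failed_id: str,
    is_corrupted: Callable[[Node], bool],
) -> Optional[Node]:
    """Walk upstream from the failed leaf to the earliest corrupted node.

    Assumption: the error is carried by every hop between its origin and
    the failed leaf, so the walk stops at the first clean ancestor.
    """
    earliest = None
    current = nodes.get(failed_id)
    while current is not None and is_corrupted(current):
        earliest = current  # still corrupted: keep moving upstream
        current = nodes.get(current.consumed) if current.consumed else None
    return earliest


# Illustrative run: the researcher introduces a wrong date, and the writer
# and reviewer only relay it.
run = {
    "n0": Node("n0", "planner", 0, "deadline is 2024-03-01"),
    "n1": Node("n1", "researcher", 1, "deadline is 2024-04-01", consumed="n0"),
    "n2": Node("n2", "writer", 2, "Report: due 2024-04-01", consumed="n1"),
    "n3": Node("n3", "reviewer", 3, "Approved: due 2024-04-01", consumed="n2"),
}


def carries_wrong_date(node: Node) -> bool:
    # Stand-in corruption check; a real one depends on the task.
    return "2024-04-01" in node.output


root = isolate_root_cause(run, "n3", carries_wrong_date)
print(root.agent, root.step)
```

The walk returns the researcher's step as the attribution target and treats the writer and reviewer as relays. In practice the hard part is the corruption check itself, which here is reduced to a string match.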

Attribution is the blast-radius analysis run in reverse, and a low containment rate is what makes the reverse walk long: a fault that crosses more hops before it is caught leaves more relaying nodes between the symptom and its origin. Automating the walk is still an open problem: on the Who&When dataset of failure logs from 127 multi-agent systems, the best method identified the responsible agent with 53.5% accuracy and the decisive step with only 14.2%, and even state-of-the-art reasoning models fail to reach practical usability (Zhang et al., 2025, arXiv:2505.00212). Each attributed root cause is then classified against the failure-mode taxonomy.