Research Article 2026-04-21 under-review v1

Who Broke the Pipeline? Traceability and Accountability in Multi-Agent LLM Software Engineering Pipelines

Jackson Beem Oakland University
Amine Barrak Oakland University
Emna Ksontini University of North Carolina Wilmington

Abstract

Large language models are increasingly embedded across the DevOps pipeline, from planning and code generation to testing and deployment, yet multi-agent LLM pipelines remain opaque: errors propagate silently across stages with no systematic method to attribute failures to specific agents. We present a two-part study of traceable, accountable multi-agent pipelines with structured handoffs and role-level blame attribution. Part I evaluates a Planner, Executor, and Critic pipeline on three multiple-choice benchmarks using eight configurations of three frontier LLMs. Structured accountability recovers accuracy dramatically over unstructured pipelines (e.g., from 61.42% to 97.64%), and blame attribution reveals complementary role-specific aptitudes that enable data-driven model-to-role casting. Part II extends the pipeline to execution-grounded code generation on HumanEval, MBPP, and BigCodeBench, introducing a fourth agent, the Verifier. We find that LLM-based test evaluation is not a reliable substitute for actual test execution: false-fail rates reach 93%, triggering unnecessary re-generation that overwrites correct solutions. Replacing the LLM Critic with an execution-based verification harness eliminates verification harm entirely, achieves the highest accuracy on complex tasks, and reduces token cost by up to 57%. Test execution remains an irreplaceable component of the DevOps pipeline that cannot be delegated to LLM judgment alone. We further find that the Planner adds 1.8 to 3.0 times more tokens to the pipeline without meaningful accuracy gains in code generation, as the Executor produces comparable output with or without the Planner's formal specification.
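
To make the contrast between LLM judgment and test execution concrete, the following is a minimal Python sketch and not the authors' harness. The names `Verdict`, `run_tests`, and `false_fail_rate` are hypothetical, and the false-fail rate is assumed here to mean the share of solutions that pass real execution but were marked as failing by the LLM Critic; the paper may define it differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Verdict:
    passed: bool
    detail: str


def run_tests(solution_src: str, test_src: str) -> Verdict:
    """Run assert-based tests against a candidate solution.

    Illustrative only: a real harness would isolate execution in a
    sandboxed subprocess with a timeout instead of calling exec() here.
    """
    namespace: dict = {}
    try:
        exec(solution_src, namespace)  # define the candidate function(s)
        exec(test_src, namespace)      # run the assertions against them
    except Exception as exc:           # includes AssertionError
        return Verdict(False, f"{type(exc).__name__}: {exc}")
    return Verdict(True, "all assertions passed")


def false_fail_rate(critic_pass: list[bool], executed_pass: list[bool]) -> float:
    """Assumed definition: among solutions that pass real execution,
    the fraction the critic judged as failing."""
    critic_on_passing = [c for c, e in zip(critic_pass, executed_pass) if e]
    if not critic_on_passing:
        return 0.0
    return sum(1 for c in critic_on_passing if not c) / len(critic_on_passing)


if __name__ == "__main__":
    solution = "def add(a, b):\n    return a + b\n"
    tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"
    print(run_tests(solution, tests))
```

In this sketch the execution result is the ground truth, so a correct solution is never sent back for re-generation because of a mistaken judgment.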

Citation Information

@article{jacksonbeem2026,
  title={Who Broke the Pipeline? Traceability and Accountability in Multi-Agent LLM Software Engineering Pipelines},
  author={Jackson Beem and Amine Barrak and Emna Ksontini},
  journal={Automated Software Engineering},
  year={2026},
  doi={10.21203/rs.3.rs-9350097/v1}
}