Research Article 2026-04-21 under-review v1

Green Decoding: ELQ Co-Optimization and Carbon-Aware Scheduling for Efficient LLM Inference

Gaith Rjoub, Aqaba University of Technology
Jamal Bentahar, Khalifa University of Science and Technology
Shahed Almobydeen, Aqaba University of Technology
Ayoub Alsarhan, Hashemite University

Abstract

The large-scale deployment of Large Language Models (LLMs) is constrained by significant energy consumption and operational costs, with inference accounting for up to 90% of the total energy footprint. Existing optimization methods typically address latency or memory in isolation and frequently overlook energy efficiency, carbon impact, and how these interact with user-defined Service Level Agreements (SLAs) for quality and response time. This work presents Green Decoding, a co-optimization framework for LLM inference. Green Decoding formulates inference as a multi-objective optimization problem that minimizes a weighted function of Energy, Latency, and Quality (ELQ). A policy engine jointly tunes, on a per-request basis, a broad set of system parameters, including speculative decoding configurations, dynamic Key-Value (KV) cache policies, adaptive quantization tiers, and early-exit criteria. The framework introduces two key contributions: (1) a carbon-aware scheduler that uses real-time grid carbon intensity data to time-shift deferrable, non-interactive workloads to periods of cleaner energy, directly reducing CO2 emissions without violating SLAs, and (2) a safety-aware gating mechanism that uses runtime uncertainty and toxicity signals to limit aggressive, potentially quality-degrading optimizations, thereby preserving model reliability. Across diverse workloads, Green Decoding establishes a more efficient ELQ Pareto frontier than the baselines. The framework achieves up to 35% energy reduction and 50% lower carbon emissions (gCO2e) compared to highly optimized static baselines, while strictly adhering to p95 latency and quality-proxy SLAs.
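
The two ideas in the abstract, a weighted ELQ objective under SLA constraints and time-shifting of deferrable work to cleaner grid periods, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Config`, `elq_cost`, `pick_config`, and `pick_start_slot`, the weights, the normalization, and all numbers are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Config:
    """Hypothetical per-request configuration with estimated costs."""
    name: str
    energy_j: float    # estimated energy per request (joules)
    latency_ms: float  # estimated p95 latency (milliseconds)
    quality: float     # quality proxy in [0, 1], higher is better


def elq_cost(c: Config, w_e: float, w_l: float, w_q: float,
             e_ref: float, l_ref: float) -> float:
    """Weighted ELQ cost (assumed form): normalized energy and latency,
    plus quality expressed as a loss (1 - quality)."""
    return (w_e * c.energy_j / e_ref
            + w_l * c.latency_ms / l_ref
            + w_q * (1.0 - c.quality))


def pick_config(configs: Sequence[Config], max_latency_ms: float,
                min_quality: float,
                weights: Tuple[float, float, float] = (0.5, 0.3, 0.2)) -> Config:
    """Choose the lowest-cost configuration among those meeting the SLA."""
    feasible = [c for c in configs
                if c.latency_ms <= max_latency_ms and c.quality >= min_quality]
    if not feasible:
        raise ValueError("no configuration satisfies the SLA")
    # Normalize by the largest feasible value so the terms are comparable.
    e_ref = max(c.energy_j for c in feasible)
    l_ref = max(c.latency_ms for c in feasible)
    return min(feasible, key=lambda c: elq_cost(c, *weights, e_ref, l_ref))


def pick_start_slot(carbon_forecast: Sequence[float], deadline_slot: int,
                    duration_slots: int = 1) -> int:
    """Return the start slot that minimizes summed carbon intensity
    (gCO2e/kWh) while the job still finishes by deadline_slot."""
    last_start = min(deadline_slot, len(carbon_forecast)) - duration_slots
    if last_start < 0:
        raise ValueError("job cannot finish before the deadline")
    return min(range(last_start + 1),
               key=lambda s: sum(carbon_forecast[s:s + duration_slots]))


# Illustrative, made-up numbers for three quantization tiers.
CONFIGS = [
    Config("fp16", energy_j=100.0, latency_ms=200.0, quality=0.95),
    Config("int8", energy_j=70.0, latency_ms=150.0, quality=0.92),
    Config("int4", energy_j=50.0, latency_ms=120.0, quality=0.80),
]
FORECAST = [450.0, 380.0, 210.0, 190.0, 400.0]  # hypothetical hourly values

if __name__ == "__main__":
    print(pick_config(CONFIGS, max_latency_ms=180.0, min_quality=0.90).name)
    print(pick_start_slot(FORECAST, deadline_slot=5, duration_slots=2))
```

In this sketch the SLA acts as a hard filter and the ELQ weights only rank the configurations that pass it, so a stricter quality floor removes the most aggressive tier regardless of its energy savings. The scheduler simply scans the forecast window before the deadline; a real system would also need to account for forecast error and queueing.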

Citation Information

@article{gaithrjoub2026,
  title={Green Decoding: ELQ Co-Optimization and Carbon-Aware Scheduling for Efficient LLM Inference},
  author={Gaith Rjoub and Jamal Bentahar and Shahed Almobydeen and Ayoub Alsarhan},
  journal={Evolutionary Intelligence},
  year={2026},
  doi={10.21203/rs.3.rs-9252320/v1}
}