Green Decoding: ELQ Co-Optimization and Carbon-Aware Scheduling for Efficient LLM Inference
Abstract
The large-scale deployment of Large Language Models (LLMs) is constrained by significant energy consumption and operational costs, with inference accounting for up to 90% of the total energy footprint. Existing optimization methods typically address latency or memory in isolation and often overlook energy efficiency, carbon impact, and how these interact with user-defined Service Level Agreements (SLAs) for quality and response time. This work presents Green Decoding, a co-optimization framework for LLM inference. Green Decoding formulates inference as a multi-objective optimization problem that minimizes a weighted function of Energy, Latency, and Quality (ELQ). A policy engine jointly tunes, on a per-request basis, a broad set of system parameters, including speculative decoding configurations, dynamic Key-Value (KV) cache policies, adaptive quantization tiers, and early-exit criteria. The framework introduces two key contributions: (1) a carbon-aware scheduler that uses real-time grid carbon intensity data to time-shift deferrable, non-interactive workloads to periods of cleaner energy, directly reducing CO2 emissions without violating SLAs, and (2) a safety-aware gating mechanism that uses runtime uncertainty and toxicity signals to limit aggressive, potentially quality-degrading optimizations, preserving model reliability. Across diverse workloads, Green Decoding establishes a more efficient ELQ Pareto frontier than the baselines. The framework achieves up to 35% energy reduction and 50% lower carbon emissions (gCO2e) compared to highly optimized static baselines, while strictly adhering to p95 latency and quality-proxy SLAs.
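The two ideas in the abstract, a weighted ELQ objective and carbon-aware time-shifting, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Config`, `elq_cost`, `pick_config`, and `pick_start_slot`, the normalization by the worst feasible value, the default weights, and all numbers are assumptions made for illustration, and the safety-aware gate is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical inference configuration with estimated metrics."""
    name: str
    energy_j: float    # estimated energy per request (joules)
    latency_ms: float  # estimated p95 latency (milliseconds)
    quality: float     # quality proxy in [0, 1], higher is better


def elq_cost(c, w_e, w_l, w_q, e_ref, l_ref):
    """Weighted sum of normalized energy, normalized latency, and quality loss."""
    return (w_e * c.energy_j / e_ref
            + w_l * c.latency_ms / l_ref
            + w_q * (1.0 - c.quality))


def pick_config(configs, max_latency_ms, min_quality, weights=(0.4, 0.3, 0.3)):
    """Return the lowest-cost config that meets both SLA bounds, or None."""
    feasible = [c for c in configs
                if c.latency_ms <= max_latency_ms and c.quality >= min_quality]
    if not feasible:
        return None
    # Assumed normalization: divide by the worst feasible value of each metric.
    e_ref = max(c.energy_j for c in feasible)
    l_ref = max(c.latency_ms for c in feasible)
    w_e, w_l, w_q = weights
    return min(feasible, key=lambda c: elq_cost(c, w_e, w_l, w_q, e_ref, l_ref))


def pick_start_slot(carbon_forecast, deadline_slot, duration_slots=1):
    """Pick the start slot with the lowest total forecast carbon intensity.

    The job occupies `duration_slots` consecutive slots and must finish
    before slot index `deadline_slot` (exclusive).
    """
    last_start = min(deadline_slot, len(carbon_forecast)) - duration_slots
    if last_start < 0:
        raise ValueError("job cannot finish before the deadline")
    return min(range(last_start + 1),
               key=lambda s: sum(carbon_forecast[s:s + duration_slots]))


# Illustrative values only; none of these come from the paper.
configs = [
    Config("fp16", 10.0, 100.0, 0.95),
    Config("int8+spec", 6.0, 70.0, 0.90),
    Config("int4+early-exit", 4.0, 50.0, 0.70),
]
forecast_gco2_per_kwh = [400, 350, 120, 90, 300]

best = pick_config(configs, max_latency_ms=120, min_quality=0.85)
slot = pick_start_slot(forecast_gco2_per_kwh, deadline_slot=4)
```

With these made-up numbers, the cheapest configuration violates the quality bound and is excluded before costs are compared, which mirrors the abstract's point that SLAs act as constraints rather than as terms to be traded away. A deferrable job is simply moved to the cleanest slot that still meets its deadline.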
Citation Information
@article{gaithrjoub2026,
title={Green Decoding: ELQ Co-Optimization and Carbon-Aware Scheduling for Efficient LLM Inference},
author={Gaith Rjoub and Jamal Bentahar and Shahed Almobydeen and Ayoub Alsarhan},
journal={Evolutionary Intelligence},
year={2026},
doi={https://doi.org/10.21203/rs.3.rs-9252320/v1}
}