Research Article 2026-04-21 under-review v1

Deterministic Inference on Distributed AI Accelerators Interconnected by TSN

Abdulmajeed Alhumaidi University of Siegen
Yosab Bebawy University of Siegen
Roman Obermaisser University of Siegen

Abstract

Deep neural networks (DNNs) are increasingly critical to embedded and cyber–physical systems that demand strict real-time guarantees. However, the computational intensity of modern DNNs often exceeds the capacity of individual resource-constrained nodes. While distributing inference across multiple nodes offers a scalable alternative, achieving deterministic end-to-end latency remains difficult due to heterogeneous hardware, complex layer dependencies, and non-deterministic communication delays. This paper introduces an execution-time (ET)-aware two-level scheduling framework designed to provide predictable, distributed DNN inference on FPGA-based accelerators interconnected via time-triggered networking. The proposed approach bridges the gap between hardware-level execution and system-level coordination. At the lower level, we employ cycle-accurate instruction analysis to derive tight Worst-Case Execution Time (WCET) bounds for individual neural network layers. At the higher level, a static scheduling algorithm maps tasks to heterogeneous processors and allocates communication slots while respecting data dependencies and deterministic network constraints. The framework supports both layer-level and block-level scheduling, enabling flexible task decomposition to exploit inter-layer and intra-layer parallelism. To validate the framework, we developed a cycle-accurate simulator of the Time Triggered-Versatile Tensor Accelerator (TT-VTA) using SystemC and implemented time-sensitive networking (TSN) in OMNeT++. Evaluations using representative CNN models demonstrate that the proposed framework enables predictable, periodic inference with reduced end-to-end latency compared to single-node execution, consistently approaching the theoretical lower bound.
Furthermore, block-level scheduling improves load balancing, increases processor utilization, and eliminates anomalous scheduling behavior observed under coarse-grained partitioning. These results indicate that ET-aware distributed scheduling provides the assured performance necessary for safety-critical real-time applications.
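
The higher-level step described in the abstract (mapping tasks with known WCETs to heterogeneous processors and reserving communication slots) can be illustrated with a small sketch. This is not the paper's algorithm: it is a generic greedy earliest-finish-time list scheduler, written under several simplifying assumptions. The names `Task`, `schedule`, `fpga0`, and `fpga1` are hypothetical, the WCET and message times are invented, and the network is modeled as a single shared link that carries one message at a time, which is far simpler than a real TSN configuration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Task:
    name: str
    wcet: Dict[str, int]                 # processor -> WCET (invented values)
    preds: List[str] = field(default_factory=list)
    msg_time: int = 0                    # time to send this task's output


def schedule(tasks: List[Task], processors: List[str]):
    """Greedy static scheduling; `tasks` must be in topological order."""
    by_name = {t.name: t for t in tasks}
    proc_free = {p: 0 for p in processors}   # when each processor is idle
    link_free = 0                            # when the shared link is idle
    placed: Dict[str, Tuple[str, int, int]] = {}   # name -> (proc, start, end)
    slots: List[Tuple[str, str, int, int]] = []    # (src, dst, start, end)

    for t in tasks:
        best = None
        for p in processors:
            if p not in t.wcet:
                continue
            ready, trial_link, trial_slots = 0, link_free, []
            for pred in t.preds:
                pred_proc, _, pred_end = placed[pred]
                if pred_proc == p:
                    arrive = pred_end        # same node: no message needed
                else:
                    # Reserve the next free, non-overlapping link slot.
                    s = max(pred_end, trial_link)
                    e = s + by_name[pred].msg_time
                    trial_link = e
                    trial_slots.append((pred, t.name, s, e))
                    arrive = e
                ready = max(ready, arrive)
            start = max(ready, proc_free[p])
            end = start + t.wcet[p]
            # Keep the processor with the earliest finish (first wins ties).
            if best is None or end < best[0]:
                best = (end, p, start, trial_link, trial_slots)
        end, p, start, link_free, new_slots = best
        proc_free[p] = end
        placed[t.name] = (p, start, end)
        slots.extend(new_slots)
    return placed, slots


# Hypothetical four-task graph: conv1 feeds two parallel blocks, then fc.
tasks = [
    Task("conv1", {"fpga0": 10, "fpga1": 12}, msg_time=2),
    Task("conv2a", {"fpga0": 8, "fpga1": 8}, ["conv1"], msg_time=1),
    Task("conv2b", {"fpga0": 8, "fpga1": 8}, ["conv1"], msg_time=1),
    Task("fc", {"fpga0": 4, "fpga1": 4}, ["conv2a", "conv2b"]),
]
placed, slots = schedule(tasks, ["fpga0", "fpga1"])
makespan = max(end for _, _, end in placed.values())
print(placed, slots, makespan)
```

With these invented numbers the two-node schedule finishes at time 24, against 30 when every task runs on `fpga0` alone. Because all start times and link slots are fixed offline from WCET bounds, the resulting timetable is the same in every period, which is the property a time-triggered deployment relies on.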

Citation Information

@article{abdulmajeedalhumaidi2026,
  title={Deterministic Inference on Distributed AI Accelerators Interconnected by TSN},
  author={Abdulmajeed Alhumaidi and Yosab Bebawy and Roman Obermaisser},
  journal={Real-Time Systems},
  year={2026},
  doi={10.21203/rs.3.rs-9246074/v1}
}