Abstract
Clinical deployment of learned CT reconstruction models requires attention to runtime, memory footprint, and throughput. Building on convolution–self-attention architectures such as CTLformer, this paper introduces lightweight convolution–attention blocks that combine structured sparsity with a low-rank approximation of self-attention. The design reduces the quadratic cost of attention while preserving long-range dependency modeling. Experiments are conducted on two CT reconstruction datasets totaling 40,000 slices. Compared with transformer-heavy reconstruction models and standard hybrid baselines, the proposed approach reduces GPU memory usage by 28%–40% and improves inference throughput by 1.6×–2.1×, while staying within 0.2–0.4 dB PSNR of the full model's reconstruction accuracy.
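
To illustrate how a low-rank approximation removes the quadratic attention cost, the following is a minimal PyTorch sketch of a Linformer-style attention layer that projects keys and values down to a fixed rank; it is not the paper's exact block, and the module name, rank value, and token shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LowRankSelfAttention(nn.Module):
    """Single-head self-attention with a low-rank (Linformer-style) projection.

    Keys and values are compressed along the token axis from length n to a
    fixed rank k, so the attention map is n x k instead of n x n.
    """

    def __init__(self, dim: int, seq_len: int, rank: int = 64):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        # Learned projections that compress the token axis to `rank`.
        self.proj_k = nn.Parameter(torch.randn(rank, seq_len) / seq_len ** 0.5)
        self.proj_v = nn.Parameter(torch.randn(rank, seq_len) / seq_len ** 0.5)
        self.out = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, dim), where n is the number of image tokens.
        q = self.to_q(x)                                # (b, n, d)
        k = self.proj_k @ self.to_k(x)                  # (b, k, d)
        v = self.proj_v @ self.to_v(x)                  # (b, k, d)
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (b, n, k) rather than (b, n, n)
        attn = attn.softmax(dim=-1)
        return self.out(attn @ v)                       # (b, n, d)


if __name__ == "__main__":
    tokens = torch.randn(2, 4096, 96)   # e.g. a 64x64 patch grid with 96 channels
    block = LowRankSelfAttention(dim=96, seq_len=4096, rank=64)
    print(block(tokens).shape)          # torch.Size([2, 4096, 96])
```

With this projection the attention score matrix scales linearly in the token count, which is the source of the memory and throughput gains reported above; the structured-sparsity component of the proposed blocks is orthogonal to this sketch.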
