Abstract
Pedestrian re-identification in autonomous driving is affected by heterogeneous sensor conditions and modality-dependent noise. Building on CLIP-based uncertainty-aware modeling, this paper presents a cross-modal contrastive learning framework that aligns visual and textual pedestrian representations while accounting for sensor uncertainty. Modality-specific confidence weights are introduced to reduce the influence of unreliable features during representation learning. The method is evaluated on three autonomous driving datasets, including nuScenes-ReID and two large-scale urban benchmarks, comprising more than 120,000 pedestrian samples and 45,000 identities. Comparisons are conducted against recent vision-only and vision–language baselines, including PCB, MGN, TransReID, and CLIP-based ReID models. Experimental results show improvements over these baselines of 3.9%–5.0% in rank-1 accuracy and 4.2%–5.6% in mean average precision under low-visibility and adverse weather conditions.
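As a rough illustration of the kind of confidence-weighted cross-modal objective the abstract describes, the sketch below implements a symmetric CLIP-style InfoNCE loss in which per-sample confidence scores for each modality down-weight unreliable pairs. The function name, the multiplicative combination of confidences, and the temperature value are assumptions for illustration, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def confidence_weighted_clip_loss(image_feats, text_feats,
                                  image_conf, text_conf,
                                  temperature=0.07):
    """Symmetric image-text InfoNCE loss with per-sample confidence weights.

    image_feats, text_feats: (N, D) embeddings from the visual / textual encoders.
    image_conf, text_conf:   (N,) confidence scores in [0, 1], e.g. derived from
                             predicted feature variance or sensor quality
                             (hypothetical source of the weights).
    """
    # L2-normalise so the dot product is a cosine similarity.
    img = F.normalize(image_feats, dim=-1)
    txt = F.normalize(text_feats, dim=-1)

    # Pairwise similarity logits; matching pairs lie on the diagonal.
    logits = img @ txt.t() / temperature
    targets = torch.arange(img.size(0), device=img.device)

    # Per-sample cross-entropy in both retrieval directions.
    loss_i2t = F.cross_entropy(logits, targets, reduction="none")
    loss_t2i = F.cross_entropy(logits.t(), targets, reduction="none")

    # Joint confidence down-weights pairs where either modality is unreliable.
    w = image_conf * text_conf
    w = w / (w.sum() + 1e-8)

    return (w * (loss_i2t + loss_t2i)).sum() / 2


if __name__ == "__main__":
    torch.manual_seed(0)
    imgs, txts = torch.randn(8, 512), torch.randn(8, 512)
    img_conf = torch.rand(8)   # stand-in for sensor-derived confidences
    txt_conf = torch.rand(8)
    print(confidence_weighted_clip_loss(imgs, txts, img_conf, txt_conf))
```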

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Copyright (c) 2026 James R. Walker, Emily A. Thompson, Daniel K. Hughes (Authors)