Modeling Sensor Uncertainty in Cross-Modal Contrastive Learning for Pedestrian Re-Identification

Keywords

Pedestrian re-identification
cross-modal learning
contrastive learning
sensor uncertainty
autonomous driving

Abstract

Pedestrian re-identification in autonomous driving is affected by heterogeneous sensor conditions and modality-dependent noise. Building on CLIP-based uncertainty-aware modeling, this paper presents a cross-modal contrastive learning framework that aligns visual and textual pedestrian representations while accounting for sensor uncertainty. Modality-specific confidence weights are introduced to reduce the influence of unreliable features during representation learning. The method is evaluated on three autonomous driving datasets, including nuScenes-ReID and two large-scale urban benchmarks, comprising more than 120,000 pedestrian samples and 45,000 identities. Comparisons are conducted against recent vision-only and vision–language baselines, including PCB, MGN, TransReID, and CLIP-based ReID models. Experimental results show improvements of 3.9%–5.0% in rank-1 accuracy and 4.2%–5.6% in mean average precision under low-visibility and adverse weather conditions.
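The core idea — down-weighting unreliable modality features inside a cross-modal contrastive objective — can be sketched as a confidence-weighted symmetric InfoNCE loss. This is only an illustrative reconstruction, not the paper's actual implementation: the function name, the per-sample confidence inputs, and the choice to combine image and text confidences multiplicatively are all assumptions.

```python
import numpy as np

def weighted_contrastive_loss(img_emb, txt_emb, img_conf, txt_conf, tau=0.07):
    """Confidence-weighted symmetric InfoNCE over matched image/text pairs.

    img_conf / txt_conf in [0, 1] are hypothetical stand-ins for the paper's
    modality-specific confidence weights; low-confidence (noisy-sensor) pairs
    contribute less to the loss.
    """
    # L2-normalise so dot products are cosine similarities.
    v = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    t = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = v @ t.T / tau  # (N, N) similarity matrix; diagonal = matched pairs

    def diag_nll(l):
        # Per-row negative log-likelihood of the matched (diagonal) pair.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.diag(log_p)

    pair_conf = img_conf * txt_conf          # assumed joint pair reliability
    w = pair_conf / pair_conf.sum()          # normalised sample weights
    # Symmetric loss: image-to-text and text-to-image directions.
    return float((w * (diag_nll(logits) + diag_nll(logits.T))).sum() / 2)
```

With perfectly aligned embeddings the loss is near zero, while samples assigned low confidence (e.g. from a degraded sensor) are effectively excluded from the gradient signal rather than dragging the shared embedding space toward noise.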


This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Copyright (c) 2026 James R. Walker, Emily A. Thompson, Daniel K. Hughes (Author)