Abstract
Deep neural networks have demonstrated remarkable proficiency across a spectrum of complex tasks, from computer vision to natural language processing. However, these systems exhibit a critical vulnerability to adversarial examples: inputs altered by imperceptible, intentionally crafted perturbations that induce confident but erroneous predictions. This paper addresses the challenge of fortifying deep learning models against such adversarial threats through a hybrid approach that combines robust optimization with gradient regularization. We propose a framework that integrates min-max adversarial training with a Jacobian-based regularization term, designed to linearize the loss landscape and suppress the model's sensitivity to input perturbations. By penalizing the Frobenius norm of the input gradients during training, our approach explicitly enforces local smoothness of the decision boundary, thereby complementing the empirical robustness gained through adversarial training. We provide a theoretical analysis showing how this combination avoids gradient masking, and we demonstrate through extensive experiments that the dual strategy yields superior robustness against projected gradient descent (PGD) attacks while maintaining high classification accuracy on clean data. Our findings suggest that constraining the curvature of the decision manifold is a necessary condition for achieving verifiable robustness in high-dimensional feature spaces.
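The combined objective described above admits a natural formalization. The following is a sketch, not the paper's exact formulation: the inner maximization is the standard min-max adversarial training term, and the added penalty on the input gradient of the loss (with a hypothetical trade-off weight λ) is one common instantiation of the Jacobian-based regularizer.

\[
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
\left[
\max_{\|\delta\|_{\infty} \le \epsilon} \mathcal{L}\big(f_{\theta}(x+\delta),\, y\big)
\;+\;
\lambda \,\big\| \nabla_{x}\, \mathcal{L}\big(f_{\theta}(x),\, y\big) \big\|_{F}^{2}
\right]
\]

Here \(f_{\theta}\) is the classifier, \(\mathcal{L}\) the training loss, \(\epsilon\) the perturbation budget, and the Frobenius-norm penalty encourages a locally linear loss surface around clean inputs, complementing the empirical robustness from the adversarial term.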

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright (c) 2026 Hugo Robert, Manon Richard (Authors)