Defending Deep Learning Systems Against Adversarial Attacks via Robust Optimization and Gradient Regularization

Keywords

Adversarial Robustness
Deep Learning
Robust Optimization
Gradient Regularization

Abstract

Deep neural networks have demonstrated remarkable proficiency across a spectrum of complex tasks, ranging from computer vision to natural language processing. However, these systems exhibit a critical vulnerability to adversarial examples—inputs intentionally perturbed by imperceptible noise that induce confident but erroneous predictions. This paper addresses the challenge of fortifying deep learning models against such adversarial threats through a hybrid approach combining robust optimization and gradient regularization. We propose a methodological framework that integrates min-max adversarial training with a Jacobian-based regularization term, designed to linearize the loss landscape and suppress the sensitivity of the model to input variations. By penalizing the Frobenius norm of the input gradients during the training phase, our approach explicitly enforces local smoothness of the decision boundary, thereby complementing the empirical robustness gained through adversarial training. We provide a comprehensive theoretical analysis of how gradient masking is avoided and demonstrate through extensive experimentation that this dual strategy yields superior robustness against projected gradient descent attacks while maintaining high classification accuracy on clean data. Our findings suggest that constraining the curvature of the decision manifold is a necessary condition for achieving verifiable robustness in high-dimensional feature spaces.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright (c) 2026 Hugo Robert, Manon Richard (Authors)