Structural Prompt Variants and Their Impact on LLM-Based Vulnerability Detection Accuracy

Keywords

prompt structure
vulnerability detection
prompt ambiguity
LLM reasoning
CWE classification

Abstract

The structure of prompts used in vulnerability detection tasks significantly affects the reasoning process of large language models. This study systematically evaluates how formatting, decomposition, and semantic density influence detection accuracy across five major vulnerability categories (CWE-20, CWE-79, CWE-89, CWE-120, CWE-798). We construct a benchmark of 12,000 prompt–code pairs and test three LLMs: GPT-4, Claude-3, and Llama-3-70B. Results show that structured prompts with explicit reasoning stages improve true-positive rates by 24–41%, while overly verbose prompts increase hallucination-induced false alarms by 15%. Prompt–model mismatch analysis reveals that models differ in their tolerance of prompt ambiguity. These findings highlight the necessity of prompt normalization for trustworthy LLM-based security auditing.
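
To make the contrast concrete, the sketch below illustrates what a "structured prompt with explicit reasoning stages" might look like next to a free-form baseline. The stage wording, the function names (build_structured_prompt, build_freeform_prompt), and the template text are hypothetical illustrations, not the paper's actual benchmark templates.

```python
# Hypothetical sketch: a staged vulnerability-detection prompt vs. a
# free-form baseline. Names and wording are illustrative assumptions;
# the study's real templates are not reproduced here.

TARGET_CWES = ["CWE-20", "CWE-79", "CWE-89", "CWE-120", "CWE-798"]

REASONING_STAGES = [
    "1. Identify all external inputs and the sinks they reach.",
    "2. Check each input path for validation or sanitization.",
    "3. Map any unsanitized path to the most specific CWE from the list.",
    "4. Give a verdict: vulnerable (with CWE ID and line) or not vulnerable.",
]

def build_structured_prompt(code: str) -> str:
    """Staged template: the model is walked through explicit reasoning steps."""
    stages = "\n".join(REASONING_STAGES)
    return (
        "You are a security auditor. Analyze the code below for these CWEs: "
        + ", ".join(TARGET_CWES) + ".\n"
        "Work through each stage in order and show your reasoning for each:\n"
        f"{stages}\n\n"
        f"```\n{code}\n```"
    )

def build_freeform_prompt(code: str) -> str:
    """Unstructured baseline: a single open-ended question."""
    return f"Is the following code vulnerable? Explain.\n\n```\n{code}\n```"

if __name__ == "__main__":
    sample = "query = \"SELECT * FROM users WHERE name = '\" + user_input + \"'\""
    print(build_structured_prompt(sample))
```

Decomposing the task this way constrains where the model can skip steps, which is one plausible mechanism for the true-positive gains the abstract reports.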


This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright (c) 2026 Mariana Torres, Javier Martínez, Alejandro Ruiz