Abstract
The rapid proliferation of Large Language Models (LLMs) has created unprecedented challenges in content authentication and model attribution. Traditional statistical fingerprinting methods rely primarily on token-level distribution patterns, which are increasingly vulnerable to adversarial manipulation and model fine-tuning. This research introduces a multi-layered fingerprinting framework that extends beyond surface token distributions to incorporate syntactic structures and discourse-level signatures for robust LLM authentication. We propose a hierarchical feature extraction methodology that captures deep linguistic patterns, including dependency-parse structures, semantic coherence measures, and discourse relationship markers. Our experimental evaluation across multiple state-of-the-art LLMs demonstrates that syntactic and discourse-level features provide significantly higher discriminative power than traditional token-based approaches, achieving 94.7% authentication accuracy even under adversarial conditions. The framework introduces three key innovations: syntactic tree embedding for structural fingerprinting, discourse graph construction for rhetorical pattern analysis, and temporal coherence modeling for stylistic consistency verification. Results indicate that combining multiple linguistic levels yields robust signatures resistant to common evasion techniques while maintaining computational efficiency for real-world deployment. This work establishes foundational principles for next-generation LLM authentication systems that can adapt to evolving model architectures and generation strategies.
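To make the multi-layered idea concrete, the sketch below extracts candidate features at the three linguistic levels the framework targets: token-level lexical statistics, syntactic dependency-parse structure, and coarse discourse proxies. It is a minimal Python illustration, assuming spaCy with its small English pipeline (en_core_web_sm, installed via "python -m spacy download en_core_web_sm"); the specific feature names and the discourse-marker list are hypothetical placeholders, not the feature set used in the experiments.

    # Minimal sketch of multi-level fingerprint feature extraction.
    # Assumes spaCy and the "en_core_web_sm" pipeline are installed;
    # feature names and DISCOURSE_MARKERS are illustrative, not the
    # paper's actual feature set.
    from collections import Counter
    import statistics

    import spacy

    nlp = spacy.load("en_core_web_sm")

    # Hypothetical set of connectives, used as a coarse proxy for
    # discourse-level rhetorical patterns.
    DISCOURSE_MARKERS = {"however", "therefore", "moreover", "thus",
                         "consequently", "furthermore", "nevertheless"}

    def token_depth(token):
        """Depth of a token in its dependency tree (root = 0)."""
        depth = 0
        while token.head is not token:  # spaCy roots are their own head
            token = token.head
            depth += 1
        return depth

    def extract_fingerprint_features(text: str) -> dict:
        doc = nlp(text)
        words = [t for t in doc if t.is_alpha]
        sents = list(doc.sents)

        # Token level: surface lexical statistics.
        type_token_ratio = len({t.lower_ for t in words}) / max(len(words), 1)

        # Syntactic level: dependency-label distribution and parse depth.
        dep_counts = Counter(t.dep_ for t in words)
        mean_depth = statistics.mean(token_depth(t) for t in doc) if len(doc) else 0.0

        # Discourse level: connective rate and sentence-length variability
        # as crude coherence/rhythm proxies.
        marker_rate = sum(t.lower_ in DISCOURSE_MARKERS for t in words) / max(len(words), 1)
        sent_lengths = [len(s) for s in sents]
        length_var = statistics.pvariance(sent_lengths) if len(sent_lengths) > 1 else 0.0

        return {
            "type_token_ratio": type_token_ratio,
            "mean_parse_depth": mean_depth,
            "dep_label_dist": {k: v / max(len(words), 1) for k, v in dep_counts.items()},
            "discourse_marker_rate": marker_rate,
            "sentence_length_variance": length_var,
        }

    if __name__ == "__main__":
        sample = ("However, the model's output remained consistent across prompts. "
                  "Therefore, we treated its phrasing as a stable signature.")
        print(extract_fingerprint_features(sample))

In a full pipeline, features from each level would be concatenated into a fingerprint vector and compared against per-model reference profiles; the multi-level combination is what provides the robustness to token-level evasion described above.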

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Copyright (c) 2026 Yongjie Lin, Zihan Cui, Alexander Monroe