Abstract
The proliferation of open-weight large language models (LLMs) has democratized access to advanced natural language processing while introducing significant challenges for content authentication and intellectual property protection. This study investigates watermark transferability across model architectures and evaluates defense mechanisms against watermark-removal attacks on open-weight LLMs. We propose a novel hierarchical watermarking framework that distributes signature information across multiple architectural layers, analogous to multi-source data integration systems. Our methodology combines statistical watermark detection with adversarial robustness testing to quantify watermark survival rates under various fine-tuning scenarios. Experimental results show that watermarks embedded with our hierarchical approach transfer across architecture boundaries to varying degrees, with an average retention rate of 73.4% under moderate fine-tuning. We analyze the relationship between model performance preservation and watermark detectability, finding a positive correlation: higher-performing architectures maintain stronger watermark signatures during transfer. Our multi-layered defense strategy, which combines redundant watermark embedding with adaptive verification mechanisms, achieves 91.2% detection accuracy against sophisticated removal attempts. These findings expose critical vulnerabilities in existing watermarking schemes and provide actionable insights for developing more robust authentication systems in the era of openly accessible LLMs.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Copyright (c) 2026 Zhenyu Qiao (Author)