oai:arXiv.org:2410.12318
Computer Science
2024
10/23/2024
Fingerprinting large language models (LLMs) is essential for verifying model ownership, ensuring authenticity, and preventing misuse.
Traditional fingerprinting methods often incur significant computational overhead or require white-box access for verification.
In this paper, we introduce UTF, a novel and efficient approach to fingerprinting LLMs by leveraging under-trained tokens.
Under-trained tokens are tokens that the model has not fully learned during its training phase.
By utilizing these tokens, we perform supervised fine-tuning to embed specific input-output pairs into the model.
This process allows the LLM to produce predetermined outputs when presented with certain inputs, effectively embedding a unique fingerprint.
Our method has minimal overhead and minimal impact on the model's performance, and it does not require white-box access to the target model for ownership identification.
Compared to existing fingerprinting methods, UTF is also more effective and more robust to fine-tuning and random guessing.
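As an illustration of the black-box workflow described above, the following minimal sketch builds one input-output fingerprint pair from a list of candidate under-trained tokens and then checks ownership by querying a model. It is not the authors' implementation: the token strings, the function names `make_fingerprint_pair` and `verify_fingerprint`, the prefix-match criterion, and the stub models are all hypothetical, and both the detection of under-trained tokens and the supervised fine-tuning step are omitted.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical placeholders; real under-trained tokens depend on the tokenizer.
CANDIDATE_TOKENS: List[str] = ["<tok_a>", "<tok_b>", "<tok_c>", "<tok_d>"]


def make_fingerprint_pair(
    tokens: List[str], in_len: int, out_len: int, seed: int
) -> Tuple[str, str]:
    """Sample a trigger input x and a target output y from the candidate tokens."""
    rng = random.Random(seed)  # seeded so the owner can regenerate the same pair
    x = "".join(rng.choice(tokens) for _ in range(in_len))
    y = "".join(rng.choice(tokens) for _ in range(out_len))
    return x, y


def verify_fingerprint(generate: Callable[[str], str], x: str, y: str) -> bool:
    """Black-box check: query the model with x and test whether its output starts with y.

    Prefix matching is an assumption of this sketch, not a criterion from the paper.
    """
    return generate(x).startswith(y)


# Usage with stub models standing in for a remote text-generation API.
x, y = make_fingerprint_pair(CANDIDATE_TOKENS, in_len=8, out_len=4, seed=0)


def fingerprinted_model(prompt: str) -> str:
    # Stub: behaves as if fine-tuning had embedded the pair (x, y).
    return y + " ..." if prompt == x else "Hello!"


def clean_model(prompt: str) -> str:
    # Stub: a model that never saw the fingerprint pair.
    return "I am not sure."


is_owned = verify_fingerprint(fingerprinted_model, x, y)
is_clean = verify_fingerprint(clean_model, x, y)
```

Because verification only calls `generate`, it needs nothing beyond text-in, text-out access to the suspect model, which is the property the abstract refers to as not requiring white-box access.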
Cai, Jiacheng; Yu, Jiahao; Shao, Yangguang; Wu, Yuhang; Xing, Xinyu, 2024, UTF: Undertrained Tokens as Fingerprints A Novel Approach to LLM Identification