Document detail
ID

oai:arXiv.org:2410.04190

Topic
Computer Science - Cryptography an... Computer Science - Computation and...
Author
Dong, Yiting; Shen, Guobin; Zhao, Dongcheng; He, Xiang; Zeng, Yi
Category

Computer Science

Year

2024

Listing date

10/9/2024

Keywords
language scalable llms jailbreak models attack
Abstract

Large Language Models (LLMs) remain vulnerable to jailbreak attacks that bypass their safety mechanisms.

Existing attack methods are fixed or tailored to specific models and cannot flexibly adjust attack strength, a capability that is critical for generalization when attacking models of various sizes.

We introduce a novel scalable jailbreak attack that preempts the activation of an LLM's safety policies by occupying its computational resources.

Our method involves engaging the LLM in a resource-intensive preliminary task - a Character Map lookup and decoding process - before presenting the target instruction.

By saturating the model's processing capacity, we prevent the activation of safety protocols when processing the subsequent instruction.

Extensive experiments on state-of-the-art LLMs demonstrate that our method achieves a high success rate in bypassing safety measures without requiring gradient access or manual prompt engineering.

We verify that our approach offers a scalable attack that quantifies attack strength and adapts to different model scales at the optimal strength.

We show that the safety policies of LLMs might be more susceptible to resource constraints.

Our findings reveal a critical vulnerability in current LLM safety designs, highlighting the need for more robust defense strategies that account for resource-intensive conditions.

Dong, Yiting; Shen, Guobin; Zhao, Dongcheng; He, Xiang; Zeng, Yi, 2024, Harnessing Task Overload for Scalable Jailbreak Attacks on Large Language Models
