SQL Injection Jailbreak: A Structural Disaster of Large Language Models

Document detail

ID

oai:arXiv.org:2411.01565

Topic

Computer Science - Cryptography an...

Author

Zhao, Jiawei Chen, Kejiang Zhang, Weiming Yu, Nenghai

Year

2024

listing date

3/5/2025

Keywords

prompts sij jailbreak models

Metrics

Abstract

In recent years, the rapid development of large language models (LLMs) has brought new vitality into various domains, generating substantial social and economic benefits.

However, jailbreaking, a form of attack that induces LLMs to produce harmful content through carefully crafted prompts, presents a significant challenge to the safe and trustworthy development of LLMs.

Previous jailbreak methods primarily exploited the internal properties or capabilities of LLMs, such as optimization-based jailbreak methods and methods that leveraged the model's context-learning abilities.

In this paper, we introduce a novel jailbreak method, SQL Injection Jailbreak (SIJ), which targets the external properties of LLMs, specifically, the way LLMs construct input prompts.

By injecting jailbreak information into user prompts, SIJ successfully induces the model to output harmful content.

For open-source models, SIJ achieves near 100\% attack success rates on five well-known LLMs on the AdvBench and HEx-PHI, while incurring lower time costs compared to previous methods.

For closed-source models, SIJ achieves an average attack success rate over 85\% across five models in the GPT and Doubao series.

Additionally, SIJ exposes a new vulnerability in LLMs that urgently requires mitigation.

To address this, we propose a simple defense method called Self-Reminder-Key to counter SIJ and demonstrate its effectiveness through experimental results.

Our code is available at https://github.com/weiyezhimeng/SQL-Injection-Jailbreak.

Zhao, Jiawei,Chen, Kejiang,Zhang, Weiming,Yu, Nenghai, 2024, SQL Injection Jailbreak: A Structural Disaster of Large Language Models

Document

Open

Source

Articles recommended by ES/IODE AI

Computer Science

Optimal Sensing of Momentum Kicks with a Feedback-Controlled Nanomechanical Resonator

sensing momentum kicks resonator

BMJ Neurology Open

Shared genetic aetiology of Alzheimer’s disease and age-related macular degeneration by APOC1 and APOE genes

gene diagnostic based analysis amd ad apoe apoc1 identified genes pleiotropy

medrxiv

The influence of home versus clinic anal human papillomavirus sampling on high-resolution anoscopy uptake in the Prevent Anal Cancer Self-Swab Study

cancer sampling =0 0 race lower hra participants anal home clinic hiv attendance arm versus