New top story on Hacker News: Bypassing Gemma and Qwen safety with raw strings

Bypassing Gemma and Qwen safety with raw strings
12 by teendifferent | 0 comments on Hacker News.
OP here. I spent the weekend red-teaming small-scale open weights models (Qwen2.5-1.5B, Qwen3-1.7B, Gemma-3-1b-it, and SmolLM2-1.7B). I found a consistent vulnerability across all of them: Safety alignment relies almost entirely on the presence of the chat template. When I stripped the <|im_start|> / instruction tokens and passed raw strings: Gemma-3 refusal rates dropped from 100% → 60%. Qwen3 refusal rates dropped from 80% → 40%. SmolLM2 showed 0% refusal (pure obedience). Qualitative failures were stark: models that previously refused to generate explosives tutorials or explicit fiction immediately complied when the "Assistant" persona wasn't triggered by the template. It seems we are treating client-side string formatting as a load-bearing safety wall. Full logs, the apply_chat_template ablation code, and heatmaps are in the post. Read the full analysis: https://ift.tt/8huZiBj...

About Me

Welcome to our breaking news site, where you can stay up-to-date on the latest breaking news, top stories, and current events from around the world. Our team of experienced journalists and writers work tirelessly to bring you the latest and most accurate news on politics, business, sports, entertainment, health, science, technology, and the environment. With our easy-to-navigate site, you can quickly find the latest local, national, and international news, as well as in-depth coverage of world news. We are committed to delivering comprehensive and reliable news coverage, so you can stay informed on the latest developments and breaking news stories. Thank you for choosing our breaking news site as your go-to source for the latest news and top stories.

New top story on Hacker News: Bypassing Gemma and Qwen safety with raw strings

No comments

About Me

ads

Blog Archive

Popular Posts

Translate

Recent Posts

Comments

Categories

Tags

Featured Posts

Recent Posts

Recent in Sports

New top story on Hacker News: Bypassing Gemma and Qwen safety with raw strings

No comments

About Me

Subscribe To

ads

Subscribe To

Blog Archive

Popular Posts

Translate

Recent Posts

Comments

Categories

Tags

Featured Posts

Recent Posts

Recent in Sports