Researchers show that finetuning large language models can reactivate hidden verbatim memorization of copyrighted books, demonstrating the effect across GPT-4o, Gemini-2.5-Pro, and DeepSeek-V3.1. The paper introduces memorization metrics and releases open-source evaluation code. The findings expose an "alignment whack-a-mole" problem: standard safety measures do not prevent copyright exploitation once a model has been adapted.
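As a rough illustration of what a verbatim-memorization metric can look like (a minimal sketch only; the paper's actual metrics and released evaluation code are not reproduced here, and the function name and toy strings below are hypothetical), one common approach is to prompt a model with a passage's opening and measure the longest run of consecutive tokens it reproduces verbatim:

```python
# Sketch of a verbatim-memorization metric: given a reference passage
# from a book and a model's continuation of its opening words, measure
# the longest run of consecutive tokens reproduced verbatim. This is an
# illustrative stand-in, not the paper's metric or code.

def longest_verbatim_run(reference: str, continuation: str) -> int:
    """Length (in tokens) of the longest common contiguous token span."""
    ref, gen = reference.split(), continuation.split()
    # Classic dynamic program for the longest common substring,
    # applied to token sequences instead of characters.
    best = 0
    prev = [0] * (len(gen) + 1)
    for r_tok in ref:
        curr = [0] * (len(gen) + 1)
        for j, g_tok in enumerate(gen, start=1):
            if r_tok == g_tok:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

if __name__ == "__main__":
    reference = "it was the best of times it was the worst of times"
    continuation = "it was the best of times it was the age of wisdom"
    # A long verbatim run relative to the reference length suggests the
    # model is reproducing memorized text rather than paraphrasing.
    print(longest_verbatim_run(reference, continuation))  # -> 9 tokens
```

A per-passage score like this, averaged over many book excerpts before and after finetuning, is one way to quantify whether adaptation "activates" recall that the aligned base model suppressed.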
Safety
Alignment whack-a-mole: Finetuning activates recall of copyrighted books in LLMs
Finetuning activates hidden memorization of copyrighted books in GPT-4o, Gemini-2.5-Pro, and DeepSeek-V3.1, revealing that standard safety alignment cannot prevent this copyright-exploitation vulnerability.
Thursday, April 30, 2026, 12:00 PM UTC · 2 MIN READ · SOURCE: Hacker News · BY sys://pipeline
Tags
safety