R1dacted: Investigating Local Censorship in DeepSeek’s R1 Language Model
Quoting from the abstract:
While existing LLMs often implement safeguards to avoid generating harmful or offensive outputs, R1 represents a notable shift—exhibiting censorship-like behavior on politically charged queries. […] Our findings reveal possible additional censorship integration likely shaped by design choices during training or alignment, raising concerns about transparency, bias, and governance in language model deployment.
https://arxiv.org/pdf/2505.12625