R1dacted: Investigating Local Censorship in DeepSeek’s R1 Language Model
Quoting from the abstract:
While existing LLMs often implement safeguards to avoid generating harmful or offensive outputs, R1 represents a notable shift—exhibiting censorship-like behavior on politically charged queries. […] Our findings reveal possible additional censorship integration, likely shaped by design choices during training or alignment, raising concerns about transparency, bias, and governance in language model deployment.