DeepSeek dropped the V3.1 Weight

pepperfree@sh.itjust.works · 11 months ago

DeepSeek dropped the V3.1 Weight

0x01@lemmy.ml · 11 months ago

DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Compared to the previous version, this upgrade brings improvements in multiple aspects:

Hybrid thinking mode: One model supports both thinking mode and non-thinking mode by changing the chat template.

Smarter tool calling: Through post-training optimization, the model’s performance in tool usage and agent tasks has significantly improved.

Higher thinking efficiency: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly.

The tool calling improvements are very welcome

pepperfree@sh.itjust.works · 11 months ago

I wonder if we can extend the context length. It already fine-tuned with YaRN so we can’t get free extend with that method.

DeepSeek dropped the V3.1 Weight

DeepSeek dropped the V3.1 Weight

deepseek-ai/DeepSeek-V3.1 · Hugging Face