Not what we expected…

  • 0x01@lemmy.ml
    link
    fedilink
    English
    arrow-up
    6
    ·
    1 day ago

    DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Compared to the previous version, this upgrade brings improvements in multiple aspects:

    • Hybrid thinking mode: One model supports both thinking mode and non-thinking mode by changing the chat template.
    • Smarter tool calling: Through post-training optimization, the model’s performance in tool usage and agent tasks has significantly improved.
    • Higher thinking efficiency: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly.

    The tool calling improvements are very welcome

    • pepperfree@sh.itjust.worksOP
      link
      fedilink
      English
      arrow-up
      1
      ·
      1 day ago

      I wonder if we can extend the context length. It already fine-tuned with YaRN so we can’t get free extend with that method.