I’m curious what the consensus is here on which models people use for general-purpose work (coding assistance, general experimentation, etc.).

What do you consider the “best” model under ~30B parameters?

  • staph@sopuli.xyz
    2 days ago

    The Qwen3-30B-A3B-2507 family is an absolute beast. The reasoning models are seriously chatty in their chain of thought, but the results speak for themselves. I’m running a Q4 quant on a 5090, and with a Q8 KV cache quant I can fit a 60k-token context entirely in VRAM, which gets me up to 200 tokens per second.
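For anyone wondering why that setup fits in VRAM, here is a rough back-of-the-envelope sketch in Python. Everything numeric in it is an assumption rather than something stated in the comment above: the architecture values (48 layers, 4 KV heads, head dimension 128, about 30.5B parameters) are what I believe the published model config lists, and the bits-per-value figures (about 4.5 for a Q4 weight quant, about 8.5 for a Q8 cache quant) are typical ballpark numbers, so check them against your own model files. It also ignores compute buffers and runtime overhead.

```python
# Rough VRAM estimate: quantized weights plus quantized KV cache.
# All constants below are ASSUMPTIONS for illustration; verify them
# against the model's config and your quant format.

GIB = 1024 ** 3

def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bytes_per_elem):
    """Bytes needed to cache keys and values for n_tokens.

    Each layer stores one K and one V vector per token, each of size
    n_kv_heads * head_dim, hence the leading factor of 2.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens

def weight_bytes(n_params, bits_per_weight):
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8

# Assumed architecture values (not taken from the comment).
N_LAYERS = 48
N_KV_HEADS = 4
HEAD_DIM = 128
N_PARAMS = 30.5e9

# Assumed effective storage cost per value, including block scales.
Q4_BITS_PER_WEIGHT = 4.5
Q8_BYTES_PER_ELEM = 8.5 / 8

CONTEXT_TOKENS = 60_000  # the context length mentioned in the comment

kv = kv_cache_bytes(CONTEXT_TOKENS, N_LAYERS, N_KV_HEADS, HEAD_DIM, Q8_BYTES_PER_ELEM)
weights = weight_bytes(N_PARAMS, Q4_BITS_PER_WEIGHT)

print(f"KV cache: {kv / GIB:.1f} GiB")
print(f"Weights:  {weights / GIB:.1f} GiB")
print(f"Total:    {(kv + weights) / GIB:.1f} GiB (before runtime overhead)")
```

Under those assumptions the total lands well below the 32 GB on a 5090. Swapping `Q8_BYTES_PER_ELEM` for `2.0` (an unquantized 16-bit cache) shows how much of the headroom the KV quant buys.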