I’m curious what the consensus is here on which models people use for general-purpose stuff (coding assist, general experimentation, etc.)
What do you consider the “best” model under ~30B parameters?
Qwen 2.5 VL and Coder. I have a VL instance generating image captions for LoRA training right now. The 14B is okay for basic code. A Q6_K_L GGUF quant of the 32B Qwen 2.5 Coder fits in 16GB but runs at a third of the speed of the 14B in bitsandbytes 4-bit. The 14B is fast enough for a couple of layers of agentic stuff in Emacs with gptel, and gets thinking or function calling right out of a llama.cpp server better than 50% of the time. Rough sketches of both workflows below.
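For anyone curious what the captioning loop looks like: here’s a minimal sketch against llama.cpp’s OpenAI-compatible endpoint, assuming llama-server was launched with the VL model and its --mmproj projector. The server address, model alias, and prompt are my illustrative choices, not necessarily the setup above.

```python
"""Batch-caption images for a LoRA dataset via a llama.cpp server
running a Qwen 2.5 VL GGUF (llama-server ... --mmproj ...).
Stdlib only; server URL and model alias are assumptions."""
import base64
import json
import urllib.request
from pathlib import Path

SERVER = "http://localhost:8080/v1/chat/completions"  # assumed llama-server address

def caption(image_path: Path) -> str:
    # The OpenAI-compatible endpoint accepts images as base64 data URIs
    # inside an image_url content part.
    b64 = base64.b64encode(image_path.read_bytes()).decode()
    payload = {
        "model": "qwen2.5-vl",  # assumed alias
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Caption this image in one sentence for LoRA training."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
        "temperature": 0.2,
    }
    req = urllib.request.Request(
        SERVER, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"].strip()

for img in sorted(Path("dataset").glob("*.png")):
    # Caption saved as a sidecar .txt, the usual LoRA dataset layout.
    img.with_suffix(".txt").write_text(caption(img))
    print(img.name, "captioned")
```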
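And the function-calling side, stripped of gptel: one tool-call round against the same endpoint, which is roughly what the agentic layer does per step. Note llama-server generally needs --jinja so the chat template carries tool support; the tool name and schema here are hypothetical.

```python
"""One tool-call round against llama.cpp's OpenAI-compatible endpoint.
Server address, model alias, and the read_file tool are illustrative."""
import json
import urllib.request

SERVER = "http://localhost:8080/v1/chat/completions"

tools = [{
    "type": "function",
    "function": {
        "name": "read_file",  # hypothetical tool for the example
        "description": "Return the contents of a file",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

payload = {
    "model": "qwen2.5-coder-14b",  # assumed alias
    "messages": [{"role": "user", "content": "Show me what's in TODO.md"}],
    "tools": tools,
}
req = urllib.request.Request(
    SERVER, data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    msg = json.load(resp)["choices"][0]["message"]

# The model either answers in plain content or emits tool_calls;
# checking which branch fires is how you'd measure that >50% hit rate.
if msg.get("tool_calls"):
    for call in msg["tool_calls"]:
        print("tool:", call["function"]["name"], call["function"]["arguments"])
else:
    print("plain answer:", msg.get("content"))
```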
I still haven’t tried the new 20B out of OpenAI.