Qwen3.6-35B-A3B released (huggingface.co)
TheCornCollector@piefed.zip to LocalLLaMA@sh.itjust.works · English · 1 month ago · 14 comments
The Qwen3.5 models are still the best local models I’ve used, so I’m excited to see how this updated version performs.
TheCornCollector@piefed.zip (OP) · English · 1 month ago
I’m running it with the UD_Q4_K_XL quant on a 7900XTX with 24 GB VRAM at ~85 tokens/s. Since it’s an MoE model, CPU inference with 32 GB RAM should be doable, but I won’t make any promises on speed.
venusaur@lemmy.world · English · 1 month ago
Thanks! That sounds expensive. Hopefully 24GB VRAM gets cheaper or models get more efficient soon.
Jakeroxs@sh.itjust.works · English · 1 month ago
You’d want to wait until the smaller 3.6 models are released; I’d assume that will be soon.
venusaur@lemmy.world · English · 1 month ago
Thanks! I’m hoping to run at least 20B. Idk if I can do that fast enough without 24GB. Seems to be the sweet spot.
fonix232@fedia.io · 1 month ago
Wonder what the wombo-combo of Ryzen AI APU can do with this. Time to fire up the trusty 370.
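
For anyone wondering why a 35B model fits on a 24 GB card, and why the MoE design makes CPU inference plausible, here is a rough back-of-envelope sketch. The parameter counts are read off the model name (35B total, "A3B" meaning roughly 3B active per token). The bits-per-weight figure and the overhead allowance are assumptions picked for illustration, not measured values for the UD_Q4_K_XL file, and the helper names are made up for this example.

```python
GIB = 2**30

def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / GIB

def fits(weights_gib: float, budget_gib: float, overhead_gib: float) -> bool:
    """True if weights plus an overhead allowance fit in the memory budget."""
    return weights_gib + overhead_gib <= budget_gib

TOTAL_B = 35.0      # total parameters, from "35B" in the model name
ACTIVE_B = 3.0      # parameters active per token, from "A3B"
BITS = 4.8          # ASSUMED average bits/weight for a 4-bit K-quant
OVERHEAD_GIB = 2.0  # ASSUMED allowance for KV cache and runtime buffers

all_weights = weight_gib(TOTAL_B, BITS)      # must be resident in memory
active_weights = weight_gib(ACTIVE_B, BITS)  # read per generated token

print(f"all weights:    ~{all_weights:.1f} GiB")
print(f"active weights: ~{active_weights:.1f} GiB per token")
print("fits in 24 GiB VRAM:", fits(all_weights, 24, OVERHEAD_GIB))
```

Under these assumptions the full set of weights comes to a bit under 20 GiB, so it has to sit in memory somewhere, but only a small fraction is touched for each token. That is the reason an MoE of this size can be tolerable on CPU with enough RAM, even though actual speed depends on memory bandwidth and context length.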