Key architectural details
Mixture of Experts (MoE): 128 experts, with 4 active per token, enabling efficient scaling and specialization.
119B total parameters, with 6B active parameters per token (8B including embedding and output layers).
256k context window, supporting long-form interactions and document analysis.
Configurable reasoning effort: Toggle between fast, low-latency responses and deep, reasoning-intensive outputs.
Native multimodality: Accepts both text and image inputs, unlocking use cases from document parsing to visual analysis.
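
For anyone curious what "128 experts, 4 active per token" means in practice, here is a minimal top-k routing sketch. Only the 128/4 split comes from the article; the hidden and feed-forward sizes, the ReLU experts, and the softmax-over-top-4 gating are toy assumptions for illustration, not the model's actual implementation.

```python
import numpy as np

# Toy dimensions for illustration only -- the real hidden/FFN sizes are not stated here.
NUM_EXPERTS = 128   # total experts per MoE layer (from the article)
TOP_K = 4           # experts activated per token (from the article)
D_MODEL = 64        # hidden size (made up)
D_FF = 256          # expert feed-forward size (made up)

rng = np.random.default_rng(0)

# Router: one linear map from the token's hidden state to a score per expert.
router_w = rng.standard_normal((D_MODEL, NUM_EXPERTS)) * 0.02

# Each expert is a small two-layer MLP in this sketch.
w1 = rng.standard_normal((NUM_EXPERTS, D_MODEL, D_FF)) * 0.02
w2 = rng.standard_normal((NUM_EXPERTS, D_FF, D_MODEL)) * 0.02

def moe_layer(x):
    """x: (tokens, D_MODEL). Each token runs only its top-4 experts."""
    scores = x @ router_w                                  # (tokens, 128) router scores
    top = np.argsort(scores, axis=-1)[:, -TOP_K:]          # indices of the 4 best experts
    top_scores = np.take_along_axis(scores, top, axis=-1)
    gates = np.exp(top_scores - top_scores.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)                  # softmax over the chosen 4 only

    out = np.zeros_like(x)
    for t in range(x.shape[0]):                            # per token
        for k in range(TOP_K):                             # per selected expert
            e = top[t, k]
            h = np.maximum(x[t] @ w1[e], 0.0)              # expert MLP with ReLU
            out[t] += gates[t, k] * (h @ w2[e])
    return out

tokens = rng.standard_normal((3, D_MODEL))
print(moe_layer(tokens).shape)  # (3, 64) -- only 4 of 128 experts did any work per token
```

This is why the total and active parameter counts differ so much: every expert's weights exist, but each token only pays the compute for the four experts its router picks.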



At this point, these small models should publish explicit minimum hardware requirements just so they can stand out. STM32 with xxGB of PSRAM. Android phone with this much RAM, how many TOPS, and minimum OS version. ESP32-S3 or S4? That sort of thing.
If you just say ‘small,’ you get lost in the noise.
Also: when the fuck did a 120B parameter model become “small”? I feel like I’m being gaslit here LOL.
Under 20B? Legit small.
EDIT to add: I have been thinking of running TTS on an ESP32…but that madness is competing side by side with wiring this up to my local LLM. https://github.com/poboisvert/GPTARS_Interstellar
We are being gaslit. From the article:
No big. Your typical homelab setup. 🙄
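For the record, a quick back-of-envelope. The parameter counts are from the article; the quantization formats and bytes-per-parameter are my own assumptions:

```python
# Quick sanity check on "small": weight memory for 119B total / 6B active parameters.
# Quantization choices below are assumptions for illustration, not anything from the article.
TOTAL_PARAMS = 119e9
ACTIVE_PARAMS = 6e9

formats = {              # approximate bytes per parameter
    "fp16/bf16": 2.0,
    "int8":      1.0,
    "4-bit":     0.5,
}

for name, bytes_per_param in formats.items():
    total_gb = TOTAL_PARAMS * bytes_per_param / 1e9
    active_gb = ACTIVE_PARAMS * bytes_per_param / 1e9
    print(f"{name:>9}: ~{total_gb:5.0f} GB for all weights, ~{active_gb:4.1f} GB touched per token")
```

Even at 4-bit that's roughly 60 GB of weights that have to sit somewhere, before you add KV cache for a 256k context. The "6B active per token" part saves compute, not the memory bill.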
Also: https://github.com/jahrulnr/esp32-picoTTS
I knew it.
…
And I knew it :) TTS on ESP is such an obvious idea, of course someone had already done it
tbh that’s the main thing I took away from this: since when did “small” equal 119B?!
Does that mean they’ve got large models lined up approaching 1T parameters?