What local, small models are you all using?
I'm curious what the consensus is here on which models people use for general-purpose stuff (coding assist, general experimentation, etc.).
What do you consider the "best" model under ~30B parameters?
The Qwen3-30B-A3B-2507 family is an absolute beast. The reasoning models are seriously chatty in their chain of thought, but the results speak for themselves. I'm running a Q4 quant on a 5090, and with Q8 KV cache quantization I can fit a 60k-token context entirely in VRAM, which gets me up to 200 tokens per second.
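For anyone wanting to reproduce a setup like this, here's a rough sketch of what it might look like with llama.cpp's `llama-server` (assuming that's the backend; the GGUF filename is just a placeholder, and you'd tune the exact values to your own hardware):

```shell
# Load a Q4 GGUF of Qwen3-30B-A3B (placeholder filename), offload all layers
# to the GPU, quantize the KV cache to Q8, and set a 60k context window.
llama-server \
  -m Qwen3-30B-A3B-Instruct-2507-Q4_K_M.gguf \
  -ngl 99 \                 # offload all layers to the GPU
  -c 60000 \                # 60k token context window
  --cache-type-k q8_0 \     # quantize K cache to Q8
  --cache-type-v q8_0 \     # quantize V cache to Q8
  -fa                       # flash attention (needed for V-cache quantization)
```

KV cache quantization roughly halves the cache's memory footprint versus the default f16, which is what makes a context that large fit alongside the weights on a single 32 GB card.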