So you don't have to click the link, here's the full text including links:
Some of my favourite @huggingface models I've quantized in the last week (as always, original models are linked in my repo so you can check out any recent changes or documentation!):
and that's just the highlights from this past week! If you'd like to see your model quantized and I haven't noticed it somehow, feel free to reach out :)
Do you do any kind of before/after testing of these to measure performance/accuracy changes? I've always wondered if there's some way to generalize the expected performance changes at different quantization levels.
You can compute the resulting perplexity (PPL), but that's only going to get you a sanity check at best. An ideal world would have something like lmsys' chat arena that could compare unquantized vs quantized head to head, but that doesn't exist yet.
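For anyone curious what that PPL sanity check looks like in practice, here's a minimal sketch: score the same checkpoint at full precision and at 4-bit on a small held-out text slice and compare perplexities. It uses bitsandbytes 4-bit loading as a stand-in quantization (the quants in the thread are GGUF, which you'd normally score with llama.cpp's perplexity tool instead), and the model ID and dataset slice are placeholder assumptions, not anything from the original post:

```python
# Sketch: compare PPL of an fp16 model vs a 4-bit load of the same checkpoint.
# Model ID, dataset, and sample count are placeholders -- swap in your own.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "mistralai/Mistral-7B-v0.1"  # hypothetical example checkpoint

def perplexity(model, tokenizer, texts, max_length=512):
    """Token-weighted mean perplexity over a list of raw text strings."""
    model.eval()
    total_nll, total_tokens = 0.0, 0
    for text in texts:
        enc = tokenizer(text, return_tensors="pt",
                        truncation=True, max_length=max_length)
        input_ids = enc.input_ids.to(model.device)
        with torch.no_grad():
            # labels == input_ids makes the model return mean cross-entropy
            out = model(input_ids, labels=input_ids)
        # the label shift means loss is averaged over seq_len - 1 predictions
        n_pred = input_ids.numel() - 1
        total_nll += out.loss.item() * n_pred
        total_tokens += n_pred
    return float(torch.exp(torch.tensor(total_nll / total_tokens)))

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
texts = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"]
texts = [t for t in texts if len(t.split()) > 10][:50]  # skip empty/tiny lines

# Full-precision (fp16) baseline
base = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto")
ppl_base = perplexity(base, tokenizer, texts)
del base
torch.cuda.empty_cache()  # free VRAM before loading the quantized copy

# Same checkpoint loaded 4-bit via bitsandbytes
quant = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True))
ppl_quant = perplexity(quant, tokenizer, texts)

print(f"fp16 PPL: {ppl_base:.3f}  4-bit PPL: {ppl_quant:.3f}  "
      f"delta: {ppl_quant - ppl_base:+.3f}")
```

A small PPL delta tells you the quant didn't break anything, but as noted above it won't tell you whether chat quality actually held up; that's the gap an arena-style head-to-head comparison would fill.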