Great, so it's still wrong 1 out of 20 times, and just got even more energy intensive to run.
Genuine question: how energy intensive is it to run a model compared to training it? I always thought once a model is trained it's (comparatively) trivial to query?
A 100-word email generated by an AI chatbot using GPT-4 requires 0.14 kilowatt-hours (kWh) of electricity, equal to powering 14 LED light bulbs for 1 hour.
Source: https://www.washingtonpost.com/technology/2024/09/18/energy-ai-use-electricity-water-data-centers/
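For what it's worth, the bulb comparison checks out as simple arithmetic if you assume roughly 10 W per LED bulb (a typical figure, not stated in the article):

    # Sanity-check of the quoted figure; the 10 W bulb draw is an assumption.
    bulbs, watts_per_bulb, hours = 14, 10, 1
    bulb_energy_kwh = bulbs * watts_per_bulb * hours / 1000
    print(bulb_energy_kwh)  # 0.14, matching the 0.14 kWh-per-email claim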
For the small ones, a couple hundred watts on a GPU while generating. For the large ones, somewhere between 10 and 100 times that.
With specialty hardware, maybe 10x less.
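A rough way to turn those wattages into per-query energy, assuming a ~30-second generation (the duration and the 30x multiplier are guesses, just restating the ranges above):

    # Back-of-envelope inference energy per response; all inputs are assumptions.
    small_model_watts = 300          # "a couple hundred watts" on a GPU
    large_model_watts = 300 * 30     # "10 to 100 times that", taking ~30x
    seconds_per_response = 30        # assumed generation time
    def kwh(watts):
        return watts * seconds_per_response / 3600 / 1000
    print(kwh(small_model_watts))    # ~0.0025 kWh per response
    print(kwh(large_model_watts))    # ~0.075 kWh per response

So any single query is tiny next to a training run, which is a one-off cost; the catch is that inference happens millions of times a day, so it adds up.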
Still requires thirsty datacenters that use megawatts of power to keep the models online and fast for thousands of concurrent users.
I wonder how that compares to the average human?
I would not accept a calculator being wrong even 1% of the time.
AI should be held to a higher standard than "it's on average correct more often than a human".
Not a very good, or easy, comparison to make. Against the average person, sure, the AI comes out ahead. But a domain expert like a doctor or an accountant is far more accurate than that, in the 99+% range. Sure, everyone makes mistakes. But when we are good at something, we are really good.
Anyway, this is a ridiculous amount of effort and energy just to get hallucinations down to 4.4%.
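To put the two error rates side by side (taking "99+%" as exactly 99% for illustration):

    # Illustrative comparison of the quoted error rates over 1,000 questions.
    questions = 1000
    llm_errors = questions * 0.044    # 4.4% hallucination rate
    expert_errors = questions * 0.01  # domain expert at 99% accuracy
    print(llm_errors, expert_errors)  # 44.0 vs 10.0 wrong answers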
Congratulations to AI researchers on discovering the benefits of peer review?
LLMs will never achieve much higher than that, simply because there's no reasoning behind them. It. Won't. Work. Ever.
I still see even the more advanced AIs make simple factual errors all the time...