That's not correct, btw. AI is supposed to be creative and come up with new text/images/ideas, even with perfect training data. Creativity means exactly that: we want it to come up with new text out of thin air, and perfect training data is not going to change anything about that. We'd need to remove the ability to generate fictional stories and lots of other answers, too, or come up with an entirely different approach.
Most improvements in machine learning have been made by increasing the amount of data (and by using models that generalize better from larger datasets).
Perfect data isn’t needed as the errors will “even out”. Although now there’s the problem that most new content on the Internet is low quality AI garbage.
Perfect data isn’t needed as the errors will “even out”.
That is an assumption.
I do not think that it is a correct assumption.
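To make the disagreement concrete, here is a rough sketch of when errors do and don't "even out". It is a toy simulation, not how any real model is trained: the function name `estimate_mean`, the noise level, and the bias value are all made up for illustration. It assumes each data point is the true value plus Gaussian noise, optionally shifted by a systematic bias.

```python
import random
import statistics


def estimate_mean(true_value, n, noise_sd, bias=0.0, seed=0):
    """Average n noisy observations of true_value.

    Hypothetical toy model: each observation is
    true_value + bias + zero-mean Gaussian noise.
    """
    rng = random.Random(seed)
    samples = [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]
    return statistics.fmean(samples)


TRUE_VALUE = 10.0

# Random, zero-mean errors: the average converges toward the true value.
unbiased = estimate_mean(TRUE_VALUE, n=100_000, noise_sd=5.0)

# Systematic errors (assumed bias of +2): more data does not remove the offset.
biased = estimate_mean(TRUE_VALUE, n=100_000, noise_sd=5.0, bias=2.0)

unbiased_error = abs(unbiased - TRUE_VALUE)
biased_error = abs(biased - TRUE_VALUE)

print(f"error with random noise only: {unbiased_error:.3f}")
print(f"error with systematic bias:   {biased_error:.3f}")
```

In this toy setup the random errors cancel with enough samples, while the systematic error stays no matter how much data is added. Whether the errors in real training data look more like the first case or the second is exactly the assumption in question.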
now there’s the problem that most new content on the Internet is low quality AI garbage.
This reminds me of a recommendation from some philosopher - I forget who it was - that you should only read books that are at least 100 years old.
Hallucinations are an unavoidable part of LLMs, and are just as present in the human mind. Training data isn’t the issue. The issue is that systems built on LLMs are designed to use them for more than they should be doing.
I don’t think that anything short of being able to validate an LLM’s output without running it through another LLM will be able to fully prevent hallucinations.