LLMs aren’t really novel in terms of theoretical approach: the real revolution is the amount of computing power and data to throw at them.
This is 100% true. LLMs, neural networks, Markov chains, gradient descent, etc. on down the line are nothing particularly new. They’ve collectively been studied academically for 30+ years.
Well, LLMs, and particularly GPT and its competitors, rely on Transformers, which are a relatively recent theoretical development in the machine learning field. Of course they build on prior research, and maybe there even is prior art buried in some obscure paper or 404 link, but if that's your measure then there is no "novel theoretical approach" for anything, ever.
I mean, I'll grant that the available input data and compute for machine learning have increased exponentially, and that's certainly an obvious factor in the improved output quality. But that's not all there is to the current "AI" summer; general scientific progress played no small part as well.
In summary, I disagree that data/compute scale is the deciding factor here; IMHO it's deep learning architecture. The former didn't change that much over the last half decade; the latter did.