Why I wouldn't rule out Large Language Models taking us to AGI
The argument against LLMs achieving AGI is often based on contrasting them with human intelligence:
A: Human intelligence develops from small amounts of data, in real time, on 20W of power, using metacognition. By contrast, LLMs work with massive amounts of data, are pre-trained, use massive power, and operate without any cognitive awareness. Therefore, AGI requires a different paradigm.
B: LLMs have already exhausted most available data sources, and without much larger datasets, they can't get much smarter.
However, this analysis is mostly flawed:
Human intelligence did develop using massive amounts of data and power:
Our cognitive architecture developed over hundreds of millions of years and is encoded in our DNA.
Our senses process massive quantities of data as we grow up. (We’re barely aware of this, since the process is almost entirely subconscious.)
LLM training builds a cognitive architecture from scratch, effectively compressing both the evolutionary process and real-time learning into a single training run. While very different from how humans learn, that's not necessarily insufficient for intelligence.
The essential similarity between human brains and LLMs is that both are compression algorithms: they compress massive amounts of world data into worldviews, predictive models that guide action. The main difference is that the human brain's architecture and learning processes are highly optimized and efficient, allowing it to learn from relatively small amounts of data in real time. LLMs, by contrast, require vast amounts of data and compute to reach comparable performance.
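To make the "prediction is compression" link concrete, here is a minimal sketch (my own illustration, not from any particular paper; the toy corpus and bigram model are stand-ins): by Shannon's source-coding argument, a model that assigns probability p to the next symbol can encode it in roughly -log2(p) bits, so a better predictive "worldview" compresses the same data into fewer bits.

```python
import math
from collections import Counter, defaultdict

text = "the quick brown fox jumps over the lazy dog. " * 200  # stand-in corpus

# Baseline: no model at all -- every character costs log2(alphabet size) bits.
alphabet = set(text)
uniform_bits = len(text) * math.log2(len(alphabet))

# Simple predictive model: bigram counts, i.e. P(next char | current char).
bigrams = defaultdict(Counter)
for a, b in zip(text, text[1:]):
    bigrams[a][b] += 1

model_bits = math.log2(len(alphabet))  # first character has no context yet
for a, b in zip(text, text[1:]):
    counts = bigrams[a]
    p = counts[b] / sum(counts.values())
    model_bits += -math.log2(p)        # ideal code length under the model

print(f"uniform coding: {uniform_bits / len(text):.2f} bits/char")
print(f"bigram model:   {model_bits / len(text):.2f} bits/char")
```

The bigram model needs far fewer bits per character than the uniform baseline, and a model with a richer view of the data (an LLM, or a brain) would need fewer still; better prediction and better compression are the same thing.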
We don't know how human intelligence works at an architectural level, so LLMs are a "brute force" approach that bridges that ignorance. Human-level intelligence could likely run on much more modest hardware, if only we knew the right architecture. Meanwhile, the efficiency of LLM training and operation is improving rapidly as researchers close in on better architectures: more efficient attention mechanisms and sparse representations, for example, reduce the computational and data requirements for both training and inference.
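As a rough illustration of where those efficiency gains come from, here is a toy sketch (an assumed example, not any specific production architecture): full self-attention scores every pair of tokens, which is O(n^2), while a local sliding-window variant only scores nearby tokens, cutting compute and memory at the cost of a narrower view per layer.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(q, k, v):
    # Every token attends to every other token: n x n score matrix.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def local_attention(q, k, v, window=8):
    # Each token attends only to a small neighborhood of tokens.
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)
        out[i] = softmax(scores) @ v[lo:hi]
    return out

n, d = 512, 64
q, k, v = (np.random.randn(n, d) for _ in range(3))
out_full = full_attention(q, k, v)
out_local = local_attention(q, k, v)
print("full attention pairs: ", n * n)
print("local attention pairs:", sum(min(n, i + 9) - max(0, i - 8) for i in range(n)))
```

The score-pair counts at the end show the gap: quadratic growth for full attention versus roughly linear growth for the windowed version, which is the kind of structural saving that lets newer models do more with less.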
It's a misconception that progress in the LLM era depends on ever-larger datasets. On the contrary, progress is increasingly being made with smaller, higher-quality datasets and by training LLMs on synthetic (self-generated) data in positive-feedback cycles: the current model generates intermediate training data, which is filtered and used to train the next generation. Human-sourced data is then only needed to jumpstart the process.
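Here is a schematic sketch of that feedback loop (all the functions are hypothetical stand-ins I'm introducing for illustration, not any lab's actual pipeline): the current model generates candidate examples, a verifier keeps only the good ones, and the survivors become the next generation's training data, with human-written data serving only as the seed.

```python
import random

def train(dataset):
    """Stand-in 'model': just remembers which examples have been verified."""
    return set(dataset)

def generate(model, n=50):
    """Stand-in generation step: propose candidates near known examples."""
    base = list(model) or [1]
    return [random.choice(base) + random.randint(-5, 5) for _ in range(n)]

def verify(x):
    """Stand-in verifier: here, 'correct' simply means divisible by 3."""
    return x % 3 == 0

seed_data = [3, 6, 9]                      # small human-sourced seed
model = train(seed_data)
for generation in range(5):
    candidates = generate(model)
    accepted = [x for x in candidates if verify(x)]
    model = train(list(model) + accepted)  # next generation trains on its own output
    print(f"gen {generation}: model now covers {len(model)} verified examples")
```

The key design point is the verifier: as long as there is some reliable way to check generated data (proof checkers, test suites, human spot checks), the loop can expand the training set well beyond what humans originally wrote.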
Back to the definition of LLMs: broadly defined, LLMs are algorithms built on large datasets and unsupervised learning that generalize to skills they were never explicitly trained on and transfer widely to downstream tasks. This is very much like human intelligence, except that our "training" encapsulates both the evolutionary process and real-time learning.