It’s hard to escape ChatGPT’s prevalence in the AI computing world. But while much has been written about its capabilities and future promise, there has been far less insight into its technical makeup and, specifically, into how its expected growth in compute requirements can best be met.
At its core, ChatGPT, together with its successor GPT-4 and many other models, belongs to the family of so-called large language models (LLMs) built on the increasingly prevalent ‘transformer’ architecture. The power of transformers rests in their use of ‘attention’, which allows the model to link each word to the other words around it and work out how they relate to each other in context. As more of these ‘attention’ layers are included in the model and trained properly, the model can start to ‘understand’ the conversation.
An exciting trend has been observed in recent years: as LLMs grow larger and are trained with bigger and better datasets, their performance improves steadily, with the latest success of ChatGPT being an excellent example. ChatGPT is, of course, not the end of the story. The sector has every reason to expect future intelligence to emerge from even larger models.
Over the past few years, the industry has seen LLM size increase at a rate of 240x every two years, with ChatGPT reaching 175 billion parameters and Google’s latest model, PaLM-E, reaching 540 billion parameters. These models are growing far faster than digital electronics can keep up with. The current rendition of ChatGPT took more than 150 GPU-years to train, equivalent to running roughly 54,000 GPUs for a day. Some estimate that training an LLM like GPT-3 could cost over $4 million. In fact, Microsoft has invested hundreds of millions of dollars to build the supercomputer for ChatGPT. What if we need 10 times more, or 100 times more, computing power?
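To make the scale of these figures concrete, the short sketch below reworks the arithmetic from the paragraph above. It uses only the numbers quoted in the article (150 GPU-years and 240x growth every two years); the function names are illustrative, and converting the two-year growth factor into a per-year factor assumes the growth is smoothly exponential.

```python
# Figures quoted in the article.
GPU_YEARS = 150
DAYS_PER_YEAR = 365
GROWTH_PER_TWO_YEARS = 240


def gpu_days(gpu_years):
    """Convert GPU-years into GPU-days (the number of GPUs needed to finish in one day)."""
    return gpu_years * DAYS_PER_YEAR


def annual_growth(factor_per_two_years):
    """Per-year growth factor, assuming smooth exponential growth (an assumption)."""
    return factor_per_two_years ** 0.5


print(gpu_days(GPU_YEARS))                            # 54750, i.e. roughly 54,000 GPUs for a day
print(round(annual_growth(GROWTH_PER_TWO_YEARS), 1))  # 15.5, i.e. about 15x per year
```

Under that assumption, model size grows by roughly a factor of 15 each year.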
This leads us to ask, what are the current hardware bottlenecks, and what technology can fully enable this exponentially increasing need for compute power?
Lumai, a company that emerged from the University of Oxford in January 2022, is at the forefront of developing next-generation AI through all-optical neural networks. Having recently secured £1.1 million in Smart funding, Lumai aims to create advanced optical computing processors that break free from the limitations of traditional electronic processing.
As transistor-based digital electronics struggle to meet the increasing computational demands of AI, Lumai’s computing platform offers energy-efficient and very fast parallel processing. By using optical neural networks, Lumai’s technology can be up to 1,000 times faster and more sustainable than existing transistor-based systems. The company grew out of Professor A.I. Lvovsky’s experimental optics research group, with the aim of developing end-to-end all-optical neural networks that rely only minimally on electronic processing. Today on TFN, Xianxin Guo, Co-founder and Head of Research at Lumai, discusses whether the UK has the technology needed to keep pace with the evolving landscape of generative AI. Here’s what he has to say:
There are three major bottlenecks facing current computing hardware: computing speed, power consumption and memory bandwidth.
Regarding computing speed, for half a century the world has been able to produce more and more powerful computers by following Moore’s law, which predicted that transistor density on integrated circuits would double every 18 months. However, this empirical law has started to break down as transistors approach their physical limits and problems such as current leakage and quantum tunnelling arise. In the past fifteen years, parallel-computing processors such as GPUs, TPUs and ASICs have been developed to provide the massive computing power required by AI. However, these digital electronic processors will still suffer from the slowdown of Moore’s law.
Power consumption is a more severe limitation of digital electronics. As of today, more than 2% of the world’s electricity is used by data centres, and this number keeps growing. Advanced processors typically consume about 1 pJ per operation, making them at least 100 times less efficient than the human brain. This is because even a single arithmetic calculation involves a large number of transistor operations. What’s worse, external DRAM consumes about 100 pJ per 32-bit data access, 100 times more than the operation itself. Huge efforts have therefore been made to design computing architectures that minimise DRAM access as much as possible.
A third major limitation of electronics is memory bandwidth. Most modern processors adopt the von Neumann architecture, in which data is stored in memory and computation takes place in a computing core, so data has to be moved back and forth between the two. This worked well in the past, yet as computing speed has increased substantially, data movement has become a significant bottleneck: more time is spent waiting for data to be fetched than performing the computation. According to a recent survey, DRAM bandwidth has increased at a far lower rate than hardware computing speed. Complicated memory hierarchies and computing pipelines have been developed to deal with this constraint.
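The effect of memory access on the energy budget can be sketched with the two figures quoted above (about 1 pJ per operation and about 100 pJ per 32-bit DRAM access). The function and the `ops_per_access` reuse values below are hypothetical, chosen only to show why architectures try so hard to reuse data once it has been fetched.

```python
# Figures quoted in the article.
OP_ENERGY_PJ = 1.0             # energy per arithmetic operation
DRAM_ACCESS_ENERGY_PJ = 100.0  # energy per 32-bit external DRAM access


def energy_per_op_pj(dram_accesses_per_op):
    """Average energy per operation, including its share of DRAM traffic."""
    return OP_ENERGY_PJ + dram_accesses_per_op * DRAM_ACCESS_ENERGY_PJ


# Hypothetical reuse levels: how many operations are performed per DRAM access.
for ops_per_access in (1, 10, 100, 1000):
    energy = energy_per_op_pj(1 / ops_per_access)
    print(f"{ops_per_access:>5} ops per DRAM access -> {energy:.2f} pJ per op")
```

With no data reuse, the memory access dominates at about 101 pJ per operation; only when each fetched word is reused hundreds of times does the total approach the 1 pJ cost of the arithmetic itself.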
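One common way to reason about this bottleneck is the roofline model, which is not mentioned in the article and is brought in here purely as an illustration: attainable throughput is capped either by the processor’s peak compute rate or by memory bandwidth multiplied by the number of operations performed per byte fetched. The peak and bandwidth values below are hypothetical round numbers, not measurements of any real chip.

```python
def attainable_ops_per_s(peak_ops_per_s, bandwidth_bytes_per_s, ops_per_byte):
    """Roofline estimate: the lower of the compute limit and the memory limit."""
    return min(peak_ops_per_s, bandwidth_bytes_per_s * ops_per_byte)


# Hypothetical processor: 100 tera-ops/s peak, 1 TB/s memory bandwidth.
PEAK_OPS_PER_S = 1e14
BANDWIDTH_BYTES_PER_S = 1e12

for ops_per_byte in (1, 10, 100, 1000):
    rate = attainable_ops_per_s(PEAK_OPS_PER_S, BANDWIDTH_BYTES_PER_S, ops_per_byte)
    print(f"{ops_per_byte:>5} ops/byte -> {rate / PEAK_OPS_PER_S:.0%} of peak")
```

In this toy setting a workload that performs only one operation per byte fetched reaches 1% of peak, which is the "waiting for data" regime described above.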
Overall, these hardware bottlenecks in computing speed, power consumption and memory bandwidth are limiting the progress of AI, and especially of LLMs. We are in dire need of a new computing paradigm to support the next generation of AI.
How can optics help?
Optics already transforms our lives in many ways. For example, a massive amount of internet data is now communicated all over the world through optical fibre networks. What if we now take another step and build an optical computer? Optics has every characteristic needed to revolutionise the power of AI.
Almost all AI models, including ChatGPT, rely on a large number of matrix multiplications to support their different functional blocks, including the ‘attention’ layer mentioned above. Matrix multiplication constitutes the bulk of the computation in AI. Optical computing aims to encode the matrix onto light using optical modulator arrays, and to perform the matrix multiplication using natural properties of light such as interference and diffraction. Once the input information is encoded, the computation is performed as light passes through the system, at the speed of light. Such optical matrix multipliers are special-purpose processors with high computing speed, high energy efficiency and low latency.
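To show where the matrix multiplications sit inside an ‘attention’ layer, here is a minimal pure-Python sketch of the standard scaled dot-product attention used in transformers. The article does not spell out this formulation, so treat it as the textbook version rather than a description of ChatGPT’s internals; the helper names are illustrative.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Turn a row of scores into weights that sum to 1."""
    largest = max(row)
    exps = [math.exp(value - largest) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    scores = matmul(q, transpose(k))  # first matrix multiplication
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)         # second matrix multiplication


# Two 'words', each represented by a 2-dimensional vector.
q = k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(attention(q, k, v))
```

Both calls to `matmul` are the kind of operation an optical matrix multiplier would take over; the softmax in between is a comparatively small element-wise step.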
The high computing speed of optics comes from two distinct sources: a much faster clock and the potential for massive parallelism. While digital processors typically run at a clock speed of about 1 GHz, optics can run at up to 100 GHz, as already demonstrated in optical communication. Furthermore, in optics we can encode information in multiple wavelengths, polarisations and spatial modes, and so perform massively parallel computations. An optical matrix-matrix multiplier operating at 100 GHz with 100 wavelengths and a spatial matrix core of 1000×1000 can perform 10^19 operations per second (100 GHz × 100 wavelengths × 10^6 matrix elements), 10,000 times faster than state-of-the-art digital processors. An optical processor as fast as the best digital counterparts is already a possibility with today’s technology.
Another major advantage of optical computing is its high energy efficiency. As already demonstrated by academic research groups, optical computing can be performed with less than one photon per operation, so the optical energy itself is almost negligible. However, power is still consumed by the opto-electronic conversion, the electronic peripherals and memory access. The power consumption of the system grows linearly with the dimension N of the matrix core, while the computing speed grows much faster: quadratically, as N². Consequently, as technology advances and the size of the matrix increases, the energy efficiency of the optical system actually improves, roughly in proportion to N.
Furthermore, we can also use neuromorphic photonic processors to perform in-memory computing, thus circumventing the memory bottleneck. A recent analysis from Cornell University estimates that optical matrix multipliers can be over 8,000 times more efficient when used in LLMs. While an 8,000-fold improvement is ambitious and very challenging, we envisage that an optical processor with 100 times higher energy efficiency can be built in a few years using currently available low-power components.
Overall, it’s clear that optics offers a high-speed, energy-efficient route to unlocking the potential of AI. With the technology continually being developed to keep up with this evolution, there is every reason to expect optical-based AI systems to play a growing part in the sector’s future capabilities.