NEC develops Large Language Model (LLM)
- Achieving world-class high-performance Japanese language processing -
July 6, 2023
Research and development of Large Language Models (LLMs) is thriving. Around the world, various front-runners are starting to develop new LLMs, and NEC has also succeeded in developing one. But why did NEC embark on the development of an LLM? And what made this possible? We spoke with the researchers about the details.
LLM is the core of all generative AI
― First, tell us concisely about what an LLM is.
An LLM is an AI model built by training on a massive amount of text data. Unlike conventional language processing technology, an LLM can precisely understand text and answer questions. LLMs are based on deep learning, a technology that has been studied for the last 15 years or so.
Deep learning makes various analyses possible by training AI with data. An example of this is image recognition technology, including NEC’s specialty, face recognition. However, deep learning up to 2018 had an issue: people had to annotate the data, for example labeling an image of a dog as “dog” and that of a bird as “bird,” before the AI could be trained on it. This dependence on human effort was inevitably a bottleneck for improving the accuracy of AI.
In response to this problem, a technology called “Self-Supervised Learning” emerged in 2018. With this technology, even without people adding information to the data, the AI learns by itself what an image represents and what topics a text covers. This allows the AI to become smarter as long as it is given a server with excellent computing performance and a high volume of data.
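To make the idea of learning without human labels concrete, here is a minimal sketch that assumes the simplest text-based setup: the “label” for each position is just the next word in the raw text. The function name `next_word_examples` and the `context_size` parameter are made up for this illustration and do not describe NEC’s training pipeline.

```python
def next_word_examples(text, context_size=3):
    """Turn raw text into (context, target) pairs with no human labeling.

    Assumption for illustration: words are split on whitespace and the
    target is simply the word that follows the context.
    """
    words = text.split()
    examples = []
    for i in range(1, len(words)):
        # The context is up to `context_size` words before position i.
        context = tuple(words[max(0, i - context_size):i])
        examples.append((context, words[i]))
    return examples


examples = next_word_examples("the cat sat on the mat")
for context, target in examples:
    print(context, "->", target)
```

Every training pair here comes from the text itself, so the amount of training data grows with the amount of raw text rather than with the amount of human annotation.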
As a result, Google developed the AI model “Transformer,” which is the foundation of LLMs. A year later, OpenAI developed the Generative Pretrained Transformer (GPT*) based on Transformer. These are the technologies that led to today’s ChatGPT. Back then, the capabilities of GPT were limited to generating the remainder of a text once you supplied its opening part. Now, after drastic evolution, it can respond to complex requests such as summarizing a piece of writing or creating “interesting lyrics on the topic of NEC,” which has caused a sensation, as you may already know.
- *GPT is a trademark of OpenAI.
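The early behavior described above, continuing a text from its opening part, can be sketched with a toy model. Note that this is a simple word-pair (bigram) counter, not a Transformer; it is swapped in only because it fits in a few lines and still shows the “predict the next word, append it, repeat” loop. The corpus and the function names `train_bigram` and `continue_text` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count which word follows which in the training text."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def continue_text(counts, prompt, max_new_words=5):
    """Greedily extend the prompt one word at a time."""
    words = prompt.split()
    if not words:
        return prompt
    for _ in range(max_new_words):
        followers = counts.get(words[-1])
        if not followers:
            break  # the last word was never seen with a follower
        # Pick the most frequent follower of the last word.
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


corpus = "nec builds ai . nec builds supercomputers . nec builds ai models"
model = train_bigram(corpus)
print(continue_text(model, "nec", max_new_words=2))  # prints "nec builds ai"
```

A real LLM replaces the counting table with a neural network that conditions on the whole preceding text, but the generation loop has the same shape.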
― It made headlines along with image-generating AI.
Yes, but the ability to do “clever, interesting things” is not the only reason the market is as vigorous as it is today. The key is that text strings are at the center of all protocols.
For example, program source code is text string information. In fact, if you instruct an LLM to write a program in Python, it will write nearly perfect source code as instructed. Conversely, if you enter source code and ask what it means, it can explain exactly what that code is trying to do, just as an engineer would.
Image data can also be linked through text strings. There are massive amounts of image data on the Internet, and in most cases related text strings appear near each image. Using this, it is possible to train the AI on what is pictured in the image. This means that you can link the text strings accumulated by the LLM with image recognition, which makes the technology a good fit for image-generating AI.
As a more advanced approach, an LLM can be used as a robot’s knowledge. Take the example of having a robot fetch water using a glass placed in front of it. Previously, people had to set up the order of motions beforehand (holding the glass, bringing the glass to a water dispenser, and pouring water into the glass) along with the associated robot controls. Linked with an LLM, you can simply tell the robot that you want water, and it will supply the usual order of motions for that specific situation along with robot controls such as “holding the glass gently so as not to break it.” The enormous amount of useful information developed by humans is consolidated in an LLM, so using it can drastically improve the efficiency of robot control.
As we have seen, LLMs are a universal and innovative technology that is the centerpiece of various generative AIs and robot control.
High-performance, practical LLMs with smaller model size
― Why did NEC move into the development of LLMs?
In February 2023, Meta published the Large Language Model Meta AI (LLaMA), an open source LLM freely usable for academic purposes, which energized the developer community. LLM technology is advancing explosively around the world today. However, because LLaMA is an English-exclusive model, it is not good at processing Japanese or comprehending Japanese culture. The reality is that LLMs that support the Japanese language are only a few tenths of the size of models in English-speaking countries. Considering the future and the proliferation of AI, it is extremely important for Japan to be able to advance these technologies on its own.
On another note, NEC possesses a supercomputer for AI research, which is quite uncommon in Japan. Unlike supercomputers that use CPUs to perform general-purpose numerical computations and simulations, NEC’s supercomputer is based on GPUs, which can handle large-volume matrix operations and are therefore well suited to deep learning. It is the largest of its kind owned by a Japanese company.* NEC spent several years investing substantial resources in its preparation and started operating it this past March. One reason NEC started developing LLMs is our belief that, by using this resource, we can create an LLM with good Japanese language proficiency.
― While global platform providers are embarking on the development of LLMs, what position are NEC’s LLMs going to aim for?
The accuracy of an LLM differs depending on the number of parameters and the amount of learning data.
The “number of parameters” is like the number of synapses in a human brain. In recent years, the world has been working on increasing this number to improve performance. For GPT, the number of parameters has increased 1,500-fold between the first and the most recent model. However, as the number of parameters increases, inference takes longer and requires expensive GPUs. A tenfold increase in parameter count requires roughly 10 times as many GPUs, which simply multiplies costs by 10. On top of this come variable costs such as the huge electricity bill for operation.
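The link between parameter count and GPU count can be illustrated with back-of-the-envelope arithmetic. The sketch below counts only the memory needed to hold the model weights and ignores everything else (activations, caches, throughput targets), so it is a lower bound rather than a sizing tool. The 2 bytes per parameter and the 80 GB of memory per GPU are assumptions chosen for the example, and the model sizes are hypothetical, not NEC’s.

```python
def weight_bytes(num_params, bytes_per_param=2):
    """Memory needed just to store the weights (assumed 2 bytes each)."""
    return num_params * bytes_per_param


def gpus_to_hold_weights(num_params, gpu_memory_gb=80, bytes_per_param=2):
    """Minimum number of GPUs whose combined memory fits the weights."""
    gpu_bytes = gpu_memory_gb * 10**9
    total = weight_bytes(num_params, bytes_per_param)
    return -(-total // gpu_bytes)  # integer ceiling division


# Hypothetical model sizes: 10 billion, 100 billion, 1 trillion parameters.
for params in (10**10, 10**11, 10**12):
    gigabytes = weight_bytes(params) / 10**9
    print(f"{params:>16,} params: {gigabytes:>7.0f} GB, "
          f"{gpus_to_hold_weights(params)} GPU(s)")
```

Weight memory grows exactly in proportion to the parameter count. The GPU count is rounded up to whole devices, so it follows the same trend and approaches the tenfold ratio as models get larger.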
Therefore, NEC aimed to achieve both high inference ability and improved operability, not by endlessly increasing the number of parameters, but through an approach that uses far more learning data than conventional models.
You may think that giant platform providers have an advantage in data volume, but that is not true. Open text data is available in abundance on the Web. For example, we can use data from Aozora Bunko and other sources. In this sense, we are on a level playing field in terms of the amount of data.
However, accuracy is not only about collecting data. Through repeated trial and error in pursuit of the optimal distribution of data types, much like blending whiskey, and by acquiring high-quality text drawing on the knowledge cultivated through the “AIO: Japanese Quiz AI Competition,” we succeeded in achieving performance that surprised even us, despite the LLM having a medium model size (largest as a Japanese language model*).
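The “blending” of data types can be pictured as splitting a fixed training budget across sources according to mixture weights. The sketch below is only an illustration of that idea: the source names, the 6:3:1 weights, and the function `allocate_token_budget` are all hypothetical, and the article does not disclose the mix NEC actually settled on.

```python
def allocate_token_budget(total_tokens, weights):
    """Split a token budget across data sources in proportion to weights.

    Uses largest-remainder rounding so the parts always sum to the total.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    exact = {name: total_tokens * w / total_weight
             for name, w in weights.items()}
    budget = {name: int(share) for name, share in exact.items()}
    leftover = total_tokens - sum(budget.values())
    # Give the remaining tokens to the sources with the largest fractions.
    by_remainder = sorted(exact, key=lambda n: exact[n] - budget[n],
                          reverse=True)
    for name in by_remainder[:leftover]:
        budget[name] += 1
    return budget


# Hypothetical blend; trying a different blend means changing these weights.
mix = {"web_text": 6, "literature": 3, "quiz_style_qa": 1}
budget = allocate_token_budget(1_000_000, mix)
print(budget)
```

Each round of trial and error then amounts to changing the weights, training, and comparing the resulting accuracy.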
Increasing the data volume means that the computing resources needed to handle the learning process must also increase. The successful completion of AI training on an unprecedented scale in a realistic time span is owed to the AI supercomputer mentioned earlier.
The recently developed LLM occupies a unique position as a highly accurate LLM with a practical model size, in addition to its strong Japanese language proficiency.
We are currently reviewing various practical applications based on this LLM. We may offer part of it as open source for academia and/or software engineers. Another possible option is to tune it into a system for on-premises use by customers with confidential information. The smaller model size makes it possible to keep the cost of dedicated servers at a realistic level. We will continue our research with the aim of providing “reliable AI” that is original to NEC.
NEC’s AI supercomputer was essential to the development of an LLM. What is it exactly and what potential does it have? We interviewed the chief of development of the AI supercomputer to hear about the details.
Building your own generative AI
― Tell us about NEC’s AI supercomputer.
In March 2023, NEC built and started operating the largest privately owned supercomputer in Japan.* An AI supercomputer is a super-high-speed computing system that boosts the efficiency of the AI training process. This enables a drastic reduction in the time spent on building an AI.
Some language models used as a base for typical LLMs would take up to 355 years to train using only one GPU. Today, AI supercomputers are essential for researching and developing large, cutting-edge AIs such as generative AIs.
NEC’s newly released LLM was trained using 512 GPUs installed in NEC’s AI supercomputer. Developing and owning an AI supercomputer is equivalent to having the ability to create advanced generative AIs in-house. This is very meaningful to researchers, partners, and our customers.
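To get a feel for why many GPUs matter, the two figures above can be combined in a rough calculation. This is purely illustrative: the 355-year figure refers to “some language models,” not to NEC’s LLM, and the sketch assumes the work divides evenly across GPUs, which real training runs only approximate. The `scaling_efficiency` parameter is a hypothetical knob for that gap.

```python
def ideal_training_days(single_gpu_years, num_gpus, scaling_efficiency=1.0):
    """Estimate wall-clock days when work is spread across many GPUs.

    Assumes a 365-day year and that a fraction `scaling_efficiency`
    (0 < value <= 1) of the ideal linear speedup is achieved.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if not 0 < scaling_efficiency <= 1:
        raise ValueError("scaling_efficiency must be in (0, 1]")
    return single_gpu_years * 365 / (num_gpus * scaling_efficiency)


# 355 single-GPU years spread over 512 GPUs, with perfect scaling.
print(round(ideal_training_days(355, 512)))  # prints 253
# The same workload if only half of the ideal speedup were achieved.
print(round(ideal_training_days(355, 512, scaling_efficiency=0.5)))  # prints 506
```

Even under perfect scaling, such a workload would still take most of a year on 512 GPUs, which is why both the number of GPUs and the model size matter for finishing training in a realistic time span.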
Creating the future with generative AIs
― Tell us about the policy for developing AI supercomputers in the future.
People use not only language but also images and audio to model various real-life events, predict the future, and decide on the best action and solution. This is a sophisticated process that we carry out subconsciously every day, underpinned by decades of knowledge acquired through everyday life.
In order to create an AI that supports sophisticated human decision-making, we need to train the AI with enormous amounts of knowledge similar to that used by humans. However, since it takes a long time to train even a language-proficient AI, far more computational ability is required to build an AI that can also handle images and audio.
AIs are at the center of the digital revolution, and their large-scale computational ability can be the source of competitiveness. To this end, we intend to expand our computational ability further. We aim to significantly scale out the world-class abilities of NEC’s excellent AI researchers and continue developing advanced generative AIs as part of our efforts for new social value creation using AIs.
While there are a variety of LLMs, including GPT, NEC’s LLM has the following features: it supports the Japanese language and is practical to deploy thanks to its smaller model size. Another key factor is that NEC has the largest supercomputer for AI research among Japanese companies.
Although it is medium-sized (the largest class in Japan*), it achieves high performance by using learning data distribution and search functions that maximize learning efficiency. The smaller model size may also allow it to be used on-premises in environments that handle confidential information.
- *As of the end of June 2023, based on research by NEC
- *The information published on this page is as of the date of publication.