Behind the Scenes Look at NEC's Development of its Own Large Language Model (LLM)

Featured Technologies

September 1, 2023

NEC's Large Language Model (LLM) is lightweight, high performance, and has a small number of parameters. In our previous interview, we spoke with the research team leader about the concept and features of the model. This time, we assembled the team members who were involved in the development to ask them about the technology details and the inside story behind its development.

Data Science Research Laboratories
Director & Senior Principal Researcher
Masafumi Oyamada
Overall supervision, design, API implementation, and UI, etc. as the leader of this project

Data Science Research Laboratories
Special Researcher
Kunihiro Takeoka
Overall support from preparing the training database (corpus) to its evaluation

Data Science Research Laboratories
Senior Researcher
Kosuke Akimoto
Model design and training using the AI supercomputer

Data Science Research Laboratories
Special Researcher
Yuyang Dong
Collection and preprocessing of the corpus and other preparation tasks

Data Science Research Laboratories
Taro Yano
Model evaluation and data collection to improve the model performance

Data Science Research Laboratories
Junta Makio
Model evaluation and training

A high performance LLM created over one turbulent month

― An official press release was issued after the previous interview. What was the reaction?

Oyamada: We received a great many responses. I heard that our LLM was a frequent topic of conversation at the recently held ACL Meeting, which is the world's largest international conference for natural language processing. Thankfully, the feedback came from customers, academia, and engineers alike.

― With all of the team members joining us for today's discussion, how long was the model under development?

Oyamada: The development started in April of this year.

― April!?

Akimoto: Yes, that is true. The project was launched before Golden Week (late April to early May) and advanced at a rapid pace.

Oyamada: In February of this year, Meta released an LLM called "LLaMA" (Large Language Model Meta AI), and around March I played around with that model in a semi-private capacity. However, when I tried to use it in Japanese, as you might expect, I just could not get it to perform well. The accuracy was not great, and even when it produced Japanese, the thought process was not very Japanese. I then read the LLaMA research paper with great interest and started to think that maybe we could do something similar. Mr. Yano here and I had been involved for three years in the research and development of a technology called NEC Data Enrichment, which is built on a cluster of language models. Since we already possessed that kind of know-how, I felt we too could create an LLM. When I suggested the idea to the team members and our initial trials looked promising, we received permission from top management to use the AI supercomputer and began full-scale development. That was in April. Since we only had exclusive use of the AI supercomputer for one month, all of the team members worked together to produce a result quickly.

Akimoto: The actual LLM training takes time and there was also the Golden Week vacation, so we only had two weeks for preparation. During that time, we read related research papers and prepared the data. However, Mr. Oyamada did not compromise one bit even under such a time restriction. When even the slightest ambiguities were discovered in the model or training design, he would urge us over and over again, "We still have a bit more time, so let's run another comparative experiment and find the optimal design." But we didn't have time!! From the very start of the project, I was in a rush to begin the training as soon as possible, so I kept grumbling in my mind, "There is a trade-off between the delivery date and quality!" (laughs) However, thanks to his prodding, I think that we were able to create a good LLM.

Takeoka: It probably also helped that we had the know-how and data from when we used the AI supercomputer, which was in partial operation at the time, for the "AIO: Japanese Quiz AI Competition." We were able to carry out the development smoothly. In addition, NEC's AI supercomputer really is amazing. It has 928 high-performance A100 80 GB GPUs on board and is equipped with high-speed storage. When developing an LLM, the cycle of loading, processing, and saving data is repeated many times, so the work can take many times longer if the read and write speeds are slow. NEC's AI supercomputer was able to read and write the data quickly, which enabled us to complete the development in a short period of time.

Daring to adopt an architecture that they were warned would "not perform well"

― How were you able to realize high accuracy with a small number of parameters?

Oyamada: One factor is the architecture that was adopted for our LLM.

Akimoto: To tell you the truth, the architecture that we adopted this time around had previously been described in academia as having poor performance. However, when we read every research paper that we could find, we came to feel that the architecture itself was excellent and that, with the right training, it could produce a high level of performance. We then began a process of trial and error. As a result, we successfully extracted a level of performance that was higher than expected, and the method this architecture adopts is now drawing attention once again on a global scale, so I think that speaks to our foresight.

Dong: The data preparation and preprocessing were also a significant problem. While a massive amount of training data can be retrieved from the Internet, it naturally includes noise, such as affiliate sites and adult content. The question of how to train the LLM while effectively excluding the kind of content that we did not want the LLM to see or generate was a major challenge. Because typical filtering methods could not handle the problem effectively, we developed dedicated classifiers that take the features of the Japanese language into consideration.
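The idea of filtering a web corpus with a learned quality classifier, as described above, can be sketched roughly as follows. This is a minimal illustration only: the tiny vocabulary, the hand-made labels, and the decision rule are all invented for the example, and NEC's actual Japanese-aware classifiers are not public.

```python
# Minimal sketch: filter a corpus with a small multinomial Naive Bayes
# classifier (Laplace smoothing). Labels: 1 = clean prose, 0 = noise.
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

class NaiveBayesFilter:
    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(doc))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def log_prob(self, doc, y):
        # log P(y) + sum of log P(token | y) with add-one smoothing
        total = sum(self.counts[y].values())
        v = len(self.vocab)
        lp = math.log(self.doc_counts[y] / sum(self.doc_counts.values()))
        for tok in tokenize(doc):
            lp += math.log((self.counts[y][tok] + 1) / (total + v))
        return lp

    def is_clean(self, doc):
        return self.log_prob(doc, 1) >= self.log_prob(doc, 0)

# Invented toy training set standing in for hand-labeled web pages.
train_docs = [
    "the committee published its annual report on infrastructure",  # clean
    "researchers describe a new method for folding prediction",     # clean
    "click here best deals buy now limited offer discount coupon",  # noise
    "free free free click now win instant cash prize",              # noise
]
train_labels = [1, 1, 0, 0]

nb = NaiveBayesFilter().fit(train_docs, train_labels)
corpus = [
    "the committee published a new report on the method",
    "click here free discount coupon buy now",
]
kept = [d for d in corpus if nb.is_clean(d)]  # spam-like page is dropped
```

In practice such filters also score on language-specific signals (character-set ratios, boilerplate patterns, and so on), which is presumably where the Japanese-specific engineering mentioned above comes in.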

Makio: In some cases, the performance dropped as a result of simply filtering and retaining only the clean data.

Takeoka: Yes, that did happen. Regrettably, the definition of "good" text for LLM training is still not well understood worldwide. It is not as simple as just having the LLM read well-organized text.

Akimoto: It is also not the case that more data is better. In general, it is common to focus on data volume when trying to increase the performance of an LLM. However, in a comparative experiment we carried out, we verified that even if you increase the volume, performance may actually decrease when a lot of noisy data is included.

Dong: Therefore, we repeatedly performed comparative experiments while preparing data that underwent various versions of preprocessing. As a result, I think that we were able to accumulate a considerable amount of know-how regarding the degree of data preprocessing and the composition of the data content. The result of this process was the new LLM which successfully produced the highest level of performance.

― How do you evaluate good and bad levels of performance?

Yano: LLM performance is mainly evaluated in two stages. The first is an evaluation of performance on natural language processing tasks such as document classification and question answering; the second is an evaluation of the extent to which the LLM's responses to queries are "natural," "useful," and "lack toxicity" from a human perspective. For example, there are metrics for quantitatively measuring the accuracy of document classification and other tasks, but that evaluation on its own is insufficient. Indeed, it's impossible to determine whether an LLM is genuinely practical without evaluating it from various perspectives using human judgement.
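The two-stage flow described above can be sketched as follows. The task labels, the 1-to-5 rating scale, and the axis names are illustrative placeholders, not NEC's actual evaluation protocol.

```python
# Hedged sketch of a two-stage LLM evaluation:
# stage 1 = automatic task metric, stage 2 = aggregated human ratings.

def stage1_accuracy(predictions, references):
    """Automatic metric: exact-match accuracy on a labeled NLP task."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

def stage2_human_scores(ratings):
    """Average 1-5 human ratings along the three axes mentioned above."""
    axes = ("natural", "useful", "non_toxic")
    return {ax: sum(r[ax] for r in ratings) / len(ratings) for ax in axes}

# Stage 1: toy document-classification outputs vs. gold labels.
preds = ["positive", "negative", "positive"]
refs = ["positive", "negative", "negative"]
acc = stage1_accuracy(preds, refs)

# Stage 2: invented annotator ratings for a batch of model responses.
ratings = [
    {"natural": 5, "useful": 4, "non_toxic": 5},
    {"natural": 4, "useful": 4, "non_toxic": 5},
]
summary = stage2_human_scores(ratings)
```

Keeping the two stages as separate, composable steps is what allows a freshly trained checkpoint to be dropped into the pipeline immediately, as the team describes next.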

Makio: For the first phase we conducted a quantitative evaluation, and for the second phase we incorporated human evaluation. We prepared so that a model could enter that evaluation process the moment it was ready, and by devising ways to run the evaluation cycle as efficiently as possible, we kept the training moving smoothly.

Yano: In order to fully utilize the supercomputer, we ran the training at a considerable speed, so efficient evaluation was even more essential. In particular, the human evaluation of phase two involves human effort. Various ways of replacing this evaluation with automatic evaluation methods are being discussed at the research paper level and there is room for further research, but I think that we were able to gain insight into efficient evaluation methods through this LLM development.

There is still a lot of room for further evolution

― Going forward, what would you like to achieve with this LLM?

Yano: Naturally, we would like to more closely study the mechanism of how the LLM acquires or loses capabilities and knowledge depending on what kind of data is used and how the LLM is trained. In fact, the question of how an LLM actually learns something has still not been solved. If we can focus on the black box of this learning process and deeply understand it, then we should be able to make a significant contribution to improving the LLM performance.

Makio: I would like to use an LLM to resolve all the daily hassles I encounter. However, most of the current high performance LLMs are based on the English language, and if you ask them about recipes for traditional Japanese cuisine, they will produce inappropriate answers such as "add celery." What interests me most is creating an LLM that can skillfully handle the kind of information needed in Japan and using it to streamline work.

Dong: I also want to automate the tedious work. My specialty is databases, so I want to make it possible to entrust the tedious data processing work to the LLM. Of course, that may be difficult with just the LLM alone, so I am now wondering whether it is possible to enable the LLM to use existing tools like humans do.

Takeoka: I am wondering if it is possible to make the LLM support us when we read specialized texts. We are surrounded by administrative documents, contracts, and other texts that are difficult to understand. It would be great if we could create an interactive mechanism that could read such texts and say, "you should carefully read just this section" or answer a question with "read this part of the document." It might be interesting if the system could tell overseas visitors who are not proficient in Japanese what is written in a document in response to their question.

Akimoto: This might sound a bit like fiction, but I think that we might be able to use an LLM to create an experience that lets us be reborn in another world. When trying to create a fictional world on your own, you would typically have to create all of the detailed settings and characters yourself. However, it wouldn't be any fun to be reborn in a world you'd configured yourself, because you'd already know every inch of it, right? If you used an LLM that possessed a massive body of knowledge, it should be able to fill in the parts that we intentionally left undefined and output a world containing something unexpected. It would be like a simulation game or a tabletop RPG that spins a world up from our own settings. We could then dive into that world and interact with the characters created by the LLM.

Oyamada: That would be fun. To be honest, I was also thinking that something like that might be possible, because the data that an LLM trains on can also be limited by time period. Currently, our objective is to use the LLM for business purposes, so naturally it was trained on recent Internet data. But if, for example, it were trained only on documents from 2,000 years ago, you might be able to converse interactively with people from that time. Chatting with people from the past who reflect the values of their era might not be just a dream.

― Thank you very much. Finally, please tell me about your goals as a team.

Oyamada: Regarding the recently announced LLM, we are now working to further accelerate its inference speed in practical applications and to raise the quality and accuracy of its responses to make it more complete. All of the team members will continue to do their best to make this a reality. Moreover, in parallel with the recently announced LLM, we are also developing other LLMs in different variations. We are further accelerating our research and development so that we can make an announcement to everyone in the near future, so please stay tuned.

  • The information posted on this page is the information at the time of publication.