
AI-based Self-Education to Improve Work Quality at Manufacturing Sites
Work Education Support Technology

Featured Technologies

January 29, 2025

At manufacturing sites where production takes place, companies make constant efforts to enhance workers' skills, such as by having instructors provide intensive one-on-one education. In recent years, however, the increasing diversity and mobility of the workforce have led to rising education costs and an inability to provide sufficient education, raising concerns about a decline in work quality. To address these issues, NEC has developed technology that uses AI to analyze videos and automatically offer improvement advice so that workers can adjust their movements to more closely match those of a model action. This is the world's first (*1) technology to realize AI-based self-education that enables workers to master new tasks on their own without an instructor, in various industrial settings such as manufacturing, warehousing, construction, and retail. We spoke with the researchers about the details of this technology.

  • *1:
    As of January 2025 based on research by NEC

AI-based self-education that enables users to master tasks without an instructor

Visual Intelligence Research Laboratories
Principal Researcher
Tetsuo Inoshita

―What kind of technology is this work education support technology developed by NEC?

Inoshita: The technology uses video analysis to support work education at manufacturing and logistics sites. Videos of workers performing tasks are provided to an AI, which compares them with model action videos and then gives text or video feedback on points for improvement. In recent years, the number of manufacturing processes that workers must learn has continued to increase, due to the growing complexity of tasks and the shift toward high-mix, low-volume production. The aging of skilled workers is another major issue. It is making it more difficult for manufacturing companies to secure enough human resources and to pass down tacit knowledge and practical know-how that is not easily verbalized. In addition, the workforce itself is becoming more diverse and highly mobile. Whenever a new employee joins a company, an instructor, who also has other duties, must spend several days providing intensive one-on-one training, which drives up education costs. These are the pressing issues that this technology aims to address. We estimate that by reducing the time instructors spend on education, this technology can cut education costs to about one-tenth of the current level. By streamlining the education process so that production is carried out by fully proficient employees, we can also expect improvements in product quality.

Moriwaki: While there are existing technologies that, for example, automatically create work manuals or use AR glasses for training support, I believe this is the first time we have seen a technology capable of offering direct feedback to workers about the actual movements they perform.

Visual Intelligence Research Laboratories
Researcher
Kosuke Moriwaki

Inoshita: In 2022, NEC developed a technology that analyzes the pose of the hands and images of the surrounding area to identify very detailed manual tasks. At that time, the technology was mainly focused on "visualization," such as identifying the order of work processes and the time spent on the work. This new technology has advanced into the realm of "analysis," which is capable of identifying subtle differences between a worker's hand movements and those of a model action, and generating appropriate advice on how to make the worker's movements more closely match those of the model action.

Example of automatically generated advice

Detecting subtle differences in work movements and posture, and using a VLM to generate improvement advice focused on those differences

Visual Intelligence Research Laboratories
Researcher
Sachio Iwasaki

―What types of technologies were used to realize this new technology?

Inoshita: Along with the technology for identifying work processes using hand pose and images of the surroundings, which I just mentioned, we are also utilizing two other technologies. One is a technology for capturing the subtle differences between the worker's movements and the model actions, and the other is a technology for generating appropriate advice to help raise the worker's skill.

Iwasaki: Let me explain the technology we use to capture the differences in the work. We built on technology from a field known as "video alignment," which automatically matches up the frames of two videos and detects where they differ. By improving this technology, we can now detect even minute differences in manual tasks. Specifically, our system calculates similarity by looking in detail not only at the skeletal structure of the fingers and the position of the hand, but also at the relationship between the hand and the object, such as how the hand interacts with the object and whether it is pressing or pinching it. As a result, we can compare videos and determine which frames are similar and which are not with high accuracy, even when the videos differ in length. We are currently working with members of the business unit (BU) to prepare a research paper on this technology, with the aim of submitting it to an international conference.
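The interview does not say which alignment algorithm NEC uses. As a rough illustration of how two videos of different lengths can be matched frame by frame, the sketch below uses classic dynamic time warping (DTW), a standard textbook technique, swapped in here purely as an example. The per-frame descriptor in frame_features (wrist position, wrist-relative finger joints, and a one-hot hand-object contact state drawn from the hypothetical set CONTACT_STATES) is also an assumption loosely modeled on the cues Iwasaki describes, not NEC's actual feature design.

```python
import numpy as np

# Hypothetical encoding of the hand-object relationship for one frame.
CONTACT_STATES = ("none", "press", "pinch")


def frame_features(keypoints, contact_state, contact_weight=1.0):
    """Build one illustrative per-frame descriptor (an assumption, not NEC's features).

    keypoints: (k, 2) array of hand joint coordinates, joint 0 = wrist.
    contact_state: one of CONTACT_STATES.
    """
    pts = np.asarray(keypoints, dtype=float)
    wrist = pts[0]                  # where the hand is
    shape = (pts - wrist).ravel()   # finger pose relative to the wrist
    contact = np.zeros(len(CONTACT_STATES))
    contact[CONTACT_STATES.index(contact_state)] = contact_weight
    return np.concatenate([wrist, shape, contact])


def dtw_align(worker, model):
    """Align two per-frame feature sequences with dynamic time warping.

    worker: (n, d) array; model: (m, d) array; n and m may differ.
    Returns (total_cost, path), where path lists matched (worker_frame, model_frame) pairs.
    """
    worker = np.asarray(worker, dtype=float)
    model = np.asarray(model, dtype=float)
    n, m = len(worker), len(model)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(worker[i - 1] - model[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1],  # both videos advance one frame
                                 cost[i - 1, j],      # extra worker frame on the same model frame
                                 cost[i, j - 1])      # extra model frame on the same worker frame
    # Walk back from the last cell to recover which frames were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        i, j = min(((i - 1, j - 1), (i - 1, j), (i, j - 1)), key=lambda c: cost[c])
    path.reverse()
    return float(cost[n, m]), path


# Usage: the same motion, except the worker pauses on one frame.
model_seq = [[0.0], [1.0], [2.0], [3.0]]
worker_seq = [[0.0], [1.0], [1.0], [2.0], [3.0]]
total, path = dtw_align(worker_seq, model_seq)
print(total, path)  # 0.0 [(0, 0), (1, 1), (2, 1), (3, 2), (4, 3)]
```

Because DTW lets one frame in either video match several frames in the other, a slower or faster worker is not penalized for timing alone; only genuine differences in the descriptors add to the cost.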

Moriwaki: To generate appropriate advice, we use a type of generative AI called a VLM (vision-language model). In our system, video footage of the differences identified by the video alignment technology is input into the VLM. We then provide a prompt (instructions to the generative AI) such as, "Based on the differences between these two videos, what should be done to make the worker's movements more closely match the model action?" and obtain an answer. These prompts are generated automatically in a standardized format, so customers and workers do not need to enter them manually each time.

In fact, these prompts are extremely complex and important, and developing the final version took a lot of trial and error, because simply feeding the video footage of the differences and a prompt to the VLM does not guarantee good results. For example, lengthy advice may not be effective when displayed at an actual work site. Fortunately, some NEC affiliated companies have manufacturing plants. While we were developing the technology, I personally attended training at one of these plants and gathered candid opinions from the workers. That helped us create prompts that output the most effective advice for acquiring skills.
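NEC has not published its actual prompts, so the following is only a sketch of what "automatically generated in a standardized format" could look like. The DiffSegment structure, its fields, PROMPT_TEMPLATE, build_prompt, and the max_points cap (a nod to the point that lengthy advice is not useful on the floor) are all hypothetical; only the central question is taken from the example quoted in the interview. The call to the VLM itself is omitted because the source does not name the model or its API.

```python
from dataclasses import dataclass


@dataclass
class DiffSegment:
    """One stretch where the worker departs from the model action (hypothetical structure)."""
    worker_start: float  # seconds into the worker video
    worker_end: float
    model_start: float   # seconds into the model-action video
    model_end: float
    cue: str             # short machine-generated hint, e.g. "grip: press vs. pinch"


# Hypothetical template; the question wording follows the example quoted in the interview.
PROMPT_TEMPLATE = (
    "The attached clips show a worker's movement and the model action for the same task.\n"
    "Differences were detected at: {segments}.\n"
    "Based on the differences between these two videos, what should be done to make "
    "the worker's movements more closely match the model action?\n"
    "Reply with at most {max_points} short points that a worker can read at the workstation."
)


def build_prompt(segments, max_points=3):
    """Fill the fixed template so that nobody has to type a prompt by hand."""
    if not segments:
        raise ValueError("no differences to explain")
    described = "; ".join(
        f"worker {s.worker_start:.1f}-{s.worker_end:.1f}s vs. model "
        f"{s.model_start:.1f}-{s.model_end:.1f}s ({s.cue})"
        for s in segments
    )
    return PROMPT_TEMPLATE.format(segments=described, max_points=max_points)


# Usage: one detected difference in how an object is gripped.
prompt = build_prompt([DiffSegment(1.0, 2.5, 0.8, 1.9, "grip: press vs. pinch")])
print(prompt)
```

Keeping the wording fixed and filling in only the detected segments is what makes the output consistent from one worker to the next; tuning then happens once, in the template, rather than per session.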

Inoshita: The key point is that we are using hand skeletal information and controlling the VLM so that it provides appropriate advice.

Iwasaki: That's right. On the video alignment side as well, we focused on feeding only the video frames with actual differences into the VLM, as that approach resulted in higher accuracy.
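The interview does not describe how the frames with actual differences are selected. One simple possibility, shown below as an illustrative sketch rather than NEC's method, is to take the matched frame pairs from an alignment step, keep only the pairs whose feature distance exceeds a threshold, and merge consecutive pairs into segments. The function names, the threshold, and the min_len noise filter are hypothetical parameters that would need tuning on real footage.

```python
def _span(run):
    """Summarize a run of aligned pairs as first/last frame indices in each video."""
    worker_frames = [w for w, _ in run]
    model_frames = [m for _, m in run]
    return (min(worker_frames), max(worker_frames), min(model_frames), max(model_frames))


def differing_segments(path, distances, threshold, min_len=2):
    """Keep only runs of aligned frames whose distance exceeds `threshold`.

    path: (worker_frame, model_frame) pairs in alignment order, e.g. from DTW.
    distances: per-pair feature distance, same length as path.
    min_len: shorter runs are treated as noise and dropped.
    Returns a list of (worker_first, worker_last, model_first, model_last) tuples.
    """
    if len(path) != len(distances):
        raise ValueError("path and distances must have the same length")
    segments, run = [], []
    for pair, dist in zip(path, distances):
        if dist > threshold:
            run.append(pair)
            continue
        if len(run) >= min_len:
            segments.append(_span(run))
        run = []
    if len(run) >= min_len:  # a run that lasts until the final frame
        segments.append(_span(run))
    return segments


# Usage: only the two-frame run at pairs 1-2 survives; isolated spikes are dropped.
path = [(0, 0), (1, 1), (2, 1), (3, 2), (4, 3), (5, 4), (6, 5)]
distances = [0.1, 0.9, 0.8, 0.2, 0.7, 0.1, 0.9]
print(differing_segments(path, distances, threshold=0.5))  # [(1, 2, 1, 1)]
```

The resulting frame ranges could then be cut from both videos so that the VLM sees only the moments that matter, which is the effect Iwasaki describes.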

Aiming to develop technology that can analyze movements of the whole body

―Tell us about the future prospects of this technology.

Inoshita: We have only just created this technology, so our plan is to carry out further development to make it easier to implement at work sites. We are already receiving inquiries from customers, mainly in the manufacturing industry, and would like to keep working to prepare for future demonstrations.

Moriwaki: Right. As Inoshita mentioned earlier, we estimate that education costs could be reduced to one-tenth. At the demonstration sites, we would like to confirm whether we can really achieve this, and whether it is possible to reduce education costs even further. Another goal is to expand the fields of application. Beyond manufacturing, there should be a variety of other applications, such as in the logistics and retail fields.

Iwasaki: Personally, I would like to find out whether the technology can be applied in the medical field. For example, if it were applied to the training of surgeons, it could prove to be very effective.

Inoshita: We believe there is a strong need for this technology overseas as well. Factories located overseas often employ a highly mobile workforce of 1,000 or 2,000 workers. Large-scale sites such as these could see even greater benefits.

In addition, although we are currently developing the technology with a primary focus on manual tasks, in the future we would like to apply it to tasks that involve the use of the whole body. For instance, in the case of automobile manufacturing, once the work of assembly by hand is complete, it is necessary to assemble larger components. In this scenario, wearable cameras would be required in addition to fixed cameras. So, we expect the technology to advance along two axes, in terms of both the evolution from fixed cameras to wearable cameras, and the evolution from manual tasks to whole-body tasks.

NEC possesses a number of world-leading technologies in biometric authentication and other forms of analysis focused on people, including highly advanced 3D skeleton recognition and technology for analyzing the relationships between people's movements and the surrounding objects or environment. Going forward, we would like to draw on these capabilities to further develop our technology.

In recent years, manufacturing sites have faced a variety of issues such as the difficulty of passing down skills and the rising costs of education provided by instructors. In response, NEC has developed a technology that uses AI-based video analysis to automatically generate advice based on the actual movements of workers. By successfully controlling a VLM used together with video analysis technology developed by NEC, it is now possible to generate effective advice that can actually be applied at the site, rather than simple pattern-based feedback. Using this technology for self-education, workers can master tasks without supervision at work sites in manufacturing, logistics, construction, and various other industries.

  • The information posted on this page is the information at the time of publication.