
Aiming to realize human digital twins:
Image analysis technology that efficiently senses the real world

Featured Technologies

August 31, 2023

Today, a diverse range of image analysis technologies, including human detection and object detection, is used around the globe. However, wide adoption has been slow, because handling multiple image analysis technologies simultaneously and in parallel requires massive computing resources. The “image analysis technology that efficiently senses the real world” that NEC presents in this article can run multiple image analysis technologies on a single engine with high accuracy and efficiency. We interviewed the researchers about the details of this technology, which ultimately aims to build a human digital twin.

NEC Laboratories America
Department Head
Manmohan Chandraker

NEC Laboratories America
Researcher
Yumin Suh

NEC Laboratories America
Senior Associate Researcher
Turgun Yusuf

NEC Laboratories America
Senior Associate Researcher
Sparsh Garg

Combining different biometric authentication technologies to build a foundation for human digital twins

― What kind of technology is the “image analysis technology that efficiently senses the real world”?

Manmohan: It is a technology that analyzes many people captured in an image in real time and with high accuracy. It efficiently integrates multiple tasks, such as human detection and the analysis of posture, clothing, hair color, and 3D information, and performs them simultaneously. The research paper describing this technology was accepted at the 2022 Conference on Computer Vision and Pattern Recognition (CVPR 2022), a leading conference in computer vision, where it attracted significant attention: our team was invited to give an oral presentation, an opportunity offered only to the top 4% of papers.

The use cases are varied. To give some examples, we anticipate its use in searching for missing persons based on their clothing, hairstyle, and other characteristics; in offering personalized support according to the attributes of people visiting shops or theme parks (such as age, or whether they are accompanied by infants); and in services that seamlessly support people from their arrival at an airport until they board an aircraft.

To begin with, NEC is a company that holds a number of world No. 1 technologies in the field of biometric authentication (*1). Our team has been aspiring to develop “human digital twins” by building on this strength. A “human digital twin” is a reconstruction, in the digital world, of information obtained by sensing a human being. It goes beyond visual characteristics and aims to also analyze and predict the individual’s behavior and preferences, which can contribute to safety at public facilities and to marketing measures.

Having said that, simply integrating individual tasks such as posture and clothing analyses is insufficient for achieving that goal. For example, the technology must be able to reliably identify and track the same person when that person moves between camera frames, tell what things are near the person and their relationship with the person, and if there are multiple people around, identify what they are trying to do. The technology must also deeply understand the relationship, or context, of such people, things, and the surrounding environment.

Additionally, consideration for the protection of privacy and confidential information is necessary for actual operation. Only with a guarantee of a reliable system can it be put into practice.

Our new development, the “image analysis technology that efficiently senses the real world,” can solve such issues and provide deep insight into people’s behavior from the images. It efficiently and safely integrates NEC’s high-precision analytics to present new insights beyond the current horizon of individual image analysis techniques.

  • *1:
    Ranked No. 1 in face recognition and iris recognition benchmark tests conducted by the U.S. National Institute of Standards and Technology (NIST)

Achieving accuracy, efficiency, and robustness at a high level

― What was the technical breakthrough of this technology?

Yumin: Increasing the efficiency of computation was the key point. If we tried to achieve efficient multi-tasking using existing techniques, the accuracy of the individual tasks would inevitably drop. To run multiple tasks while maintaining accuracy, we would therefore need a dedicated engine for each task. For example, to analyze skeletal structure, age, and clothing, we would have to prepare three engines, each with its own machine learning. Naturally, this would take time and effort, and it would consume considerable computing resources.

Therefore, with this technology, we integrated the engines into one and improved efficiency by finding the computations that the tasks have in common. This method distinguishes the parts that must remain separate for each task from the parts that can be shared, and optimizes how they are combined. Developing this mechanism allowed us to achieve both efficiency and accuracy at a high level. As for accuracy, this technology can reach a level higher than that of the individual single-task engines.

Unifying the engines into a single engine also provides a major advantage: the datasets needed to train each task can all be fed in together.
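The idea of sharing common computation while keeping task-specific parts separate can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `SingleEngine` class, the hand-written "backbone", and the threshold-based heads are hypothetical stand-ins, not NEC's engine or its learned models. The sketch only shows that the shared stage runs once per image while each task reuses its output.

```python
from typing import Callable, Dict, List

Features = List[float]


class SingleEngine:
    """Toy single engine: one shared stage, several task-specific heads (hypothetical)."""

    def __init__(self, heads: Dict[str, Callable[[Features], str]]) -> None:
        self.heads = heads
        self.backbone_calls = 0  # counts how often the shared stage runs

    def backbone(self, pixels: List[float]) -> Features:
        # Shared computation: done once, reused by every task.
        self.backbone_calls += 1
        mean = sum(pixels) / len(pixels)
        spread = max(pixels) - min(pixels)
        return [mean, spread]

    def analyze(self, pixels: List[float]) -> Dict[str, str]:
        features = self.backbone(pixels)
        # Task-specific computation: each head reads the same shared features.
        return {name: head(features) for name, head in self.heads.items()}


# Hypothetical heads; a real system would use learned models instead of thresholds.
heads = {
    "posture": lambda f: "standing" if f[1] > 0.5 else "sitting",
    "clothing": lambda f: "dark" if f[0] < 0.5 else "light",
    "hair_color": lambda f: "black" if f[0] < 0.3 else "brown",
}

engine = SingleEngine(heads)
result = engine.analyze([0.1, 0.9, 0.2, 0.4])
print(result, "backbone runs:", engine.backbone_calls)
```

With three dedicated engines, the equivalent of `backbone` would run three times per image; here it runs once regardless of how many heads are registered.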


Turgun: What this technology does is real-time, multi-task image analysis, which is an extremely heavy computing load. This posed major engineering challenges. The first was unstable behavior: specifically, the number of attributes analyzed varied from frame to frame. For example, there were cases where a model that identifies seven attributes of a person could identify only six in a certain frame. To address this problem, we introduced a state-of-the-art tracking algorithm that closely tracks each person in the image.
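The interview does not say how the tracking result is used to stabilize the attributes. One plausible approach, sketched below purely as an assumption, is to key the attributes on the track ID and carry the last known value forward when a frame fails to produce one. The `AttributeStabilizer` class and the attribute names are hypothetical.

```python
from typing import Dict, Optional


class AttributeStabilizer:
    """Keeps the last known attributes per tracked person (hypothetical helper)."""

    def __init__(self) -> None:
        self._last_known: Dict[int, Dict[str, str]] = {}

    def update(self, track_id: int, attributes: Dict[str, Optional[str]]) -> Dict[str, str]:
        known = self._last_known.setdefault(track_id, {})
        for name, value in attributes.items():
            if value is not None:
                # A fresh observation replaces the stored value.
                known[name] = value
            # A missing value (None) leaves the previous observation in place.
        return dict(known)


stabilizer = AttributeStabilizer()

# Frame 1: both attributes are recognized for the person with track ID 7.
frame1 = stabilizer.update(7, {"hair_color": "black", "clothing": "red jacket"})

# Frame 2: the clothing attribute is missing, so the earlier value is reused.
frame2 = stabilizer.update(7, {"hair_color": "black", "clothing": None})
print(frame2)
```

This only works if the tracker assigns the same ID to the same person across frames, which is why reliable tracking matters for this kind of smoothing.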

The other challenge was a user experience issue. When dozens of people are captured in the same image and the attribute information is not displayed appropriately, we cannot tell which information belongs to which person. To solve this issue, we display an information window to the right of each person that follows that person’s movement, which makes the results easier to understand.


Sparsh: As this is a complex, multi-tasking technology, there is a significant risk of the system going down, so I worked on research and development to enhance its robustness. Specifically, we used the open-source software Docker to create multiple containers within the engine, designing the system so that tasks and techniques are housed separately and do not interfere with one another.

Even when the frame data contains a corrupted frame, a skip function keeps the processing running. Also, even if some cameras are intentionally removed, the system continues operating without being affected. If the cameras are returned to their original positions, they can immediately resume processing.
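The skip behavior described above can be sketched as a processing loop that drops bad frames instead of stopping. This is a minimal illustration under stated assumptions: the `is_corrupted` check (empty or missing data), the `process_stream` function, and the camera IDs are all hypothetical, and a real system would detect corruption during decoding.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Frame = Optional[bytes]


def is_corrupted(frame: Frame) -> bool:
    # Assumption: a missing or empty payload stands in for a corrupted frame.
    return frame is None or len(frame) == 0


def process_stream(
    frames: Iterable[Tuple[str, Frame]],
    analyze: Callable[[bytes], int],
) -> Tuple[List[Tuple[str, int]], int]:
    """Analyze every usable frame and count the skipped ones."""
    results: List[Tuple[str, int]] = []
    skipped = 0
    for camera_id, frame in frames:
        if is_corrupted(frame):
            skipped += 1
            continue  # skip this frame, keep the pipeline alive
        try:
            results.append((camera_id, analyze(frame)))
        except ValueError:
            skipped += 1  # analysis rejected the frame; move on to the next one
    return results, skipped


stream = [
    ("cam1", b"abc"),
    ("cam2", None),    # corrupted frame
    ("cam1", b""),     # corrupted frame
    ("cam2", b"hello"),
]
results, skipped = process_stream(stream, analyze=len)
print(results, skipped)
```

Because the loop makes no assumption about which cameras appear in the stream, a camera that stops sending frames simply contributes nothing until it comes back.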

Eying the automation of computing resource distribution

― Tell us about the prospects of this technology.

Manmohan: While we have several visions, currently we are focusing on the implementation of elasticity. This means that the engine automatically determines the priority of multiple tasks and makes decisions on the optimal allocation and distribution of the computing resources. The target site for analysis is bound to have fluctuations in the number of people around and changes in necessary attributes depending on time. We are aiming to adapt to and support such conditions. We are also currently starting demonstration experiments.
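As a rough illustration of what priority-driven resource distribution could look like, the sketch below splits a fixed compute budget across tasks in proportion to their priorities. The proportional rule, the `allocate` function, and the task names and numbers are assumptions for illustration only; the interview does not describe the actual allocation policy.

```python
from typing import Dict


def allocate(budget: float, priorities: Dict[str, float]) -> Dict[str, float]:
    """Split a compute budget across tasks in proportion to their priorities."""
    total = sum(priorities.values())
    if total <= 0:
        raise ValueError("at least one task needs a positive priority")
    return {task: budget * p / total for task, p in priorities.items()}


# Quiet period: attributes matter more, so priorities are fairly balanced.
quiet = allocate(100, {"human_detection": 5, "posture": 3, "clothing": 2})

# Crowded period: detecting many people takes priority over attribute analysis.
crowded = allocate(100, {"human_detection": 8, "posture": 1, "clothing": 1})

print(quiet)
print(crowded)
```

An elastic engine would recompute the priorities as the number of people and the required attributes change, then call something like `allocate` again.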

The Future Creation Hub at the NEC Headquarters now has an environment ready where visitors can experience a demonstration of this technology. We hope to offer more opportunities to widely introduce this technology to people.

This technology uses a single engine to manage multiple tasks used for image analysis in order to efficiently perform the full range of processes from machine learning to analysis with reduced resources. As this enables high-accuracy, simultaneous handling of different image analysis technologies that were previously used independently, it is expected to contribute to the discovery of new insights by additionally taking into consideration the posture, behavior, relationship with the surrounding things and people, and other contextual matters.

  • The information posted on this page is the information at the time of publication.