NEC uses generative AI (LLM) and video recognition AI to automatically generate explanatory text from video
- Applied to drive recorder videos, cutting accident report creation time in half -
Tokyo, Japan, December 5, 2023 - NEC Corporation (NEC; TSE: 6701) has developed the world's first technology (*1) that integrates a generative AI large language model (LLM) with video recognition AI to automatically generate shortened videos and explanatory text from long videos.
By analyzing drive recorder videos with this technology, it is possible to automatically generate text and shortened videos explaining the circumstances of an accident and how it occurred. Based on the text and video, an accident investigation report can be automatically created in a format that is appropriate for non-life insurance claims and traffic safety instructions. NEC plans to offer a trial version of this technology in March 2024.
In recent years, video has been increasingly utilized for the purpose of safety management and operational efficiency in a variety of industries, including transportation, logistics, manufacturing, construction and retail. However, it takes an enormous number of hours to manually check long videos and create reports on near-misses and areas for improvement. Although it has become possible to generate explanatory text for still images using generative AI (image-to-text), it has been difficult to apply such image-to-text technologies to videos containing complex scenes that consist of various objects and environments and that change over time.
Features of the technology
1. Find scenes efficiently and create reports faster
Combining video recognition AI with an LLM makes it possible to understand each scene in a video. Specifically, more than 100 video recognition AI engines are applied to individually recognize the various objects and environments that make up a scene, such as people, cars, buildings, animals, trees and other natural objects, and the weather, as well as how they change over time. Because the LLM analyzes only these recognition results rather than the entire video, users can find the scene they are looking for more efficiently and no longer need to check the video repeatedly.
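The idea of searching recognition results instead of raw video can be illustrated with a minimal Python sketch. It is not NEC's implementation, which has not been published. The `Detection` record, the engine and label names, and the functions `find_scenes` and `build_prompt` are all hypothetical, and the sketch assumes that each engine reports a label together with the whole-second interval in which it was seen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One recognition result (hypothetical format): engine, label, interval in seconds."""
    engine: str
    label: str
    start_s: int
    end_s: int  # inclusive


def find_scenes(detections, required_labels):
    """Return (start, end) intervals in which every required label is present."""
    required = set(required_labels)
    # Build a per-second view of which labels are active.
    timeline = {}
    for d in detections:
        for t in range(d.start_s, d.end_s + 1):
            timeline.setdefault(t, set()).add(d.label)
    # Keep the seconds where all required labels co-occur.
    hits = sorted(t for t, labels in timeline.items() if required <= labels)
    # Merge consecutive seconds into contiguous scenes.
    scenes = []
    for t in hits:
        if scenes and t == scenes[-1][1] + 1:
            scenes[-1][1] = t
        else:
            scenes.append([t, t])
    return [(start, end) for start, end in scenes]


def build_prompt(detections, scene):
    """Render only the detections overlapping a scene as text for an LLM."""
    start, end = scene
    lines = [
        f"{d.start_s}-{d.end_s}s [{d.engine}] {d.label}"
        for d in detections
        if d.start_s <= end and d.end_s >= start
    ]
    header = f"Explain what happened between {start}s and {end}s:"
    return "\n".join([header] + lines)
```

In this sketch the text passed to the language model contains only a handful of short lines per scene, which is the reason analyzing recognition results can be cheaper than analyzing every frame.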
2. Accurate interpretation of video context to generate expert-quality reports
To improve the quality of the generated text, the LLM is fine-tuned in advance using sample videos from a specific domain. For example, when the technology is applied to drive recorder videos, road traffic-related videos are analyzed beforehand. This gives the LLM the expertise to correctly understand what happened in the video. As a result, it is possible to create highly reliable reports while mitigating hallucination (*3), which has been an issue for the accuracy of generative AI.
3. Generate reports in seconds without large computing resources
This technology can create a video of a desired scene and explanatory text in a few seconds from a video that is over an hour long. To achieve this, NEC integrated a compact, high-performance LLM and a high-speed data retrieval system developed by NEC.
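NEC has not disclosed how its data retrieval system works. As a rough illustration of why looking up pre-computed recognition results can take far less time than scanning an hour of video, the following Python sketch keeps a sorted list of timestamps per label and uses binary search. The `RecognitionIndex` class, the `clip_range` helper, the label names, and the padding values are all assumptions made for this example.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class RecognitionIndex:
    """Hypothetical index: label -> sorted list of seconds at which it was recognized."""

    def __init__(self):
        self._times = defaultdict(list)

    def add(self, label, second):
        # insort keeps each per-label list sorted as results arrive.
        insort(self._times[label], second)

    def first_at_or_after(self, label, second):
        """Return the first timestamp >= second for a label, or None if there is none."""
        times = self._times.get(label, [])
        i = bisect_left(times, second)
        return times[i] if i < len(times) else None


def clip_range(event_s, video_len_s, before_s=10, after_s=5):
    """Return a (start, end) window around an event, clamped to the video length."""
    return max(0, event_s - before_s), min(video_len_s, event_s + after_s)
```

A lookup such as `first_at_or_after("collision", 0)` followed by `clip_range(...)` would yield the boundaries of a shortened video without touching the frames themselves.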
NEC verified this technology by using it to create accident investigation reports from drive recorder videos. Automating the search for accidents and the scenes that caused them, as well as the drafting of reports, all of which had previously been done manually, cut the time required to create a report in half.
In March 2024, NEC plans to begin offering a trial version of this technology to non-life insurance companies and automobile manufacturers to support the preparation of accident investigation reports and other documents that utilize drive recorder videos.
In the future, this technology will be deployed in various use cases, including support for creating nursing and care records, support for creating work records at manufacturing and construction sites, creation of explanatory text for training autonomous driving AI, collection of specific content from broadcast videos, and creation of voiceover scripts.
- (*1) As of December 5, 2023, according to a research survey.
- (*2) Video source: https://www.youtube.com/watch?v=YBbutvif1W8
- (*3) Hallucination: A phenomenon in which generative AI outputs incorrect information in a plausible format.
About NEC Corporation
NEC Corporation has established itself as a leader in the integration of IT and network technologies while promoting the brand statement of “Orchestrating a brighter world.” NEC enables businesses and communities to adapt to rapid changes taking place in both society and the market as it provides for the social values of safety, security, fairness and efficiency to promote a more sustainable world where everyone has the chance to reach their full potential. For more information, visit NEC at https://www.nec.com.
NEC is a registered trademark of NEC Corporation. All Rights Reserved. Other product or service marks mentioned herein are the trademarks of their respective owners. © NEC Corporation.