NEC develops AI-based "Data Understanding with Semantic Technology" to infer the meaning of diverse data

Enables precise industry-wide data integration and searching

Tokyo, August 5, 2019 - NEC Corporation (NEC; TSE: 6701) has developed "Data Understanding with Semantic Technology," which uses AI to infer the essential meaning of diverse types of data. This technology automates, at high speed and with high quality, the consolidation of multiple tabular datasets from different fields and industries, a task that conventionally takes experts a significant amount of time.

This technology is a proprietary machine learning algorithm that makes use of tabular data structure and knowledge graphs(*1) that link various words and numerical qualities. It is part of NEC's portfolio of cutting-edge AI technologies, NEC the WISE(*2). For example, a sequence of numerical data in a table (e.g. {29, 24, 23}) could suggest various meanings when taken on its own (e.g. "age" or "temperature"). If the same table also includes a column for "name," however, the technology can infer that the numerical data pertains to "age" rather than "temperature" based on the strength of the semantic relationship.

When applied to open data(*3), the technology completed in just one hour, and at the same quality, data integration work that would have taken experts 30 days.

Going forward, NEC aims to apply this technology to supply chains and to make it available as a general-purpose tool for information-sharing platforms, including so-called data lakes (databases that store data in various disparate formats), data management platforms (DMPs(*4)), information banks and data distribution platforms. NEC will also continue with research and development in this field.

Background
In recent years, a great deal of effort has been put into systems such as data distribution infrastructures and information banks where data is shared and integrated between departments, businesses, and even industries, thereby facilitating cross-cutting analysis at an unprecedented level.

Cross-cutting analysis of data held by different entities, however, requires integrating diverse data whose table and column names are not standardized and are in fact written in widely different ways among users, businesses, and industries. As a result, data management experts have had to manually integrate vast amounts of tabular data, examining them to determine what the data are for and what the columns and rows represent. This has led to many problems, such as the large amount of time needed for data integration, the inability to start analysis immediately, and lower analytical precision due to variability in operators' skills.

Features of the new technology
This new technology is an NEC proprietary machine learning algorithm that performs the integration of diverse data at a quality comparable with that of integration performed by experts but within a much shorter period of time, making it possible to significantly streamline data integration tasks.

  1. Linkage with knowledge graphs based on feature vectors that can express the trends in data distributions

    Instead of relying on the original table names and column names, the new technology infers the meaning of data based on the statistical tendencies of the numerical distribution of each sequence of data.

    In particular, it collects beforehand the numerical values that co-occur with each word in the knowledge graph and creates a unique knowledge graph that includes the numerical distribution of the words. Because numerical data that have the same meaning tend to have similar statistical distributions, it then calculates, for each numerical data sequence, a feature vector that indicates how frequently numerical values occur in that sequence. Comparing these feature vectors with the numerical distribution of each word in the knowledge graph makes it possible to infer meanings (e.g. "sales volume") even for data without column labels.

  2. Using co-occurrence relationships of meanings in the knowledge graph to achieve highly accurate semantic inference

    A sequence of numerical values in a table, such as {29,24,23} for example, taken separately could point to various meanings such as "age" or "temperature," making it more difficult to correctly infer their meanings than for text data sequences.

    NEC's new technology accurately infers meaning by making use of network distance (= strength of co-occurrence relationships between meanings of data) to infer the co-occurrence relationship between the "possible meanings of the data sequence in question" and the "meanings of other data sequences in the same tabular data." For example, for a certain numerical data sequence, if a column for "name" is included in the same tabular data, then from the knowledge graph it is assumed that the data sequence refers to "age" rather than "temperature" based on the strength of the semantic relationship.

    On January 30, 2019, NEC presented this technology at the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), which was held in Hawaii from January 27 through February 1.
    URL: https://aaai.org/Conferences/AAAI-19/
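
The two features above can be illustrated with a small sketch. This is not NEC's algorithm: the sample values, the co-occurrence strengths, the bin edges, the cosine-similarity comparison, and the way the two scores are combined are all assumptions invented for illustration, and a simple lookup table stands in for network distance in a knowledge graph. The sketch only shows the general idea of matching a column's value distribution against known distributions, then re-ranking the candidates using the meanings of other columns in the same table.

```python
import math

# ASSUMPTION: toy stand-in for values "collected beforehand" per meaning.
KNOWN_VALUES = {
    "age": [18, 22, 25, 29, 31, 35, 40, 47, 52, 60],
    "temperature": [12, 15, 18, 21, 23, 24, 26, 28, 29, 31],
    "sales volume": [120, 340, 560, 800, 1500, 2300, 4100, 5200],
}

# ASSUMPTION: invented co-occurrence strengths (0..1) between a candidate
# meaning and the meaning of another column in the same table.
CO_OCCURRENCE = {
    ("age", "name"): 0.9,
    ("temperature", "name"): 0.1,
    ("age", "date"): 0.2,
    ("temperature", "date"): 0.8,
    ("sales volume", "date"): 0.7,
}

# ASSUMPTION: arbitrary histogram bins; values outside them are ignored.
BIN_EDGES = [0, 10, 20, 30, 40, 50, 100, 1000, 10000]


def feature_vector(values):
    """Return a normalized histogram of the values over BIN_EDGES."""
    counts = [0] * (len(BIN_EDGES) - 1)
    for v in values:
        for i in range(len(counts)):
            if BIN_EDGES[i] <= v < BIN_EDGES[i + 1]:
                counts[i] += 1
                break
    total = sum(counts) or 1
    return [c / total for c in counts]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_scores(values):
    """Score each known meaning by similarity of value distributions."""
    fv = feature_vector(values)
    return {m: cosine(fv, feature_vector(s)) for m, s in KNOWN_VALUES.items()}


def infer_meaning(values, other_meanings=()):
    """Pick the meaning with the best distribution score, boosted by context."""
    scores = candidate_scores(values)

    def combined(meaning):
        context = sum(CO_OCCURRENCE.get((meaning, other), 0.0)
                      for other in other_meanings)
        return scores[meaning] * (1.0 + context)

    return max(scores, key=combined)


print(infer_meaning([29, 24, 23]))            # no context columns
print(infer_meaning([29, 24, 23], ["name"]))  # table also has a "name" column
```

With these invented numbers, the distribution of {29, 24, 23} alone is closer to the "temperature" samples, while the presence of a "name" column shifts the result to "age", mirroring the example in the text.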

***

  • (*1) Knowledge graph: A database expressing the meanings of various words as a network.
  • (*2) "NEC the WISE" is a term for the company's cutting-edge portfolio of AI technologies.
    Press release: NEC announces new AI technology brand, "NEC the WISE"
    https://www.nec.com/en/press/201607/global_20160719_01.html
  • (*3) Open data: The evaluation used publicly available sensor information and data from many different industries, such as medical information.
  • (*4) A DMP is a platform for collecting and managing information and data sourced from disparate systems.

About NEC Corporation
NEC Corporation is a leader in the integration of IT and network technologies that benefit businesses and people around the world. The NEC Group globally provides "Solutions for Society" that promote the safety, security, efficiency and equality of society. Under the company's corporate message of "Orchestrating a brighter world," NEC aims to help solve a wide range of challenging issues and to create new social value for the changing world of tomorrow. For more information, visit NEC at https://www.nec.com.

NEC is a registered trademark of NEC Corporation. All Rights Reserved. Other product or service marks mentioned herein are the trademarks of their respective owners. © NEC Corporation.