Data Understanding with Semantic Technology that automatically understands the meaning of data and accelerates data integration
August 5, 2019
Data integration previously required many hours of work by data scientists with specialized knowledge. "Data Understanding with Semantic Technology" can automate this type of work. We spoke to two developers about the details and future vision for this technology.
Innovative reductions in the cost of data integration, which is essential to the introduction of AI
― What kind of technology is Data Understanding with Semantic Technology?
Oyamada: It is a technology that can automatically infer the meaning of tabular data that includes textual and numerical columns. For example, it can infer that the numbers in the first column are ages and that the entries in the second column are names. One might think that you could just look at the headers at the top of the table, which describe each column. However, when you check tabular data carefully, you find that tables follow very different naming conventions. The conventions depend strongly on the people who prepare the data, the company they belong to, and the country where that company is located. Therefore, in many cases you cannot find the desired information with a search, and a human being has to intervene to understand the meaning of the data accurately. The data held by companies and government offices contains a massive number of rows and columns, so understanding its meaning costs a considerable amount of time, effort and money. You have to carefully decipher each and every one.
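The header problem described above can be illustrated with a small sketch. The three tables, their header names, and the looks_like_age range check are all invented for this example; the range check is a crude stand-in heuristic, not the inference method used by the technology.

```python
# Toy illustration only: the tables, headers, and heuristic are hypothetical.

# Three tables that store the same kind of data under different conventions.
TABLES = [
    {"age": [34, 51], "name": ["Sato", "Suzuki"]},
    {"CUST_AGE_YRS": [28, 45], "cust_nm": ["Tanaka", "Ito"]},
    {"nenrei": [19, 62], "shimei": ["Kato", "Abe"]},
]


def find_by_header(tables, wanted):
    """Return the indexes of tables that have a column named exactly `wanted`."""
    return [i for i, table in enumerate(tables) if wanted in table]


def looks_like_age(values):
    """Crude value-based check: whole numbers in a plausible human age range."""
    return all(isinstance(v, int) and 0 <= v <= 120 for v in values)


def find_by_values(tables, predicate):
    """Return (table index, header) for every column whose values pass `predicate`."""
    return [
        (i, header)
        for i, table in enumerate(tables)
        for header, values in table.items()
        if predicate(values)
    ]


print(find_by_header(TABLES, "age"))
# [0]
print(find_by_values(TABLES, looks_like_age))
# [(0, 'age'), (1, 'CUST_AGE_YRS'), (2, 'nenrei')]
```

A search on the header text finds only the table that happens to use the expected name, while looking at the values finds the matching column in all three. A real system would need far more than a range check, since many other quantities also fall between 0 and 120.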
Takeoka: In fact, as researchers we have received various data sets from customers for analysis. It usually takes about one to two weeks of work to decipher the data and get it into a properly organized state. We then repeatedly submit the prepared data to the customer for feedback, so the work usually requires about three to four weeks. If we utilize this new technology, the work can be completed in just one day.
Oyamada: To use AI technology to create new forms of value in the coming society, it is essential to aggregate multiple types of big data and integrate them into one. This new technology will allow us to carry out that work automatically, at high speed and with high accuracy. This could make lateral data integration between companies, groups, and municipalities much smoother and significantly reduce the time and human cost required to introduce AI and big data analysis.
Utilizing the table structure and knowledge graph to achieve high-accuracy inference
― What type of technology is Data Understanding with Semantic Technology based on?
Oyamada: It is essentially based on machine learning. We designed our own learner and run it repeatedly so that it alternates between inferring the meaning of numbers and the meaning of text. Each pass feeds its predictions back into the other, which raises performance. In the past, there were technologies for inferring the meaning of text data alone or numerical data alone, but the key point of this technology is that it can analyze tables in which numerical values and text are mixed together. As a result, the accuracy has increased dramatically and the range of possible applications has expanded significantly. There are also existing data integration technologies, but they only determine the relative similarity of data sets. They differ from this new technology, which infers the meaning of the data and integrates it on that basis. One can say that Data Understanding with Semantic Technology goes a step deeper.
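The alternating idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not NEC's learner: the column names, the per-column scores, and the COMPAT compatibility weights are hand-set and hypothetical, whereas the real system learns from data.

```python
# Toy illustration only: all scores and weights below are hypothetical.

# Scores each column would get if judged on its own content.
TEXT_COLS = {"place": {"city": 0.6, "person": 0.4}}
NUM_COLS = {"reading": {"age": 0.4, "temperature": 0.35, "year": 0.25}}

# Bonus when two labels plausibly appear together in one table (assumed).
COMPAT = {("city", "temperature"): 0.5, ("person", "age"): 0.5}


def compat(a, b):
    """Symmetric lookup of the compatibility bonus between two labels."""
    return COMPAT.get((a, b), 0.0) + COMPAT.get((b, a), 0.0)


def best_label(local_scores, neighbour_labels):
    """Pick the label with the best own score plus bonuses from neighbours."""
    def total(label):
        return local_scores[label] + sum(compat(label, n) for n in neighbour_labels)
    return max(local_scores, key=total)


def alternate_inference(text_cols, num_cols, max_rounds=10):
    """Alternate between text and numeric columns until the labels stop changing."""
    labels = {}
    for _ in range(max_rounds):
        previous = dict(labels)
        # Step 1: label text columns, using the current numeric labels as context.
        for name, scores in text_cols.items():
            context = [labels[n] for n in num_cols if n in labels]
            labels[name] = best_label(scores, context)
        # Step 2: label numeric columns, using the fresh text labels as context.
        for name, scores in num_cols.items():
            context = [labels[t] for t in text_cols if t in labels]
            labels[name] = best_label(scores, context)
        if labels == previous:
            break
    return labels


print(best_label(NUM_COLS["reading"], []))
# age
print(alternate_inference(TEXT_COLS, NUM_COLS))
# {'place': 'city', 'reading': 'temperature'}
```

Judged in isolation, the numeric column is labeled "age". Once the text column's "city" prediction is fed back, the same column is relabeled "temperature", which is the kind of mutual feedback described in the interview.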
Takeoka: A "knowledge graph" is utilized to infer the meaning. This is a database which ties together various words based on their attributes and the strength of their association. Associating this graph effectively with the structure of the tabular data is what achieves high-accuracy semantic inference. For example, let's say we have the numbers "28", "29", and "30" in a row. What do you think those numbers represent? One person might think that they represent ages, someone else might think they are temperatures, and another person might think they are year numbers. It is difficult to figure out the meaning just by looking at the numbers. Therefore, this new technology pays attention to the types of data which exist in the same table. If other cells contain text that it determines to be place names and there is a "°C" symbol indicating temperature, then this technology determines that the numbers represent temperatures. One might say that we are doing the same thing as a human being.
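The "28, 29, 30" example can be sketched in a few lines. This is a toy illustration under stated assumptions: the word sets stand in for a knowledge graph lookup, and the value ranges, cue names, and weights are invented here rather than taken from NEC's system.

```python
# Toy illustration only: word lists, ranges, and weights are hypothetical.

# Stand-ins for a knowledge graph lookup.
KNOWN_PLACES = {"Tokyo", "Osaka", "Sapporo"}
KNOWN_PEOPLE = {"Sato", "Suzuki"}

# Value ranges in which each meaning is plausible (assumed).
PLAUSIBLE = {"age": (0, 120), "temperature": (-90, 60), "year": (1, 9999)}

# How strongly each context cue supports each meaning (assumed).
CUE_SUPPORT = {
    "place_name": {"temperature": 1.0},
    "celsius_unit": {"temperature": 2.0},
    "person_name": {"age": 2.0},
}


def context_cues(other_cells):
    """Collect cues from the text cells that share a table with the numbers."""
    cues = set()
    for cell in other_cells:
        if "°C" in cell:
            cues.add("celsius_unit")
        if cell in KNOWN_PLACES:
            cues.add("place_name")
        if cell in KNOWN_PEOPLE:
            cues.add("person_name")
    return cues


def infer_meaning(numbers, other_cells):
    """Return the single best-supported meaning, or None if it stays ambiguous."""
    scores = {
        label: 0.0
        for label, (low, high) in PLAUSIBLE.items()
        if all(low <= n <= high for n in numbers)
    }
    for cue in context_cues(other_cells):
        for label, weight in CUE_SUPPORT[cue].items():
            if label in scores:
                scores[label] += weight
    if not scores:
        return None
    top = max(scores.values())
    winners = [label for label, score in scores.items() if score == top]
    return winners[0] if len(winners) == 1 else None


print(infer_meaning([28, 29, 30], []))
# None
print(infer_meaning([28, 29, 30], ["Tokyo", "Osaka", "29 °C"]))
# temperature
print(infer_meaning([28, 29, 30], ["Sato", "Suzuki"]))
# age
```

The numbers alone fit all three meanings, so the function declines to answer. Only the surrounding cells, place names plus a "°C" symbol or a column of personal names, tip the decision one way or the other.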
Oyamada: Knowledge graphs are created by various companies, but the NEC knowledge graph is particularly deep when it comes to technical terminology, because it has been enriched with various kinds of data from our social solution business. By combining this graph with knowledge graphs that include many general concepts widely used in society, we can perform high-accuracy inference in various domains, even with specialized data.
A research paper summarizing this technology was accepted by AAAI 2019, one of the top international conferences in the field of artificial intelligence. This was the first time that Mr. Takeoka had a research paper accepted by an international conference. To tell the truth, he participated in this project when he came to NEC as an intern. It was just for one month, but he obsessively wrote code from that time and got it to the point where it was actually working.
Takeoka: Yes, that's true. And then you said, "It's a shame, since you accomplished so much." When I spoke to my professor about the project, he said, "Well, then let's make it the topic for your master's thesis", and the university signed a joint research agreement with NEC. So I graduated and continued the research after joining the company and was able to achieve results.
A completely new data platform that can link and integrate data throughout the world
― What kinds of applications are you thinking of for this technology?
Oyamada: Presently, we are thinking of applying it to data lake products used within companies. Unified management of data within companies will create new forms of value starting with "data democratization." In the past, a human being had to intervene to recognize and connect each data set, but we believe that being able to implement this intelligently will provide a massive benefit to society.
However, if I can speak more candidly about the future outlook and ultimate vision for this technology, we are thinking of a world in which all of the data in society is integrated as one. If the data becomes integrated within companies, between companies, and even between countries, we will absolutely be able to discover new forms of value which we never would have thought of before. We are thinking of building this kind of completely new platform, and we believe that this technology will provide a new approach. Of course, this technology still needs work, so we are currently working very hard to fill in the gaps.
Takeoka: Yes, that's right. In addition, this is a fundamental technology that can be used in various ways. There might be application examples that we have not even thought of yet. We would like to discover new solutions in discussions together with many different people.
Oyamada: Ultimately, we would like to turn this technology into something like facial recognition technology. Increasing the accuracy of facial recognition is extremely difficult, but in terms of the result that it provides, it is an extremely simple technology that identifies a person from a facial image. Even so, it is currently used by people around the world to automatically log into PCs and make automatic payments at stores. It has truly created many application examples. In a similar manner, we would also like to create many applications together with various people by providing a module that can "understand the meaning of data" and saying, "look what you can do if you use this."