A developer of the Recognizing Textual Entailment technology that was ranked first in the world
Ph.D. (Engineering), Chief Engineer, Big Data Strategy Division / Knowledge Discovery Research Laboratories, NEC Corporation: Masaaki Tsuchida
Recognizing Textual Entailment technology can grasp the meaning of a sentence and assess and recognize it accurately and quickly, independent of differences in expression. This infrastructure technology for processing large batches of documents by scrutinizing the meaning of their content is yielding results in areas such as the bolstering of information governance and utilization in marketing. Masaaki Tsuchida, who developed text analysis technology with world-leading accuracy*, talks about the hurdles he had to overcome during development, as well as his dedication to making it the best in the world. He also discusses what he focuses on as a researcher.
- * In evaluation tasks organized by the National Institute of Standards and Technology (NIST)
Understanding the meaning of a sentence independent of the words within
--First, can you give us an overview of the sort of technology Recognizing Textual Entailment is?
Tsuchida: Before I explain Recognizing Textual Entailment, let me discuss the difficulty associated with analyzing text from a computer perspective.
When conveying a written message to another party, as humans we have a wide range of expressions to choose from, even though some are synonymous. Let me give you an example. Suppose there were three sentences. The first reads, "The engine stopped all of a sudden." The second reads, "The motor cut out abruptly." The third reads, "The engine gave a sudden cough."
A human can tell that the first and second sentences mean the same, while the third has a different meaning. On the other hand, if a computer were to determine this based on the similarity between words and letters, it would assume the first and third sentences had a similar meaning due to the fact they both contain the words "engine" and "sudden," while it would consider the second sentence as unrelated in meaning because it doesn't share any of the same words. As this demonstrates, when processing based on similarities between words and letters, it is difficult to perform analysis contingent on the human understanding of meaning.
The Recognizing Textual Entailment technology we developed can identify by sentence the semantic entailment relationship of what is written, independent of differences in expression. This closes the gap between the semantic interpretation of humans and machines, enabling the extraction of text containing specific meanings from large sets of documents, and giving an accurate understanding of trends such as the kind of content that appears frequently or infrequently.
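The word-overlap problem in the three-sentence example can be reproduced with a minimal sketch. It assumes a simple Jaccard similarity over word sets as the "similarity between words" measure; the helper names `words` and `jaccard` are illustrative and are not part of NEC's technology.

```python
import re

def words(sentence):
    """Lowercase the sentence and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

def jaccard(a, b):
    """Share of distinct words that two sentences have in common."""
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa | wb)

s1 = "The engine stopped all of a sudden."
s2 = "The motor cut out abruptly."
s3 = "The engine gave a sudden cough."

sim_12 = jaccard(s1, s2)  # same meaning, but only "the" is shared
sim_13 = jaccard(s1, s3)  # different meaning, but four words are shared

print(f"sentence 1 vs 2: {sim_12:.2f}")
print(f"sentence 1 vs 3: {sim_13:.2f}")
```

Under this measure the first and third sentences score higher than the first and second, which is the opposite of how a human reader would judge them.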
--What are the features of Recognizing Textual Entailment technology?
Tsuchida: Basically, the fact that it is able to recognize the semantic entailment between sentences both accurately and swiftly. The Recognizing Textual Entailment technology NEC developed was rated as the most accurate in the world in evaluation tests carried out in the United States in 2011.
While maintaining this high level of accuracy, NEC has also diligently pursued faster processing. The technology now boasts world-class speed, with document processing time cut from 1.7 hours to just seconds. This makes it possible to group text by sentence meaning with high accuracy and in a realistic amount of time, so analysis can extract text with a specific meaning from a large set of documents, or ascertain how much is written about a given topic.
I was given the mission of attaining world-leading accuracy
--Please tell us about your role in the development of Recognizing Textual Entailment technology.
Tsuchida: Recognizing Textual Entailment technology was already under research and development at NEC when I was put in charge in fiscal 2011. Since then I have led the research and development as a core member, handling everything from algorithms to practical application. The mission I was initially given in 2011 was to attain world-leading results in a workshop for evaluating textual entailment recognition.
We felt that speed was as important as accuracy when taking practical application into account, so based on the expertise and knowledge that NEC had accumulated up to that point, we came up with the idea for two-step assessment that satisfied both accuracy and speed requirements. As a result of developing this system, we took the number one spot without a hitch.
--What sets NEC apart from competitors when it comes to the development of Recognizing Textual Entailment technology?
Tsuchida: A range of research institutions are working on textual entailment recognition technology, but because NEC is a corporation, we needed to develop it as a practical technology. In other words, our objective was not to win acclaim through research papers or evaluation workshops, but to build technology that could actually be put to use.
To achieve this, high speed is as crucial as accuracy.
We implemented this using two-step assessment, which I touched upon earlier. The first step involves a rough assessment of whether or not there is a textual entailment relationship on a word level, considering synonym dictionaries and word importance. Because this process can be carried out swiftly, anything not likely to have entailment ends as "non-entailment," making high-speed processing possible in many cases. When there is potential entailment, the structure and semantics of sentences, including the subject and predicate, as well as positive and negative forms, are examined in the second assessment step. This strikes a balance between speed and accuracy.
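The two-step flow described here can be illustrated with a rough sketch. Everything in it is an assumption made for illustration: the tiny `SYNONYMS` and `IMPORTANCE` tables, the `threshold` value, and the use of a simple negation check as a stand-in for the structural and semantic analysis of the second step. It is not NEC's actual algorithm.

```python
import re

# Hypothetical toy resources; a real system would use far larger ones.
SYNONYMS = {"motor": "engine", "abruptly": "suddenly", "halted": "stopped"}
IMPORTANCE = {"the": 0.1, "a": 0.1, "has": 0.1}  # other words weigh 1.0
NEGATIONS = {"not", "never", "no"}

def normalize(sentence):
    """Tokenize, lowercase, and map each word to a canonical synonym."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [SYNONYMS.get(t, t) for t in tokens]

def step_one_score(text, hypothesis):
    """Step 1: importance-weighted share of hypothesis words found in text."""
    text_words = set(normalize(text))
    hyp_words = [w for w in set(normalize(hypothesis)) if w not in NEGATIONS]
    total = sum(IMPORTANCE.get(w, 1.0) for w in hyp_words)
    covered = sum(IMPORTANCE.get(w, 1.0) for w in hyp_words if w in text_words)
    return covered / total if total else 0.0

def step_two_check(text, hypothesis):
    """Step 2 (toy stand-in): positive/negative form must agree."""
    def negated(sentence):
        return any(w in NEGATIONS for w in normalize(sentence))
    return negated(text) == negated(hypothesis)

def entails(text, hypothesis, threshold=0.8):
    """Return True if text is judged to entail hypothesis."""
    if step_one_score(text, hypothesis) < threshold:
        return False  # fast exit: ends as "non-entailment" at word level
    return step_two_check(text, hypothesis)  # slower check, only when needed

print(entails("The motor halted abruptly.", "The engine stopped suddenly."))
print(entails("The weather is nice today.", "The engine stopped."))
print(entails("The engine has not stopped.", "The engine stopped."))
```

In this sketch the second pair is rejected cheaply in step one, while the third pair passes the word-level check and is rejected only when the positive and negative forms are compared.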
We have also developed technology for semantically matching large quantities of text data using this Recognizing Textual Entailment technology. More specifically, we developed technology that searches a text database at high speed for sentences that entail the input text, while retaining the accuracy of two-step assessment. Next, building on this high-speed entailment search technology, we developed an entailment clustering method that finds groups of text with the same meaning and then applies a label indicating the meaning of each group.
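One way to picture entailment search and entailment clustering is the sketch below. It rests on loud assumptions: `entails` is a toy word-containment test rather than a real entailment recognizer, the search is a plain linear scan rather than a high-speed index, and the greedy labeling strategy is a guess made for illustration, not the published method.

```python
import re

def words(sentence):
    """Return the set of lowercase alphabetic tokens in a sentence."""
    return frozenset(re.findall(r"[a-z]+", sentence.lower()))

def entails(text, hypothesis):
    """Toy stand-in: text entails hypothesis if it contains all its words."""
    return words(hypothesis) <= words(text)

def entailment_search(query, database):
    """Find every sentence in the database that entails the query."""
    return [s for s in database if entails(s, query)]

def entailment_clustering(database):
    """Greedily group sentences under the label that most of them entail."""
    remaining = list(database)
    clusters = {}
    while remaining:
        # The label is the sentence entailed by the most remaining sentences.
        label = max(remaining,
                    key=lambda c: len(entailment_search(c, remaining)))
        members = entailment_search(label, remaining)
        clusters[label] = members
        remaining = [s for s in remaining if s not in members]
    return clusters

docs = [
    "the engine stopped",
    "the engine stopped on the highway",
    "the engine stopped suddenly",
    "the brakes failed",
    "the brakes failed twice",
]
clusters = entailment_clustering(docs)
for label, members in clusters.items():
    print(label, "<-", members)
```

Because every sentence entails itself, each pass removes at least one sentence and the loop always terminates. The label of each group is the shortest shared statement in this toy data, which mirrors the idea of attaching a label that indicates the group's meaning.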
Attaining the top ranking had a significant impact on the company
--What struggles did you face during development of the Recognizing Textual Entailment technology?
Tsuchida: Well, we are now using two-step assessment, but we knew that what is now the first step, even with the NEC expertise we had access to at the time, would likely not be enough on its own to raise accuracy. In light of this, we initially went ahead with development using only what is now the second step of the assessment process, but this approach also proved challenging. Ultimately, we came to the conclusion that a two-step assessment process would work best, but there was a lot of trial and error involved.
With two-step assessment, the accuracy and processing details differ depending on how you configure the acceptance/rejection levels for step one and step two, so we also adjusted the balance of roles and algorithms for each. Furthermore, errors are normal in language analysis, and in general there is no perfect solution, so it is also difficult to isolate problems. Consequently, working out which module to improve to raise accuracy was a continuous process of trial and error.
--What sort of reaction and feedback did you get when you attained world-leading accuracy?
Tsuchida: First, I remember the reaction within the company was significant. When people hear you've come up with technology that's the best in the world, it has a lot of impact, which I think comes down to customers feeling we had something new to offer them. I'm sure there were also business inquiries from customers who had questions to ask, of course.
Because textual entailment recognition is a versatile technology rather than technology aimed at a specific application, after people learned the details there were many questions pertaining to what the technology can achieve, and what exactly it is useful for. Although we had quite a few ideas, we were called upon to demonstrate utility through practical application, so from the year after we attained the world's top ranking we devoted our efforts toward substantiating this technology with an eye toward business utilization.