Speech Recognition

About Researchers

members

In this section, we talk to front-line researchers to learn about their individual backgrounds and research activities over the years. It is our hope that their message will help readers develop a greater understanding and appreciation of science and technology.
In this issue, we interview members of the research team headed by Senior Manager Akitoshi Okumura.
Akitoshi Okumura (center) Senior Manager
Kiyokazu Miki (back left) Principal Researcher
Ryosuke Isotani (right) Principal Researcher
Masahiro Saiko (front left)

Please tell us about your youth and student days.

Masahiro Saiko: I also played guitar during my student days.

[Saiko] As a schoolboy, I was into a lot of things, including riding my bicycle and creating my own maps of the neighborhood, taking pictures of trains, and playing the piano. I particularly came to like music through my experiences with the piano. I used to play whenever I had time, and I had a feeling from early on that I would like to do something related to music in the future. However, that is not to say that I was devoted only to music. In middle school and high school, I was also into basketball and other club activities, and at university, instead of attending a college specializing in music, I belonged to a research laboratory researching media coding (Department of Communication Network Engineering, Faculty of Engineering, Okayama University) with an eye to working in acoustics or audio technology. Then, when it came time for graduate school, I had also developed an interest in speech as a phenomenon originating in sound, and I entered the Graduate School of Informatics, Kyoto University and became a member of the Professor Kawahara Laboratory, known for its research in speech recognition. By the way, the Common Platform Software Research Laboratories that I now belong to here at NEC includes a number of former colleagues of mine from the Professor Kawahara Laboratory.

[Miki] In elementary school, I enjoyed building radios, telescopes, and other equipment using supplements in student magazines, and I liked reading science fiction too. I was interested in the sciences from an early age, and in middle school, I became familiar with computers thanks to the microcomputer boom and developed an interest in artificial intelligence.

[Isotani] I liked math and science subjects from early on, and did well in them. These interests continued into college, where I selected the natural sciences and information sciences to study. Learning technologies and neural networks were popular fields of study at that time.

[Okumura] I enjoyed constructing things with my hands and drawing pictures from the time that I was small. I made my own kites and once constructed a bell by attaching a coil to a nail and forming an electromagnet.

What was your reason for entering NEC and choosing speech recognition research?

[Saiko] At an NEC company introduction, I had an opportunity to see a demo of technology for searching a cell-phone manual by simply speaking into the cell phone. It was then that I developed a desire to become involved in the R&D of speech interface technology and to enter NEC. I have since been put in charge of R&D for speech recognition in call centers, just as I had hoped.

[Miki] A major factor behind my desire to enter NEC was a talk I had with Takao Watanabe, a senior researcher at NEC and now an executive expert in the Research Planning Division, at an academic conference. At that time, I was already familiar with a number of his accomplishments through journal papers and other sources, but it was only on meeting him in person that I realized I wanted to perform research under his guidance.

[Isotani] I had a desire from early on to enter a research laboratory that gave researchers the time to develop technology carefully. NEC was taking the lead in learning and recognition technologies. It had many senior researchers that one could learn from and had a good working environment providing much freedom in research. Although I visited a number of companies, I chose NEC with almost no hesitation.

[Okumura] My research in college was on fault diagnosis systems, or in other words "expert systems". At the time, I felt that these systems could only do what they were programmed to do by code, and that they were hardly intelligent for all the effort that was put into them. Having an interest in language from early on, and enjoying the study of English, I came to wonder whether a system could be built that could represent ideas and knowledge on the basis of speech itself. With this in mind, I entered the world of natural language processing. I think there were two things that got me thinking about entering NEC: one was this interest in language-related problems, and the other was the proposal of a "speech translation telephone" as an NEC future objective by Dr. Koji Kobayashi, NEC's former distinguished chairman, who was much renowned for his proposal of C&C (Computers and Communications) in 1977.

C&C at the International Telecommunication Exposition

What has been your most memorable event after entering the company?

[Saiko] Well, first of all, I noticed a big difference between research at NEC Laboratories and research in a university. Despite being a research institution, there's a sense of urgency at NEC Laboratories in achieving a level of quality that will make products attractive to customers. Also, about two years after entering the company, I suddenly felt a great desire to improve my ability at writing programs.

[Miki] After entering the company, I was soon put to work supporting the commercialization of SmartVoice speech recognition software. My experiences at that time turned out to be a great asset for me. While in school, the research that I was involved in focused on the processing and recognizing of pre-established words and grammar. But in the process of commercializing SmartVoice, a dictation product, we had to develop a process that could recognize any spoken statements without deciding on grammar beforehand. It was a great feeling when a program that I wrote ran well for the first time.

[Isotani] I also had a dramatic experience working on the commercialization of SmartVoice, and I can never forget the demonstration on automatic interpretation given at Telecom91 in Geneva. Nowadays, interpretation devices have become as small as the palm of your hand like the VoToL personal media player. But at the time of Telecom91, our equipment was as large as a refrigerator requiring backup equipment as well. Despite its size, it could only recognize a limited vocabulary of about 500 words, and it required several seconds to process an utterance and output recognition results. Nevertheless, visitors to the conference were quite impressed with the demonstration. There were even people working in interpretation that suddenly started to worry saying "What will become of my job now?" We actually spent several months before the exhibit engaged in very difficult development work and arrived in Geneva early to set up and prepare our equipment. On the day of our demonstration, we were both mentally and physically exhausted, but the great response we received from visitors was extremely encouraging.

Mr. Isotani, Dr. Okumura

[Okumura] I also have many memories of Telecom conferences, but what clearly stands out in my mind is the day that we successfully operated the PIVOT translation system for the first time. At that time, Saturdays and Sundays were also devoted to R&D at NEC. One day, after finally completing our program late at night, we tried inputting the statement "This is a pen." into our VAX mini-computer. We began to wait, and after about 30 seconds, which felt like an eternity to us, the result "Kore wa pen de aru." appeared on the screen in Japanese. Needless to say, this was a very happy experience. At that time, though, we would only have the machine translate sentences in English. Nowadays, with translation results being displayed in nearly real time, I sometimes feel like I'm living in a different time.

Please tell us what you have learned from interdisciplinary and intergenerational exchanges.

[Isotani] Our department also includes researchers specializing in language processing. The culture of people specializing in speech research differs from that of people specializing in language processing, as does the terminology that they use. At first, there was much bewilderment, but by holding discussions and moving forward on research together, we gave birth to new technologies in interpretation and speech-based searching and enabled speech recognition technologies themselves to evolve. We were also able to apply learning theory, one of NEC's strong fields, to the field of speech recognition. I believe that this skillful convergence of fields has led to original speech recognition technologies, our automatic interpretation technology in particular. Also, in terms of making products, a powerful device is an absolute necessity. We have developed technology that can perform speech interpretation on a cell phone, but large-vocabulary speech recognition processing on a handset itself without the aid of a server would not be feasible, I believe, without a high-performance chip like the MP211 and a close working relationship with the developers of that chip. As for intergenerational exchanges, mainstay researchers like me can be stimulated by the activities of young researchers, who are having fun writing a blog.

Kiyokazu Miki: At Nara Park, Sarusawaike Area

[Miki] When commercializing our technology, we would deepen our relationships with the business department and marketing staff. VisualVoice is a product born from such collaboration with those groups. Furthermore, in the process of examining our technology closely, I could feel that we were getting many valuable hints from senior researchers. The history of speech recognition research at Central Research Laboratories is old and the wisdom of senior researchers gets passed on to the younger generation. I believe that speaking with Takao Watanabe, a top author of journal papers at NEC, was a profound experience that prompted my entry into the company.

[Saiko] I feel the same way. At any rate, I'm still gaining a variety of experiences in my work.

[Okumura] NEC recently issued a press release on technology for creating blogs by robots. In the evaluation of this technology, we listened to the opinions of researchers belonging to the younger generation. This is because researchers of our generation could not provide convincing opinions on the quality of blogs created by this technology. We need assessments from many people who come in contact with new media like daily blogs. In this sense, I feel that the viewpoints of young people are vitally important in the development of new technologies.

Please tell us something about future research issues and objectives.

[Saiko] I would like to focus my attention on what kind of technologies and applications are needed to get customers to use our products and what kind of approach should be taken to meet their needs. In any case, I would like to capitalize on my sense of hearing that I've cultivated through music to take on challenging themes like the analysis of emotional expression in speech.

[Miki] Up to now, I've been researching language models for word dictionaries used in speech recognition and have applied the results of this research to call-center applications. But in actual use, terminology that needs to be added to dictionaries tends to increase on a daily basis, and right now I'm working on what might be the most efficient way of making such updates. At present, dictionary "tuning" is done manually, but I would like to create a system that can make these updates automatically.

[Isotani] Speech recognition technology has improved dramatically since I entered the company, but it must be said that the accuracy of speech recognition varies due to a variety of factors including subject matter, way of speaking, ambient noise, and type of microphone. At present, I'm thinking about evaluating the actual extent to which speech recognition can be achieved under different conditions. Customizing speech recognition for each set of conditions would be troublesome. For the future, I believe it will be necessary to develop technology that can tune speech recognition according to ambient conditions, speaking style, etc., even with just a small number of samples. I also think that an issue that we should always be thinking about is what sort of applications can employ speech recognition technologies to good effect. Right now, we are devoting our energies to applications like call-center speech recognition and speech interpretation, but there are still applications out there that could use voice as an interface. What I would like to do is apply speech recognition technology to a much broader range of applications while combining it with other powerful technologies at Central Research Laboratories like data mining and language processing.

[Okumura] Our mission is to develop technologies, such as "communication agents", that will deepen mutual understanding between people and fill communication gaps. To succeed, we need to consider three things when conducting research. First, we need to think of how to express information such as a person's knowledge and profile. Next, we need to think of how to accumulate this information automatically as data. Then, once we are able to express and accumulate the information, we need to think of what will enable us to fill the gaps, that is, what needs to be implemented between one person and another. One promising way to fill the many gaps is speech recognition technology.

Can you leave us with a message for young people pursuing science?

[Saiko] I think motivation is of primary importance. Of course, having an interest in the sciences is a necessary precondition, but having an urge to ask "Why?" is very important. Asking simple questions like "Why can I talk with a robot?" and "Why can a robot make witty conversation?" is the path to finding a research theme.

[Miki] There are arguments for and against the "cramming" approach to learning. In high school and college, I would run through this book and that book, and I would say that cramming is good as long as some knowledge is absorbed. At the time, I may not have known what that knowledge that I was cramming in was good for, but I thought that it was surely useful just the same. Having such an abundant amount of knowledge would certainly lead to something that would hold my interest, and at the same time, I thought that I could incorporate that knowledge whenever I was interested in something.

[Isotani] Though somewhat different from cramming, I believe that learning a wide variety of things is no doubt necessary. I have come to feel that a diverse array of knowledge is one important element of speech recognition research. I also believe that being tenacious in exploring the true nature of things is very important. For example, when applying a certain method to a certain object, it is important to consider why the results turned out as they did, whether or not they were good results, and what features of the applied method matched or failed to match what properties of the object. Analyzing and examining such things should give birth to the next new idea!

[Okumura] When talking about people in the sciences and arts, it is sometimes said that the sciences consist of people that like to compute and prove things for themselves, while the arts favor people that are good at acquiring a wide range of knowledge. Yet, in the field of language processing, for example, a person that can think logically to solve problems can be quite an asset regardless of whether that person has a science or art background. In any field, moreover, a person that is only good at solving problems that are given to them will not be particularly useful. In short, a background in the sciences or arts is irrelevant. What is really important, I believe, is the ability to discover problems and to figure out what to do on one's own to uncover the nature of the problem and to solve it. A person that repeatedly tries to solve problems, whatever their nature, on their own without going directly to someone else for an answer will have many experiences that cannot help but be useful in future work.