A world expert in acoustic signal processing supporting the evolution of smartphones and audio players
Ph.D. Engineering, Research Fellow, Information and Media Processing Laboratories, NEC Corporation: Akihiko Sugiyama
The "MPEG" audio compression technology is a global standard that many people will have heard of. Akihiko Sugiyama has been deeply involved in its technical development and standardization since the development of the first-generation "MPEG-1" algorithm.
As an audio signal processing specialist with over 30 years of experience, he has also worked on noise removal technology, which is essential for smooth communication and speech recognition. Here he discusses the history of MPEG development and the fascinating mechanisms behind noise removal, and shares what he emphasizes in his research.
Sound is the basis of communication
--Can you tell us in plain terms what kind of research you specialize in?
Sugiyama: In short, I'm a researcher in acoustic signal processing. Sound plays a fundamental role in conveying your intentions and feelings to other people. I research a variety of processing techniques aimed at making the sounds you hear in all kinds of places pleasant and natural.
When you talk with another person over the phone, for example, preprocessing technology is needed to remove noise and other sounds you don't want transmitted, and to enhance the voice you want the other party to hear.
Next, when audio is stored in memory or uploaded to a network for delivery, reducing its information content lets it be sent faster and more cheaply, and makes it available to more people. Delivering audio faster, more cheaply, and at higher fidelity is the main goal of this kind of compression technology.
In addition, when a voice call is received on a mobile phone or other device, post-processing on the receiving side is important for making the audio clear and ensuring it is heard as intended.
In this respect, my job is to support pleasant audio-based communication from end to end, whether "person to person" or "person to machine."
Cutting unnecessary noise for smooth audio calls
--Can you start by explaining exactly what NEC's acoustic signal processing technology is?
Sugiyama: NEC has a wide range of advanced acoustic signal processing technologies. This collection of technologies is named "EuphoMagic(R)," a general term for technology that makes signals easier to hear. "Eupho" refers to the euphoria of listening to high-fidelity audio, while "Magic" refers to the way it transforms audio as if by wizardry. It is implemented not only in our own devices but also in those of other companies, and it improves the quality of phone calls and of recordings used for speech recognition. Let me briefly describe EuphoMagic.
First, "noise suppressors" and "noise cancelers" are techniques for removing noise that gets mixed into voice or music without degrading the voice or music itself.
Noise suppressors isolate and suppress the noise components that are picked up along with voice or music recorded on a microphone, emphasizing the voice components. To do this, the noise component must be estimated, and the accuracy of this estimate has a significant impact on performance. Because the estimate becomes harder to make as the noise grows relative to the voice or music, performance deteriorates as that ratio increases.
Noise cancelers carry out voice or music enhancement even in environments with more noise than noise suppressors can handle, using two microphones to eliminate noise. Put simply, the sound picked up by one microphone is assumed to be noise, and it is subtracted from the sound picked up by the other microphone, which is used for the voice. This enables clear calls and recordings.
Echo cancelers, which remove the echo that occurs when audio from the loudspeaker feeds back into the microphone during hands-free calls, have also been installed in smartphones and PCs.
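To make the idea of estimating and then subtracting noise concrete, here is a minimal sketch of magnitude spectral subtraction, a textbook noise-suppression technique; it is not NEC's EuphoMagic algorithm, whose details are not described here. It assumes that some frames containing only noise are available for building the estimate. The function names (`estimate_noise`, `suppress`) and the `floor` value are hypothetical, and a plain O(N^2) DFT keeps the example dependency-free.

```python
import cmath
import math

# Illustrative sketch: function names and the floor value are hypothetical.


def dft(x):
    """Plain O(N^2) discrete Fourier transform (fine for short illustrative frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    """Inverse DFT; returns the real part, assuming a conjugate-symmetric spectrum."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def estimate_noise(noise_frames):
    """Average magnitude spectrum over frames assumed to contain noise only."""
    spectra = [[abs(c) for c in dft(frame)] for frame in noise_frames]
    return [sum(bin_mags) / len(bin_mags) for bin_mags in zip(*spectra)]


def suppress(frame, noise_mag, floor=0.05):
    """Subtract the noise magnitude estimate from one frame, keeping the noisy phase."""
    cleaned = []
    for coeff, noise in zip(dft(frame), noise_mag):
        mag = abs(coeff)
        # Never let a bin drop below floor * mag, even if the estimate overshoots.
        new_mag = max(mag - noise, floor * mag)
        cleaned.append(cmath.rect(new_mag, cmath.phase(coeff)))
    return idft(cleaned)
```

Because `suppress` subtracts `noise_mag` directly, any error in that estimate either leaves residual noise or eats into the voice, which is the sensitivity to estimation accuracy described above.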
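The interview describes subtracting the noise microphone's signal from the voice microphone's. In practice the noise reaches the two microphones with different delays and gains, so a common textbook approach (swapped in here, not necessarily what NEC uses) is an adaptive filter that learns that path before subtracting. Below is a minimal least-mean-squares (LMS) sketch, assuming the reference microphone picks up noise only; the function name and the `taps` and `mu` values are illustrative assumptions.

```python
def lms_noise_canceler(primary, reference, taps=4, mu=0.01):
    """Adaptive noise canceler using the LMS algorithm (illustrative sketch).

    primary:   samples from the voice microphone (voice + noise)
    reference: samples from the noise microphone (assumed to pick up noise only)
    taps, mu:  filter length and step size; hypothetical values, not tuned.
    Returns primary minus the estimated noise, i.e. the voice estimate.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    output = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - noise_estimate  # whatever is left over is treated as voice
        # LMS update: nudge the weights to reduce the noise remaining in e.
        weights = [w + 2 * mu * e * h for w, h in zip(weights, history)]
        output.append(e)
    return output
```

With `taps=1` and the weight held at 1.0, this reduces to the plain subtraction described above; the adaptation is what lets it cope when the noise arrives delayed or attenuated.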
--What specific effects or benefits does the EuphoMagic technology developed by NEC provide?
Sugiyama: It makes voices easier to hear by reducing the ambient background noise during smartphone calls, video chats such as Skype, voice recorder recordings, and video recordings, and it also mutes the tapping and key-press sounds made when operating devices. In addition, it helps mitigate mechanical noise from a digital camera's lens drive or focusing mechanism, and it sufficiently reduces wind noise. These technologies have been applied to NEC's mobile phones, as well as to consumer devices such as digital cameras made by other companies. NEC technology is even used in the voice recorder you are using to record this interview.
In addition, this technology supports highly accurate speech recognition in car navigation systems and NEC's Communication Robot PaPeRo.
Contributing to development of the MPEG international standard
--Next, please tell us about audio signal compression technology, which could be called the core of this processing.
Sugiyama: Audio signal compression technology takes advantage of the characteristics of human hearing. Information content is reduced by eliminating high-pitched and faint sounds that human ears can't hear. Also, human ears are particularly poor at distinguishing frequencies in the high range, so they can't tell apart high-pitched sounds whose frequencies differ only slightly. Keeping just one of these indistinguishable frequency components and eliminating the rest also reduces the information content. Furthermore, quiet sounds at frequencies close to those of louder sounds tend to be drowned out by the louder ones and can't be heard. This effect is called masking, and removing the components it makes inaudible further reduces the information content.
Another method is information localization. Applying a suitable mathematical transform to an audio signal concentrates the signal in a few places, biasing where the information lies. Areas where the signal is not concentrated are essentially blank and contain almost no sound, so cutting them makes a drastic reduction in information content possible. A variety of other detailed processing is also carried out, but this gives you a general idea of how audio signal information is compressed.
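The masking idea can be illustrated with a deliberately simplified model. The sketch below drops components that fall below Terhardt's well-known approximation of the threshold of hearing in quiet, then drops components lying under a triangular masking threshold spread around any louder component on Zwicker's Bark (critical-band) scale. The `offset_db` and `slope_db_per_bark` values and the function names are hypothetical; this is not the psychoacoustic model specified in the MPEG standards.

```python
import math

# Illustrative sketch: offset/slope values and function names are hypothetical.


def ath_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL)."""
    k = f_hz / 1000.0
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4


def bark(f_hz):
    """Zwicker's critical-band (Bark) scale."""
    return 13 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def audible(components, offset_db=10.0, slope_db_per_bark=25.0):
    """Keep only components that are neither below the threshold in quiet nor masked.

    components: list of (frequency_hz, level_db_spl) pairs, frequencies > 0.
    """
    kept = []
    for f, level in components:
        if level < ath_db(f):
            continue  # too faint to hear at all
        masked = False
        for fm, lm in components:
            if lm <= level:
                continue  # only louder components can mask
            # Masking threshold falls off linearly with distance in Bark.
            threshold = lm - offset_db - slope_db_per_bark * abs(bark(f) - bark(fm))
            if level < threshold:
                masked = True
                break
        if not masked:
            kept.append((f, level))
    return kept
```

For example, a 40 dB tone at 1.1 kHz is removed next to a 70 dB tone at 1 kHz, while a 30 dB tone at 4 kHz survives because it lies several Bark away; a 5 dB tone at 15 kHz and a 10 dB tone at 100 Hz fall below the threshold in quiet.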
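Information localization can be demonstrated with a frequency transform. The sketch below uses an orthonormal discrete cosine transform (DCT-II) as the "mathematical conversion"; the standards use related transforms and filter banks, so treat the DCT as an assumption for illustration. `keep_largest` is a hypothetical helper that zeros every coefficient except the largest ones needed to hold a given fraction of the energy. Because the transform is orthonormal, the reconstruction error energy equals the discarded energy.

```python
import math

# Illustrative sketch: keep_largest and the energy fraction are hypothetical.


def dct(x):
    """Orthonormal DCT-II."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)))
    return out


def idct(c):
    """Inverse of the orthonormal DCT-II (an orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        s = math.sqrt(1.0 / n) * c[0]
        s += sum(math.sqrt(2.0 / n) * c[k] * math.cos(math.pi * (t + 0.5) * k / n) for k in range(1, n))
        out.append(s)
    return out


def keep_largest(coeffs, fraction):
    """Zero all but the largest-magnitude coefficients holding `fraction` of the energy."""
    total = sum(c * c for c in coeffs)
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept, energy = set(), 0.0
    for i in order:
        if energy >= fraction * total:
            break
        kept.add(i)
        energy += coeffs[i] ** 2
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]
```

For a tonal frame, a handful of coefficients typically carries almost all the energy, so most of the others can be dropped with only a small reconstruction error.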
--Tell us about the MPEG compression technology that we hear about a lot.
Sugiyama: MPEG grew out of the desire to record multimedia data such as video and music on CD-ROM at high quality, for playback on video game consoles.
When ISO/IEC* began working toward an international standard, researchers and technical experts from well-known companies and universities around the world took part, contributing their best technology. For the audio category, after several meetings, the technologies contributed by each organization were brought to Stockholm, Sweden, for evaluation.
The evaluation showed that none of the technologies excelled in every area assessed, so the decision was made to combine the best technology from each company into a global standard. This was the origin of the first-generation "MPEG-1" compression technology, which reproduces audio quality equivalent to a CD source using 1/8 to 1/12 of the information content.
Our "adaptive block length conversion encoding" technology was also adopted. By switching the block length (the number of signal samples processed in one block) according to the characteristics of the input signal, it reduces coding noise to 1/10,000 of the original amount.
Compression technology continued to evolve to meet the demands of each era, and adaptive block length conversion encoding has been adopted in every generation to date. This technology is implemented in all devices with multimedia functionality, such as mobile phones and smartphones, PCs and tablets, and DVD players. It facilitates smooth communication and supports services such as downloadable music and ringtones. Of course, the iPhone and iPad are no exception.
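The idea behind block length switching can be sketched as a simple transient detector: long blocks give fine frequency resolution for steady sounds, while short blocks keep coding noise confined in time around sudden attacks. This is a generic illustration, not NEC's patented method; the block sizes (1024 and 128 samples), the energy-ratio threshold, and the function name are illustrative assumptions.

```python
def choose_block_length(frame, long_len=1024, short_len=128, ratio_threshold=10.0):
    """Return the block length to use for one frame (illustrative sketch).

    Splits the frame into short sub-blocks and checks whether the energy of any
    sub-block jumps sharply relative to the one before it (a transient). All
    parameter values are hypothetical.
    """
    if len(frame) != long_len:
        raise ValueError("frame must contain long_len samples")
    # The tiny constant avoids dividing by zero on digital silence.
    energies = [
        sum(s * s for s in frame[i:i + short_len]) + 1e-12
        for i in range(0, long_len, short_len)
    ]
    for prev, cur in zip(energies, energies[1:]):
        if cur / prev > ratio_threshold:
            return short_len  # transient: use short blocks
    return long_len  # stationary: keep the long block
```

A steady 440 Hz tone keeps the long block, whereas a frame in which the signal jumps from near silence to a loud note switches to short blocks.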
- *ISO stands for the International Organization for Standardization, and IEC stands for the International Electrotechnical Commission.
--Can you tell us about the fully semiconductor-based audio player using compression technology that NEC was the first in the world to develop?
Sugiyama: One of the best-known digital audio players that use semiconductors as the recording medium is Apple's iPod®, whose flash-memory models appeared in 2005*. We had succeeded in developing the world's first fully semiconductor-based portable audio player in 1994, more than ten years earlier.
The publicity this generated was astounding. Major newspapers such as the New York Times, as well as radio stations and magazines, introduced it to a wide audience as the audio of the future. I have fond memories of appearing that night on a live news broadcast from the studio of the U.S. network NBC in Kojimachi. Our product even made it into the pages of Time magazine in the U.S.
This prototype device, for which we obtained the trademark "Silicon Audio®," actually began as a hobby project of one of the team members. However, soon after starting development without the section chief's knowledge, that team member left to study in America. In the end, I was left to deal with the aftermath and bring the device out into the world. Those were good days, when someone who followed their heart in their hobbies could still make it as a researcher without any trouble.
- *Until the iPod nano was released in 2005, iPods used hard-disk drives (HDDs). Since then, flash memory has become mainstream.
Audio signal compression technology tricks your ears and brain
--How has MPEG evolved since then?
Sugiyama: The main application for the second-generation "MPEG-2" was digital radio broadcasting in Europe. There was demand for technology that could compress audio signals efficiently enough to carry as many channels as possible, to support multiple languages or 5.1-channel audio.
However, the "MPEG-2/BC (Backward Compatible)" algorithm that was developed first didn't have good audio quality. Naturally, European broadcasters requested that a new standard be established. Recognizing that giving priority to compatibility with MPEG-1 had led to subpar sound quality, the developers redesigned the technology for the new MPEG-2 from scratch. As a result, the MPEG-2/AAC codec was realized in 1997, and it easily maintained the required broadcast quality. It also achieved even higher compression, down to 1/16 of the original audio. This codec still offers high fidelity by today's standards, and it has been used for satellite broadcasting and digital television broadcasting in Japan.
Heading into the 21st century, mobile communications brought about a major upheaval around the world. "MPEG-4" was developed to improve compression further while maintaining high audio quality on compact portable devices as well as PCs. Its audio standards weren't far removed from MPEG-2, but they were fine-tuned to keep quality from dropping even under heavy compression.
The codec that dramatically improved audio compression is "MPEG-4 HE-AAC," an evolution of MPEG-4 AAC. Providing the rapidly growing number of mobile phone users with audio services such as ringtone distribution required compression beyond what the existing fundamental principles could deliver. Reducing power consumption was another important issue for mobile phones.
To resolve these issues, the information content is cut in half by keeping only the low-frequency components. When the signal is restored, these components are copied into the high-frequency band, producing good sound with balanced low and high frequencies. We went back and revisited these principles from a fundamental signal processing perspective, and came up with a revolutionary method that achieves equivalent audio quality with just half the computational effort, which led to its adoption in the MPEG standard.
In this way, the third-generation MPEG-4 HE-AAC audio codec, with its unprecedented compression ratio of 1/32 and low power consumption thanks to its light computational load, has become an indispensable tool for shortening music download times, storing large amounts of music, and letting mobile phones run for a long time without charging. This technology changed our lives, enabling downloads of full-length ringtones that include vocals and the storage of large amounts of high-fidelity music.
Looking at how audio signal compression has evolved since the technology's sudden rise to prominence, you can see that developing new methods and technology also involves the challenge of figuring out how to trick people's ears and brains.
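The copy-up idea described here is known as spectral band replication (SBR). The sketch below is heavily simplified: it works on one frame's list of spectral coefficients, sends the lower half unchanged plus a coarse RMS envelope of the upper half, and rebuilds the upper half by copying the lower half up and rescaling each band to the envelope. Real SBR operates on QMF subband signals and transmits more side information; the function names and the four-band envelope are hypothetical, and this does not reproduce NEC's reduced-computation method.

```python
import math

# Illustrative sketch: function names and the 4-band envelope are hypothetical.


def encode_high_band(spectrum, n_env_bands=4):
    """Keep the lower half of a spectrum plus a coarse RMS envelope of the upper half."""
    if len(spectrum) % (2 * n_env_bands):
        raise ValueError("spectrum length must be a multiple of 2 * n_env_bands")
    half = len(spectrum) // 2
    low, high = spectrum[:half], spectrum[half:]
    size = half // n_env_bands
    envelope = [
        math.sqrt(sum(c * c for c in high[b * size:(b + 1) * size]) / size)
        for b in range(n_env_bands)
    ]
    return low, envelope


def decode_high_band(low, envelope):
    """Rebuild the upper half by copying low-band coefficients up and rescaling them."""
    size = len(low) // len(envelope)
    high = []
    for b, target_rms in enumerate(envelope):
        patch = low[b * size:(b + 1) * size]  # a copy of a low-frequency region
        rms = math.sqrt(sum(c * c for c in patch) / size)
        gain = target_rms / rms if rms > 0 else 0.0
        high.extend(c * gain for c in patch)  # match the transmitted band energy
    return low + high
```

Here 32 coefficients become 16 coefficients plus 4 envelope values; the decoder's output matches the original low band exactly, but the original high band only in per-band energy.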
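To put the compression ratios quoted in this interview into bit rates, the arithmetic below assumes standard CD audio (44.1 kHz sampling, 16 bits per sample, two channels). The ratios come from the interview; actual codec bit rates are configurable, so these are only the rates implied by each ratio.

```python
# Assumed CD-DA parameters; the ratios are those quoted in the interview.
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2


def cd_bitrate_bps():
    """Bit rate of uncompressed CD audio in bit/s."""
    return CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * CD_CHANNELS


def compressed_kbps(ratio_denominator):
    """Bit rate in kbit/s after compressing CD audio to 1/ratio_denominator of its size."""
    return cd_bitrate_bps() / ratio_denominator / 1000.0


for label, denominator in [("MPEG-1 at 1/8", 8), ("MPEG-1 at 1/12", 12),
                           ("MPEG-2 AAC at 1/16", 16), ("MPEG-4 HE-AAC at 1/32", 32)]:
    print(f"{label}: {compressed_kbps(denominator):.1f} kbit/s")
```

Starting from 1,411.2 kbit/s, it prints 176.4, 117.6, 88.2, and 44.1 kbit/s respectively.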
--Tell us about your involvement with MPEG up until now.
Sugiyama: I've been involved at every level, from technological development to international standardization activities, as well as publishing papers and books. I also helped raise awareness of this technology by writing the first draft of the Japanese Industrial Standard (JIS) for it. When I was younger, I naturally played a more central role in development as a technical expert, and this bore fruit as part of the first-generation MPEG-1 technology. For the second generation, I brought the team together and ensured that development proceeded smoothly, while also negotiating with other companies at meetings to promote the adoption of our technology. After withdrawing from these standardization activities during the second generation, I was called back by my company for the third generation, where my role once again centered on negotiating with other companies to push for the adoption of NEC technology.
At standardization meetings, I explained the advantages of our technical proposals, and my main role was to get them adopted. Being adopted as a standard benefits the company through patent licensing income, as well as by allowing it to develop and release products earlier thanks to its familiarity with the technology. Another important role I played was to coordinate and compile the opinions of Japanese manufacturers and present them to the International Organization for Standardization in a way that would benefit Japan. In other words, I strove to have our technology, and Japan's technology, recognized as international standards, acting as a representative of both my company and my country.
We eventually pulled out of MPEG standardization during the third generation, but our technology adopted in earlier MPEG standards continues to reach consumers in software and chip form. The patents we applied for have also brought in significant licensing income. The fact that our technology is licensed to over 800 companies around the world should give you an idea of the scale. Twenty years have already passed since I filed my patent application, so I'm afraid it is now free for anyone to use.