From the Frontline of the Research on Machine Learning in the US
-Pioneer in Natural Language Processing-
When he was a researcher at NEC Laboratories America (NECLA), Dr. Ronan Collobert conducted research on applying end-to-end neural network approaches to natural language processing. That was a novel idea at the time, and it has since become the mainstream approach in the field. His paper co-authored with Dr. Jason Weston, a colleague at NECLA, "A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning," showed that multitask learning leads to excellent results in natural language processing by improving task prediction accuracy. Multitask learning, in this case, promotes the acquisition of factors common to several tasks by letting the system learn related tasks simultaneously. However, because it was such a radically new idea at the time, the paper did not get much attention at first. He believed it was important not to be discouraged by negative opinions, and to follow one's own direction. Here, he talks about the story behind his motivation to apply deep learning to natural language processing and about the future prospects of AI research.
Dr. Ronan Collobert
One of the research leads of Facebook AI Research, the AI research division of Facebook. He was a researcher in the machine learning department of NEC Laboratories America (NECLA) from 2004 to 2010. The paper “A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning” (co-authored with Dr. Jason Weston), published in 2008, won the Test of Time Award (in honor of the most influential paper in the past 10 years) at ICML, a world-class machine learning conference, in 2018.
With achievements like this, Dr. Ronan is one of the world's leading authorities on machine learning. He is currently involved in research on natural language processing, computer vision, and speech processing at Facebook in Silicon Valley.
His long-term dream is to build technology that would enable us to “talk to a computer.” In the NECLA era, he played a central role in the development of Torch, a very popular machine learning platform. He also carried out a number of pioneering research projects in an area of AI where very little work had been done on neural networks for natural language processing. Since then, natural language processing with neural networks has become a global trend. After NECLA, Dr. Ronan spent four years as Head of the Applied Machine Learning Group at the Idiap Research Institute in Switzerland, and then joined Facebook in 2015.
He received his Ph.D. from Pierre and Marie Curie University, under the supervision of the Bengio brothers (both still active as leading AI researchers). He is also author or co-author of over 100 peer-reviewed papers.
Doubts about existing approaches paved the way for a new natural language processing
I was originally conducting research on fundamental machine learning, and was looking for practical applications of AI with Jason Weston, a colleague at NECLA. Jason and I were both interested in improving techniques to interact with computers. Together, we decided to investigate new machine learning techniques in the field of natural language processing.
At that time, we knew almost nothing about natural language processing. Nevertheless, we felt uncomfortable with the way research scientists prepared data and implemented custom feature engineering in natural language processing. Instead of designing hand-crafted features, we thought it would make more sense to let machines learn through some kind of training, just as humans acquire language skills.
That was the starting point, but at the time it was unclear how to realize the idea, and it took a long time to get results. Humans can learn a language by listening to words repeatedly. This is, in a sense, a kind of unsupervised learning. However, it is hard to do the same thing with computers.
While thinking about this problem, we realized that we could build word representations by training a language model that would read a lot of sentences. So we conducted research on how to realize this on a machine, and the result is that paper.
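The idea of building word representations by training a language model on raw text can be illustrated with a toy sketch. This is a simplified illustration, not the actual model from the paper: a real window of text is trained to score higher than the same window with its center word corrupted, and the vocabulary, dimensions, and learning rate below are all made up.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
V, d = len(vocab), 8
E = rng.normal(scale=0.1, size=(V, d))  # word embeddings to be learned
w = rng.normal(scale=0.1, size=d)       # scoring vector (stand-in for a deeper network)

def score(window):
    # Score a window of word ids: sum the embeddings, project to a scalar.
    return float(E[window].sum(axis=0) @ w)

def train_step(window, lr=0.1):
    # One SGD step on a pairwise ranking hinge loss: a real window should
    # score higher than the same window with a randomly replaced center word.
    global w
    corrupted = list(window)
    corrupted[len(window) // 2] = int(rng.integers(V))
    loss = max(0.0, 1.0 - score(window) + score(corrupted))
    if loss > 0.0:
        g_w = E[window].sum(axis=0) - E[corrupted].sum(axis=0)
        for i in window:        # push the real window's score up
            E[i] += lr * w
        for i in corrupted:     # push the corrupted window's score down
            E[i] -= lr * w
        w += lr * g_w
    return loss

ids = {word: i for i, word in enumerate(vocab)}
corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["cat", "on", "mat"]]
losses = [train_step([ids[t] for t in win])
          for _ in range(100) for win in corpus]
```

No labels are needed: the "supervision" comes for free from which word actually appeared in the text, which is what makes it possible to train on very large corpora.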
It took us more than six months to make an experiment succeed. In fact, the first successful experiment ran for two months on a machine, and was at first forgotten, as an experiment in machine learning is often given up if the expected result cannot be obtained within two weeks. We almost gave up, but when we discovered this working experiment, we continued investigating this language model idea. We were finally able to get surprising results, which led to this paper. Of course, with today's computing power, you would get the same results much faster.
In this way, successful AI research requires many trials, and sometimes a little luck.
Searching for efficient speech processing in a simpler way
At Facebook, in addition to natural language processing, I have also been involved in research on computer vision and speech processing. Management is also part of my job; I mentor researchers in those three areas. Personally, I am now focusing mainly on speech processing, and I am looking for ways to make it as end-to-end as possible, as well as ways to leverage unlabeled data.
In speech recognition, language models are important to disambiguate words, by inferring the probability of occurrence of a word from the previously spoken words. Recent research efforts have shown that deep neural networks with attention-based mechanisms are powerful enough to successfully train an acoustic model from character-level transcriptions, while implicitly learning a language model on these transcriptions.
These approaches still require an extra decoding step at inference, with an explicit language model, to be state of the art. Taking a different route, in recent work we proposed a differentiable speech recognition decoder, which allows us to jointly train an acoustic model with an explicit language model. This approach leads to more efficient acoustic models and faster decoding at inference.
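The conventional decoding step mentioned above can be sketched in a toy n-best rescoring form: each hypothesis carries an acoustic log-probability, and an explicit bigram language model plus a length bonus re-rank the candidates. All probabilities and weights below are made-up numbers, not from any real system.

```python
import math

# Hypothetical bigram language model log-probabilities.
bigram_logp = {
    ("<s>", "the"): math.log(0.5), ("the", "cat"): math.log(0.4),
    ("the", "cap"): math.log(0.01), ("<s>", "a"): math.log(0.2),
}

def lm_score(words, unk=math.log(1e-4)):
    # Sum bigram log-probabilities, backing off to a small constant for
    # unseen bigrams.
    prev, total = "<s>", 0.0
    for word in words:
        total += bigram_logp.get((prev, word), unk)
        prev = word
    return total

def rescore(nbest, alpha=1.0, beta=0.5):
    # Combined score: acoustic log-prob + alpha * LM log-prob
    # + beta * length (word insertion bonus).
    return max(nbest, key=lambda h: h[1] + alpha * lm_score(h[0]) + beta * len(h[0]))

# Two candidate transcriptions with their acoustic log-probabilities:
# the acoustic model slightly prefers "the cap", but the language model
# knows "the cat" is far more plausible.
nbest = [(["the", "cap"], -2.0), (["the", "cat"], -2.3)]
best = rescore(nbest)
```

Making a step like this differentiable is what allows the language model's preferences to influence the acoustic model during training, rather than only at inference.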
Concerning deep learning, current algorithms have their limitations. It is hard to know what will come next in machine learning, but I think it is important to investigate "unsupervised learning" techniques. "Supervised learning" requires labeled data for training models, which can be difficult to get. For example, in speech recognition, transcribing a large audio corpus is a very expensive task.
In contrast, "unsupervised learning" techniques train models on unlabeled data, which is very abundant and cheap for many problems. If one manages to transfer the knowledge from an unsupervised model into a supervised learning task, then one might be able to solve problems or realize certain functionality with much less labeled data. This is what Jason and I showed in 2008 for a number of natural language processing tasks, and what we are now investigating in speech recognition.
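The kind of transfer described here can be sketched in a toy setting: learn a representation from plentiful unlabeled data, then train a small supervised model on only a few labels using that representation. Here PCA and a perceptron stand in for the much richer models used in practice, and all data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
scale = np.array([5.0] + [1.0] * 9)  # coordinate 0 carries most of the variance

# Lots of unlabeled data, only a few labeled examples: the transfer setting.
unlabeled = rng.normal(size=(500, 10)) * scale
X_few = rng.normal(size=(20, 10)) * scale
y_few = (X_few[:, 0] > 0).astype(int)  # labels depend on coordinate 0

# Unsupervised step: learn a feature map from unlabeled data alone
# (top principal components, standing in for a richer learned representation).
mu = unlabeled.mean(axis=0)
_, _, Vt = np.linalg.svd(unlabeled - mu, full_matrices=False)
encode = lambda X: (X - mu) @ Vt[:4].T  # keep 4 learned features

# Supervised step: a tiny perceptron (with a bias feature) trained on the
# learned representation of the few labeled points.
feats = np.c_[encode(X_few), np.ones(len(X_few))]
w = np.zeros(feats.shape[1])
for _ in range(500):
    for x, y in zip(feats, y_few):
        w += (y - float(x @ w > 0)) * x

train_accuracy = float(((feats @ w > 0).astype(int) == y_few).mean())
```

The supervised model only ever sees 20 labeled points; the structure that makes them separable was discovered from the 500 unlabeled ones.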
NECLA was the place to think deeply and exchange ideas
Research is a very social job, and it is important to discuss ideas with other researchers. NECLA has a wonderful garden and a small pond, and I often got ideas while walking around with Jason. I think it was very relaxing and, in a sense, a state of Zen.
NECLA is located in Princeton, far from NEC's headquarters, so it was not clear what was expected from us.
Hans Peter, head of the Machine Learning Department, served as an interface between NEC and NECLA. You could say that he protected us. While promoting applied solutions in the medical and automotive fields, we were also able to conduct our own research freely. I am very grateful for his management.
In this environment, my colleague Jason and I decided to work together on natural language processing. Given that Jason was programming in MATLAB and I was programming in C++, we decided to build our own tool (Torch7), which was scalable enough for natural language processing and flexible enough for our research.
Since we open-sourced it, Torch7 has been used in the AI community for a long time. Similarly, SENNA, a pure C library for natural language processing, was also successfully open-sourced. Creating these tools early on made it easy to try new things and advance various developments, which led to impactful achievements in the field of natural language processing.
I think that overall we had a lot of freedom at NECLA in terms of research. For example, there were so-called seed projects, in which everyone had the opportunity to propose risky and novel ideas to the lab. If an idea looked promising, the project could get a small budget to pursue it. One time, Jason and I had an idea to propose, but we didn't have enough time to write a formal proposal. So we turned the idea into a skit and shot a video at Jason's home in New York. One of us wore a wig, pretending to be a character whose initials are N.E.C., and described our idea in a funny way. When the video was projected at a presentation meeting without us there, we heard later that the room was filled with laughter. That is one of my good memories of NECLA.
Demand for researchers has increased as the practicality of AI is recognized
Recently, many companies have come to understand that AI can be applied to practical applications, in particular to organize large amounts of data. As a result, the demand for AI-related personnel has dramatically increased, and AI researchers have also attracted attention.
I work for Facebook now, but I left NECLA for reasons other than meeting that demand.
NECLA was very flexible, and there were only 15 members in the machine learning department, so I felt none of the overhead of a large organization. It was a wonderful research environment for sure. However, some of my fellow researchers (like Jason and Leon) left NECLA for their own reasons. As my best co-workers left, and as I also wanted to return to Europe, I eventually took an academic research position in Switzerland.
Then one day, Facebook opened its own AI research department, and I got a call. At first I did not want to move, but I found out that several of my friends, including Jason and Leon, had been hired by Facebook, and that's why I came here.
Be stubborn and don't give up until you succeed
I have three pieces of advice for young AI researchers. The first is to be stubborn. I myself focus on what I should do every day, and I keep trying without giving up, despite mistakes and failures.
My second piece of advice is not to be too influenced by trends in the community. If you follow those trends, it is hard to come up with truly impactful new ideas. In other words, it may not matter what methods other researchers are using to solve a given problem: for long-term impact, it may be more important to choose a method you deeply believe in and stick to it no matter what.
And my third piece of advice is to be careful when depending on existing machine learning frameworks. Using existing frameworks is convenient, but at the same time it can be hard to escape their limitations. These frameworks provide a number of building blocks for fast prototyping, but chances are that impactful research will require implementing quite different new blocks, or even building blocks which do not fit in the existing frameworks. So you must not be tied to one tool.
Finally, of course, mathematics is the foundation of machine learning, so a strong mathematical background is highly recommended! Please pursue your own ideas on that foundation.
(interview and text / Kazutoshi Otani)