A data scientist changing society with the power of prediction
NEC's world-leading heterogeneous mixture learning technology is a cutting-edge analysis engine that enables accurate prediction and estimation from big data. As a data scientist, Yosuke Motohashi meets with many customers and uses this unique analysis engine to support the creation of new value. Here he talks about his focus as a data scientist, and his role as a researcher bridging the gap between customers and the development team. He also covers several examples of actual analysis solutions.
Providing customers with the best value based on prediction data
--Tell us about your work providing NEC analysis solutions, Mr. Motohashi.
Motohashi: In general terms, I am what is called a data scientist. However, the title and job definition of a data scientist are still very ambiguous at this point in time*. In light of this, I'd like to explain my job in a bit more detail.
NEC boasts a range of world-class analysis technologies, including facial recognition and text mining. Among these is the analysis engine called heterogeneous mixture learning technology that NEC developed independently. This technology enables highly accurate prediction and estimation from vast quantities of varied data. The team I belong to provides customers with analysis solutions utilizing this engine. My specific tasks include asking customers about the issues they are facing, and advising them on which analysis approach would be most effective and which data to use for a given analysis method. At the same time, I perform actual data analysis and verification, and manage the analysis team that produces the end results.
Two kinds of professionals play essential roles in providing customers with optimal solutions: data scientists specializing in advanced data analysis, and domain experts with seasoned knowledge of the customer's specific industry sector and the various issues they face. This is crucial because surface-level, template-based consulting cannot produce analysis solutions that provide true value for a customer's unique issues. Data scientists such as myself usually work in partnership with domain experts, who can draw on the industry solutions NEC has built up over years of experience, but in some cases I cover the role of domain expert myself.
- *Also known as a "Quantitative Analyst."
Consulting ability is more important than statistics
--How would you define the role of a data scientist?
Motohashi: With the current focus on utilizing big data, data scientists are in the spotlight now, but the actual substance of what they do remains nebulous. Personally, I would define data scientists as those who plan analysis solutions that contribute to customer value based on data. You often hear it said that data scientists are experts in statistics, but I believe that's a little off the mark. Mathematical skill in areas such as statistics is a qualification that goes without saying.
When it comes down to it, I'd say the skill a data scientist requires most is consulting ability.
This boils down to three points. The first is the ability to correctly identify what the customer's issue is and what they want to achieve. The second is the ability to design analysis methods. The third is the management skill to control the analysis team. Based on these three pillars, a data scientist must make the optimal proposal from the data and issues they are provided with.
First and foremost, those who like numbers are well suited to becoming data scientists. Note that I said numbers, and not mathematics. I think people coming from the kind of biology in which plants are grown, as well as from domains such as social science and experimental chemistry, are also well suited to the job. These areas may seem unrelated at first glance, but they each involve estimating or understanding phenomena based on observation. I think these fields have many things in common with the basic stance of data scientists, who perform mathematical analysis based on the observation of phenomena.
--What is the most crucial aspect of your job as a data scientist?
Motohashi: Above all, it is important to properly discern the ultimate goal of analysis. In fact, the results obtained through analysis are often very far from the customer's end objective. The initial request presented by the customer is also not their ultimate goal in many cases.
For example, when forecasting retail sales, the sales forecast itself is not what the customer wants to know in the end. Each customer has a different end objective, whether it is reducing wastage based on the forecast, or coming up with a development plan for a new product. The analysis target and the data used also vary according to the objective. That means it is crucial to correctly identify the ultimate goal that will resolve the customer's problem, and properly determine what to forecast and which data to use. When we determine that the analysis plan created by a customer does not match their ultimate goal, we may propose changes.
In my opinion, someone who merely works through the data and analysis tasks handed to them by the customer, and reports back the results of validation, cannot be called a true data scientist. The mission of a data scientist is to properly identify the ultimate goal of the customer, propose an analysis solution matching that goal, and manage the process.
Impressing customers with the results of analysis is incredibly rewarding
--How do you proceed with implementing an analysis solution?
Motohashi: When implementing an analysis solution, we first identify the business challenges the customer is facing, and clarify the objectives and aims for analysis system utilization. This point differs significantly from general system development, and is a particularly important part of developing analysis systems. As I mentioned earlier, this is because the data and type of analysis to use may change unless we ascertain the customer's ultimate goal. Next, we establish an evaluation index for the analysis data, and construct a data analysis scenario for achieving the goal, as well as a hypothesis regarding the analysis method. We hold group discussions between the customer, NEC sales staff, domain experts, and our team to clarify goals and plan hypotheses. Then we take the customer's data and perform trials based on the hypothesis we have constructed. We examine these trial results as we proceed with actual system development, and repeatedly perform further analysis during this process, while adjusting areas such as the definition of requirements. My main job is the overall management of this series of processes.
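As a rough illustration of the evaluation index and trial step described above, the following Python sketch scores a set of trial predictions against held-out actual values and compares them with a simple baseline. The interview does not say which index is used in practice; the choice of mean absolute error, the baseline comparison, and all names and figures here are assumptions made for illustration only.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def improvement_over_baseline(actual, model_pred, baseline_pred):
    """Fraction by which the model reduces error relative to the baseline."""
    baseline_error = mean_absolute_error(actual, baseline_pred)
    if baseline_error == 0:
        raise ValueError("baseline is already perfect; improvement is undefined")
    model_error = mean_absolute_error(actual, model_pred)
    return 1.0 - model_error / baseline_error


# Hypothetical held-out daily sales and two sets of predictions.
actual_sales = [100, 120, 90, 110]
model_forecast = [98, 123, 93, 108]    # predictions from the trial model
naive_forecast = [110, 100, 120, 90]   # predictions from a simple baseline

print(mean_absolute_error(actual_sales, model_forecast))
print(improvement_over_baseline(actual_sales, model_forecast, naive_forecast))
```

Agreeing on an index like this before the trial gives the customer and the analysis team a shared way to judge whether a hypothesis is worth carrying into system development.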
Currently, the majority of the work I do involves prediction using the heterogeneous mixture learning technology analysis engine. However, the next step is to actively pursue initiatives for controlling production, procurement, and distribution, as well as social systems, based on these predictions. For example, in the distribution industry we provide specific solutions, such as ordering systems that maximize benefits with minimum cost, based on predictions.
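To make the idea of an ordering system driven by predictions more concrete, here is a minimal Python sketch that turns a demand forecast into an order quantity. It uses the classic newsvendor rule, which is a textbook technique swapped in for illustration and not necessarily what NEC's solutions do. The normal-distribution assumption for forecast error, the function name `order_quantity`, and the cost figures are all hypothetical.

```python
from statistics import NormalDist


def order_quantity(forecast_mean, forecast_std, unit_cost, unit_price,
                   salvage_value=0.0):
    """Order quantity that balances lost sales against unsold stock.

    Assumes demand is normally distributed around the forecast.
    """
    underage = unit_price - unit_cost    # profit lost per unit of unmet demand
    overage = unit_cost - salvage_value  # loss per unit left unsold
    if underage <= 0 or overage <= 0:
        raise ValueError("need salvage_value < unit_cost < unit_price")
    # Target probability that the order covers the realized demand.
    critical_ratio = underage / (underage + overage)
    if forecast_std == 0:
        return float(forecast_mean)      # no uncertainty: order the forecast
    return NormalDist(forecast_mean, forecast_std).inv_cdf(critical_ratio)


# Hypothetical product: forecast of 100 units with a standard deviation of 20.
print(order_quantity(100, 20, unit_cost=5, unit_price=10))  # even margins
print(order_quantity(100, 20, unit_cost=5, unit_price=20))  # stock-outs cost more
```

When a missed sale costs more than an unsold unit, the rule orders above the forecast, and it orders below the forecast in the opposite case. This is one simple way a prediction and its uncertainty can feed directly into a cost-aware decision.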
I also have another important job: my role as a researcher. For one thing, this allows me to offer customers the cutting-edge analysis engine we have researched and developed as the optimal analysis solution, from a developer's perspective. Having someone actually involved with the research and development of the engine explain its mechanisms and effects provides customers with a lot of peace of mind. For another, it allows me to give accurate feedback to NEC's research and development team regarding issues and requests picked up through actual interaction with customers on the ground, in light of my knowledge as a researcher. Fostering close coordination and communication between the developers of heterogeneous mixture learning technology, software designers, and myself as a researcher who actually deals with customers will lead to further enhancement and advances in the heterogeneous mixture learning technology analysis engine, as well as related software.
--Can you discuss what motivates you as a data scientist, and talk about some of the challenges you have faced?
Motohashi: I enjoy it when I report prediction results verified against the data we have been given, and the customer can't believe how accurate they are. That's always an incredible feeling.
For example, when I provide pinpoint predictions for property or product prices to customers such as real estate agents or secondhand goods dealers using the heterogeneous mixture learning technology analysis engine, they are often surprised that the predictions are as accurate as those an industry veteran would make. Personally, I really enjoy meeting many customers from a wide range of industries, hearing what they have to say, and looking over the actual data. While working at NEC, I almost feel like I become an employee of all sorts of companies, dealing with the manufacturing industry one day, then distribution the next. I also find it very fulfilling to be exposed to practices and knowledge I was previously unfamiliar with.
As for challenges, it does pose a problem when a customer simply provides me with data, and asks me to do something interesting with it without any particular goal in mind. Things like that can be a little difficult.