Automatically extracting tacit know-how to achieve the world's highest task success rate: Featured Technologies

January 14, 2026

Amid the rapid advancement of generative AI, expectations are rising for AI agents as the next step in the technology. Working alongside human operators on their routine tasks, AI agents autonomously break tasks down to design the necessary operational processes, then execute them automatically by selecting the AI models and IT services best suited to each individual task. Many leading companies worldwide are developing this technology, but NEC's agent technology for automating business tasks has now achieved the world's highest task success rate in an international benchmark of web browser tasks, surpassing the human success rate for the first time. What can this technology do, and how was it developed? We spoke with the researchers about the details.

Accurate performance even with vague instructions like “do that”

Masafumi Oyamada
Executive Professional and Research Fellow
Data Science Laboratories
Research leader for this project

― Could you explain what kind of technology NEC’s task automation agent is?

Oyamada: NEC’s task automation agent technology performs various operations on behalf of people, such as sending e-mails and running searches in web browsers, purchasing supplies, and submitting internal applications. We developed this technology with the aim of creating something that works like a colleague who always supports you in your job. The accuracy of its task performance is among the highest in the world. In the international benchmark WebArena (Note 1), it recorded a success rate of 80.4%, marking the first time in the world that an AI has surpassed the human success rate of 78.2% (Note 2).

This high accuracy is achieved through learning from operation logs on Web browsers. While analyzing the operation logs of many users, the AI agent identifies and learns which person is working on which task on which website for what purpose in real-time, capturing the know-how of individual users. This enables the agent to operate based on even vague instructions without the need for complex prompts. In an extreme case, you can just say, “do that.” Of course, the accuracy may slightly decrease, but it will still figure out the context by referring to the most recent operation history to infer relevant tasks. For example, if the user was working on purchasing supplies that were requested via e-mail just before the AI agent took over, the AI agent will search for suitable items and make the purchase.

Another major advantage is the ability to extract tacit know-how from experienced employees. It saves you from interviewing them and making how-to manuals―instead, you can accumulate extracted know-how as the organization’s assets to be shared with younger employees.

While there have been technologies that save and reuse the agent’s own logs, none until now have analyzed and leveraged human operation logs in this way. I came up with this idea at the beginning of 2025 and filed patent applications between February and March. During the six months that followed, I worked on its implementation.

Masahiro Tani
Director
Data Science Laboratories
Management of the entire project

Tani: Competitors around the world are breaking top-performance records one after another, so we aimed for the earliest possible technology development and public release. Initially, we were planning to publicly announce this development at the end of the year, but we ended up moving it forward to August. We had originally prepared a press release with a placeholder for the WebArena success rate, such as “Achieved XX%.” We were astonished, however, when the final results exceeded even our own expectations, marking the first time an AI has surpassed the human success rate. This feat owes much to this team.

Oyamada: This area has established competitors, so we thought it would be difficult to take the lead in this category. At first, therefore, we tried to build an Agentic AI that specializes in a specific operation instead of aiming to cover general purposes. In the end, however, we managed to achieve the world’s highest accuracy, outperforming competitors even in general-purpose applications. Our teamwork was truly the winning factor. In addition to the members here, Takuya Tamura, Junta Makio, and Taro Yano, who couldn’t participate today, all contributed to this success.

Note 1: WebArena (https://webarena.dev/)
Note 2: Based on scores publicized by institutes and enterprises regarding the task success rate in WebArena as of August 2025 or NEC’s re-aggregated data

Task success rate of 30% improves to over 80% through steady verification

Masafumi Enomoto
Researcher
Data Science Laboratories
Core engine development and formulation of overall policy and measures

― How did you develop the core engine?

Enomoto: There are two main elements. One is that the AI agent needs to learn the user’s operation history as mentioned earlier. We worked on this new development as NEC’s original approach. The other element is to meticulously examine other Agentic AI models for benchmarking and create an AI agent that adequately meets the intended purpose. We worked on developing the core engine by combining these two.

Kosuke Akimoto
Principal Researcher
Data Science Laboratories
Core engine development

Akimoto: While we can’t disclose most of the details of how the AI learns from action history, I can safely say that determining how to utilize the log data was a big challenge. The volume of raw log data is immense, and much of it is unrelated to the task, such as the full HTML of each page. We had to work out how to extract know-how from such a mixture of information, and we tried and compared various approaches to find the best solution.

On that note, the know-how captured through this technology can be verbalized. Users can also edit the verbalized knowledge, which can be useful for creating a manual.

Enomoto: We needed to try different methods for benchmarking. There were times when it didn’t go well even when we tried with methods that are typically considered decent in our community or in research papers. Sometimes taking the opposite approach worked, so it was all trial-and-error at first. We also repeatedly tested and adjusted the engine that we created.

Obara: In the early stage of development, the task success rate in the benchmark test was barely above 30%. To bring it closer to the human success rate of 78.2%, we examined the differences between the two and made improvements. For example, we humans see a web page as a single screen, but an Agentic AI sees only its HTML structure. We made adjustments to close the gap as we discovered such differences one by one. It turned out to be an interesting process that opened our eyes to new information and perspectives.
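The gap Obara describes between a rendered screen and raw HTML can be illustrated with a minimal sketch. It is not NEC's method, which the interview does not disclose; it only shows, using Python's standard `html.parser` module, how much of a page's markup never appears on screen. The class name, function name, and sample page are all hypothetical and were invented for this illustration.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text a person would read, skipping script and style content."""

    HIDDEN_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._hidden_depth == 0:
            self.chunks.append(text)


def visible_text(html: str) -> list[str]:
    """Return the non-empty text fragments outside script/style elements."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical page fragment, invented for this example.
SAMPLE_HTML = """
<html><head><style>.btn { color: red; }</style>
<script>var tracking = "abc";</script></head>
<body><h1>Supplies order</h1>
<button class="btn">Add to cart</button></body></html>
"""

print(visible_text(SAMPLE_HTML))  # ['Supplies order', 'Add to cart']
```

Of the whole fragment, only two short strings are what a person would actually read. A real page adds layout, visibility rules, and dynamic content on top of this, which a sketch this small does not attempt to handle.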

Zhang: On another note, the evaluating side can make mistakes too. Because Agentic AI models can act freely, they can reach answers that traditional rule-based evaluation methods may overlook. If we had continued to make improvements without noticing these evaluation errors, we would have ended up taking our development in the wrong direction. Therefore, we carefully corrected each deviation to improve accuracy.

Takeoka: Exactly. When we first started, we had no idea what was happening. All we got was the result: whether it worked or didn’t. Before making any correction, we had to dig into whether a deviation was due to the AI’s behavior or an evaluation error.
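The triage that Zhang and Takeoka describe, separating agent failures from evaluator mistakes, can be sketched as follows. This is an illustration under stated assumptions, not WebArena's actual evaluator or NEC's procedure. The strict check stands in for a rule-based evaluator, the lenient check is one possible way to surface answers that such a rule might wrongly reject, and every function name and sample record is hypothetical.

```python
import re


def exact_match(answer: str, reference: str) -> bool:
    """Strict rule-based check: the strings must be identical."""
    return answer == reference


def _normalize(text: str) -> str:
    # Lowercase and collapse every non-alphanumeric run into one space.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def normalized_match(answer: str, reference: str) -> bool:
    """Lenient check: ignore case, punctuation, and spacing differences."""
    return _normalize(answer) == _normalize(reference)


def triage(runs: list[dict]) -> dict[str, str]:
    """Label each run so that people know where to look first."""
    labels = {}
    for run in runs:
        if exact_match(run["answer"], run["reference"]):
            labels[run["task"]] = "pass"
        elif normalized_match(run["answer"], run["reference"]):
            # The strict rule failed but the answer looks equivalent:
            # a candidate evaluation error rather than an agent error.
            labels[run["task"]] = "check evaluator"
        else:
            labels[run["task"]] = "check agent"
    return labels


# Hypothetical results, invented for this example.
RUNS = [
    {"task": "t1", "answer": "Order #1042 shipped.", "reference": "order #1042 shipped"},
    {"task": "t2", "answer": "3 items", "reference": "3 items"},
    {"task": "t3", "answer": "Tokyo", "reference": "Osaka"},
]

print(triage(RUNS))
```

The labels only indicate where to start looking. A run marked "check evaluator" still needs a person to confirm that the answer really is acceptable before the evaluation rule is changed.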

Ready to use upon installation of Web app

Kunihiro Takeoka
Principal Researcher
Data Science Laboratories
Analyses of experiment results and formulation of measures

― Are there any special procedures needed to install NEC’s task automation agent?

Takeoka: No, there aren’t. You can start using it just by installing the app on your web browser. However, by importing manuals or other data you have, you can achieve an even smoother rollout. For example, if you upload a manual for new employees to NEC’s task automation agent, it can immediately handle all the tasks described in the manual. With continued use, the agent will also learn tasks and know-how not written in the manual, becoming smarter over time.

Enomoto: The data you import or upload does not have to be in document format. The agent can also learn from images, audio files, diagrams, tables, graphs, and other types of data, allowing it to acquire a broad range of knowledge.

Oyamada: Last year, NEC released a technology that understands the context of tables and figures, which enables use of a wide range of data formats.

― What is the timeline for the service launch?

Tani: We have been using NEC’s task automation agent within our team and in select departments throughout the company, and based on the verification results, we were able to put together a service offering. Starting January 2026, we plan to launch NEC’s task automation agent as a comprehensive solution that combines software, consulting, and operation & maintenance services.

A partner that enhances the user’s operational performance

Haochen Zhang
Researcher
Data Science Laboratories
Analyses of experiment results and formulation of measures

― What other possibilities are there for NEC’s task automation agent going forward?

Zhang: Administrative procedures like applying for business trips are processed using a complex system at most companies nowadays. This can be confusing for new employees in particular. In response, NEC’s task automation agent can extract know-how from the users’ operation histories and create a support system for new employees. Instead of having to look up how to submit applications themselves, new employees can simply let the agent handle these procedures and only need to do the final check.

Takeoka: Exactly. Our first step is to achieve a working style where people delegate routine or peripheral tasks to NEC’s agent so they can focus on their primary responsibilities, simply checking results afterwards. The next step is to make it work side by side with the user on the same task, with each checking the other’s progress and results as they go and discussing matters like partners.

Oyamada: It’s about using an AI agent to improve work performance. To a certain extent, automation of almost any task is already within sight. Going forward, we hope to steer development toward maximizing the user’s productivity. For example, someone working alone might operate at a performance level of 90, but in collaboration through teamwork or discussion, that might rise to 150 or 200. If NEC’s agent can talk to you and motivate you while watching over your condition in real time, or act as a discussion partner when needed, it can proactively offer new perspectives that an individual might miss. Creating such an environment is something we are keen to pursue.

Akimoto: With regard to performance, I believe the agent can eventually be used to preserve the user’s peak condition. You may have run into trouble doing something that you haven’t done for over a year. Taking my own case as an example, during previous model development I had trained myself to handle a supercomputer to the point that my hands moved naturally and smoothly. However, when I tried the same thing a year later, I realized that I had gotten rusty. We humans can rarely maintain 100% proficiency in every task. If NEC’s agent can codify and store those learned skills, taking over when needed, people can focus on learning and mastering new tasks. If this becomes a reality, we can do many more things at the same time. I hope to make the agent smarter in this direction.

Ryoma Obara
Researcher
Data Science Laboratories
Analyses of experiment results and formulation of measures

Obara: Yes. Ultimately, I believe that in remote environments, it may become impossible to distinguish whether you’re interacting with a human or an AI. That is the level of agent capability that NEC’s task automation agent has the potential to achieve. At the same time, as we move toward practical deployment, we recognize the need to strengthen our efforts in security measures. We aim to develop a solid system that protects valuable know-how and other information while managing the scope of sharing.

Enomoto: In terms of what’s possible, one of our visions is to further advance the agent’s ability to capture know-how. Currently NEC’s task automation agent learns from browser operation logs, but if there is a period of inactivity, it cannot gather any new information during that time. For example, if you are watching a video and you try to pause at a specific scene, the agent can infer what type of content you are looking for, but when there are no actions being taken, it can’t understand what you are thinking internally. By integrating technologies such as eye-tracking or leveraging biometric data, I believe we can enable the agent to better connect with users’ thoughts and intentions.

Tani: With regard to how we will deploy this technology as a service, we are currently discussing possibilities with different business units to identify practical use cases. As mentioned earlier, we are also rolling out the technology internally alongside research and development. While internal deployment is meaningful in terms of streamlining operations and experimenting, I would also like to note another advantage: individual employees can experience and understand its benefits first-hand as users. This enables them to explain the benefits with enthusiasm as actual users, which will contribute significantly to future external deployment.

Back row (from left): Zhang, Oyamada, Akimoto, Takeoka, Tani
Front row (from left): Tamura, Makio, Enomoto, Obara

NEC’s task automation agent is designed to accurately handle web browser tasks such as sending e-mails, conducting searches, making purchases, and submitting internal applications. In the international benchmark WebArena, it recorded a success rate of 80.4%, marking the first time in the world that an AI has surpassed the human success rate of 78.2%. At the core of this technology is NEC’s proprietary innovation: learning from users’ web browser operation logs. By studying a vast number of user logs, the agent can extract know-how from the enormous volume of data, enabling automated processes and codifying tacit knowledge. AI learning from users’ operation logs is a new approach that has never existed before. It enables the agent not only to acquire the expertise of experienced employees but also to interpret vague prompts, inferring context to achieve tasks with high accuracy.

※ The information posted on this page is the information at the time of publication.
