
Detection, Auto Analysis of Cyber Threats Using Open Source Intelligence

Threat information such as cyberattack techniques and responsibility claims is distributed via social media and the deep web. However, the explosion of information and a shortage of security analysts make it difficult to detect such information at an early stage, which delays preparation against attack damage. This paper introduces an automated proactive attack prevention technology that applies a technical analysis technique from financial engineering to identify signs that a threat trend is reaching a peak. Deep learning is also employed to analyse the overall picture of a cyberattack.


Cyberattacks have recently become a social problem, causing damage in many countries worldwide. The security analysts protecting each organization must continuously collect and analyse the huge amount of threat information that shows signs of cyberattacks and take prompt action when an event is predicted. Doing so helps ensure the smooth operation of frequently attacked targets such as critical infrastructure, government institutions and private enterprises.

If a newly discovered vulnerability in software or hardware used in an organization is left unaddressed, there is a risk of a cyberattack that exploits it. Damage such as theft of confidential information may then ensue, and malware infection may spread not only within the organization but also to its customers and even to organizations unrelated to the original target. For example, the ransomware “WannaCry” that caused worldwide damage in mid-May 2017 was spread by the “EternalBlue” attack tool, which targets OS vulnerabilities.

Attackers carry out cyberattacks in an organized manner, collecting information on attack tools and undisclosed vulnerabilities from social media and black markets. Services that carry out cyberattacks on a client’s behalf are also being developed. Typical examples are Booters and Stressers, also known as DDoS-for-hire services, and RaaS (Ransomware as a Service), which distributes ransomware widely in order to take victims’ files hostage. If the victim pays the ransom, the RaaS pays part of it to the client that ordered the attack. These services are characterized by automation of the whole series of attack actions and by low cost.

On the protection side, by contrast, cyber threat analysis that relies entirely on human labor has already reached its limit for the following reasons.

  • The advent of Industry 4.0 has expanded the target of protection from IT equipment to OT equipment.
  • The numbers of arrests and consultations related to cybercrimes are increasing every year.
  • The amount of information distributed through social media, one of the channels through which threat information circulates, increased about ninefold in the nine years from 2005 to 2014.
  • The shortage of security engineers is already a problem, and such personnel cannot be trained quickly because the work requires a wide range of knowledge of system construction.

Against this background, the social need for an efficient means of cyber threat analysis is increasing.

In the rest of this paper, section 2 describes a cyber threat information analysis technique based on OSINT (Open Source Intelligence) proposed by NEC, section 3 reports on an evaluation test, and the final section provides an overall conclusion.

2.Cyber Threat Information Analysis Based on OSINT (Open Source Intelligence)

At NEC, we have attempted to automate cyber threat analysis in the five phases shown in Fig. 1. The key technology used in each phase is described in the following subsections.

Fig. 1 Flow of automated cyber threat analysis.

2.1 Data Collection

For research purposes, threat information is continuously collected and stored from more than three million sources on the Internet, including social media, blogs and underground sites. The collection targets are expanded autonomously by detecting attacker communities and tracing malicious sites as they move across data centres worldwide. The targets also include websites on the deep web, which ordinary search engines usually cannot search.

2.2 Prediction

After the presence of malware that expands infections via routers under certain conditions was confirmed in mid-September 2015, cyberattacks targeting such routers occurred frequently from mid to late September of the same year.

Fig. 2 shows the daily changes in the number of attacks per IP address and the number of tweets posted on the social media service “Twitter.” A multiple regression analysis over the entire attack period showed little correlation between them. Nevertheless, a high correlation and a degree of interlocking were observed when the analysis focused only on the first peak in the numbers of attacks and tweets, around noon on September 15th. This suggests that identifying a sudden increase in the number of tweets as early as possible would make it possible to detect signs of an impending attack.

Fig. 2 Numbers of attacks of routers and tweets in social media.
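
The contrast between the whole period and the first peak can be illustrated with a minimal sketch. It assumes invented counts (not the data behind Fig. 2) and uses a plain Pearson correlation coefficient as a simpler stand-in for the multiple regression analysis mentioned above. The function and variable names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented counts per interval, for illustration only.
# Tweets spike once; attacks spike with the tweets and then keep recurring.
tweets = [1, 2, 8, 20, 9, 3, 2, 2, 1, 2, 3, 2]
attacks = [0, 1, 5, 14, 7, 2, 18, 1, 20, 3, 22, 4]

whole_period = pearson(tweets, attacks)        # every sample
first_peak = pearson(tweets[:6], attacks[:6])  # window around the first peak

print(f"whole period: r = {whole_period:.2f}")
print(f"first peak:   r = {first_peak:.2f}")
```

With data shaped like this, the coefficient over the whole period is weak while the coefficient restricted to the first peak is close to 1, which is the pattern the paragraph above describes.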

Traders in the financial industry gain profits by predicting events that will affect stocks, buying stocks at low prices and selling them at high prices.

Security analysts, for their part, wish to predict from trends in threat information which vulnerability attacks will become prevalent or when malware will arrive, and to prepare for cyberattacks in advance so as to minimize the period of potential damage.

Traders and analysts thus share the same purpose of predicting future trends, although what they predict differs.

One of the means used in the financial industry to predict stock price movements is technical analysis, which predicts future price movements from price changes from the past to the present. Given the similarity of purpose, we assumed that technical analysis would also be applicable to the prediction of cyber threats.

Technical analysis can be divided roughly into trend analysis and oscillator analysis. Known trend indicators include the EMA (Exponential Moving Average), which is suitable for identifying mid- and long-term trends, while known oscillator indicators include Historical Volatility and the RSI (Relative Strength Index), which are suitable for identifying short-term trends. There is also the MACD (Moving Average Convergence Divergence), an intermediate indicator with the characteristics of both. It identifies the market cycle and buy/sell timings based on short- and long-term moving average lines. We have found that the MACD technique is particularly effective for the analysis of cyber threat trends.
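
For readers unfamiliar with the indicator, the following is a minimal sketch of the textbook MACD calculation, not the original algorithm used in this paper. The span values 12, 26 and 9 are the conventional defaults from financial charting and are an assumption here, as are the function names and the sample series.

```python
def ema(values, span):
    """Exponential moving average, seeded with the first value."""
    alpha = 2 / (span + 1)
    out = [float(values[0])]
    for v in values[1:]:
        # Move the previous average a fraction alpha towards the new value.
        out.append(out[-1] + alpha * (v - out[-1]))
    return out


def macd(values, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) for a series of counts."""
    fast_ema = ema(values, fast)
    slow_ema = ema(values, slow)
    # MACD line: short-term average minus long-term average.
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    # Signal line: smoothed version of the MACD line.
    signal_line = ema(macd_line, signal)
    # Histogram: how far the MACD line has pulled away from its signal line.
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


# Invented series: a flat baseline followed by a sudden rise in mentions.
mentions = [5.0] * 30 + [10.0, 20.0, 40.0]
macd_line, signal_line, histogram = macd(mentions)
print("rising trend detected:", histogram[-1] > 0)
```

A positive histogram means the short-term average is pulling away from the long-term one faster than the signal line can follow, which in trading is read as a buy signal and here would be read as a mention count starting to surge.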

As shown in Fig. 3, we analysed each of the characteristic terms contained in the threat information using an original algorithm based on the MACD technique and calculated the degree to which each is likely to lead to serious consequences. The results are output as the importance ranking shown in Fig. 4, which gives security analysts the opportunity to notice signs of threats and also serves as input for the overall picture analysis in the next step.

Fig. 3 Example of threat information trend analysis using original MACD-based algorithm.
Fig. 4 Display of threat prediction result rankings.
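
The original scoring algorithm is not given in this paper, so the following is only a hypothetical sketch of how a per-term ranking could be derived from a MACD-style indicator. It assumes that each term is scored by its latest MACD histogram value, that short spans (3, 6, 3) suit the short invented series, and that the term names and counts are placeholders.

```python
def ema(values, span):
    """Exponential moving average, seeded with the first value."""
    alpha = 2 / (span + 1)
    out = [float(values[0])]
    for v in values[1:]:
        out.append(out[-1] + alpha * (v - out[-1]))
    return out


def macd_histogram(values, fast=3, slow=6, signal=3):
    """MACD line minus its signal line, for each sample."""
    macd_line = [f - s for f, s in zip(ema(values, fast), ema(values, slow))]
    signal_line = ema(macd_line, signal)
    return [m - s for m, s in zip(macd_line, signal_line)]


def rank_terms(term_counts):
    """Sort terms by their latest histogram value, strongest surge first."""
    scores = {
        term: macd_histogram(counts)[-1]
        for term, counts in term_counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented mention counts per interval for three placeholder terms.
term_counts = {
    "router": [2, 2, 2, 2, 2, 2, 2, 3, 9, 30],           # sudden surge
    "ransomware": [10, 10, 11, 10, 9, 10, 10, 11, 10, 10],  # steady chatter
    "phishing": [20, 18, 16, 14, 12, 10, 8, 6, 4, 2],       # fading topic
}

for term, score in rank_terms(term_counts):
    print(f"{term:12s} {score:+.2f}")
```

Scoring on the histogram rather than on raw counts favours terms whose mentions are accelerating over terms that are merely frequent, which matches the aim of noticing a threat before it peaks.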

2.3 Automated Analytics

We have developed a technology that uses deep learning to learn the expertise that skilled security analysts apply when analysing the overall picture of a cyber threat, drawing on existing analysis results and terminal operation histories. When a new clue is found, the overall picture of the threat is then revealed based on the acquired knowledge (Fig. 5).

Fig. 5 Analysis of the overall picture of a cyber threat using deep learning.

Since a threat does not necessarily cause immediate damage, the judgment as to whether a threat will lead to system damage varies depending on the organization’s system configuration and workflow as well as on the analyst’s interpretation and the reliability of the information sources. In addition, the volume of threat information is very large, and simple association of information could enumerate thousands of elements, including malicious IP addresses. Applying the results obtained directly as security countermeasures may also disrupt routine operations through excessive protection.

By learning the procedures used in previous analyses of cyber damage events, it becomes possible to analyse the overall picture of a threat at an appropriate level of detail and according to a standard specific to each organization. When a new threat is detected, it is automatically analysed based on what has been learned.

2.4 Hunting & Prevention

The importance of “threat hunting” is increasing as a methodology for coping with targeted attacks that threaten specific organizations via e-mail and other channels.

For example, endpoints detect the presence of malware using pattern files provided by anti-virus software. However, there are cases in which malware causes damage before delivery of the pattern file is complete, and the damage is detected only once the pattern file is in use. Specific organizations are particularly prone to damage from targeted attacks, which often use malware crafted to evade detection by the anti-virus software used by the targeted organization.

Threat hunting inspects an entire system, using the signs obtained by threat information analysis as a hypothesis, to verify the presence of damage and the degree of risk. Automating it enables proactive security measures, such as identifying damage and the potential for future attacks immediately after a cyberattack group claims responsibility for an attack (i.e. before the pattern file is delivered).
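
As a rough illustration of hypothesis-driven hunting, the sketch below checks endpoint log records against indicators taken from threat information. The record layout, the indicator sets and all names are assumptions made for this example. The IP addresses come from documentation-only ranges and the hashes are dummy values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """One hypothetical endpoint log entry."""
    host: str
    dest_ip: str
    file_sha256: str


def hunt(records, bad_ips, bad_hashes):
    """Return {host: [reasons]} for hosts whose logs match any indicator."""
    findings = {}
    for rec in records:
        reasons = []
        if rec.dest_ip in bad_ips:
            reasons.append(f"contacted {rec.dest_ip}")
        if rec.file_sha256 in bad_hashes:
            reasons.append(f"executed file {rec.file_sha256[:8]}...")
        if reasons:
            findings.setdefault(rec.host, []).extend(reasons)
    return findings


# Hypothesis: indicators extracted from threat information (dummy values).
bad_ips = {"203.0.113.5"}
bad_hashes = {"a" * 64}

# Invented log records gathered from endpoints.
records = [
    LogRecord("pc-01", "198.51.100.20", "b" * 64),
    LogRecord("pc-02", "203.0.113.5", "c" * 64),
    LogRecord("pc-03", "198.51.100.21", "a" * 64),
]

for host, reasons in hunt(records, bad_ips, bad_hashes).items():
    print(host, "->", "; ".join(reasons))
```

Because the indicators come from threat information rather than from a vendor pattern file, a check of this kind can run as soon as the information appears.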

2.5 Intelligence Sharing

The results of threat analyses are saved in STIX*1 (Structured Threat Information eXpression), an open language for describing structured threat information developed by the OASIS standardization organization. From these saved results, the information required to change firewall and IDS settings is generated in order to enforce security measures such as blocking the source of a cyberattack.
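
The step from a stored analysis result to a blocking rule might look like the following sketch. It assumes a STIX 2.1-style JSON indicator built with the standard library only. The STIX version actually used is not stated in this paper, and the helper names, the indicator name and the choice of an iptables rule as the firewall output are all illustrative assumptions.

```python
import json
import uuid
from datetime import datetime, timezone


def make_indicator(ip, name):
    """Build a minimal STIX 2.1-style indicator for a malicious IPv4 address."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "pattern": f"[ipv4-addr:value = '{ip}']",
        "pattern_type": "stix",
        "valid_from": now,
    }


def to_block_rule(indicator):
    """Turn the simple single-address pattern above into an iptables rule."""
    # Only handles patterns of the exact form produced by make_indicator.
    ip = indicator["pattern"].split("'")[1]
    return f"iptables -A INPUT -s {ip} -j DROP"


# 203.0.113.5 is a documentation-only address used as a placeholder.
indicator = make_indicator("203.0.113.5", "Attack source (example)")
print(json.dumps(indicator, indent=2))
print(to_block_rule(indicator))
```

Keeping the indicator as the stored record and deriving device settings from it means the same analysis result can feed several kinds of equipment without being rewritten for each.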

The threat analysis results can be shared with other departments, organizations and countries using TAXII*2 (Trusted Automated eXchange of Indicator Information), a procedure for the automated exchange of detection indicator information that is also developed by OASIS. Cooperative relationships are built so that knowledge obtained from external organizations can be put to use in each organization’s security measures.

NEC has joined AIS (Automated Indicator Sharing), an initiative promoted by the U.S. Department of Homeland Security for sharing cyber threat information between the government and private sectors, in order to strengthen the cyber intelligence, technologies and human resources of its cyber security business.

3.Evaluation Test

The authors measured the number of mentions of cyber threats in tweets posted on social media every other hour in the period from July to December 2015. From the changes in the number of tweets, the dates and times at which sudden rises were detected were obtained using an original MACD-based algorithm. The authors also surveyed the date and time of publication of the earliest article on each cyber threat by a public institution, mass media source or vendor. As a result, it was confirmed that the original algorithm is capable of detecting cyber threats 56.1% earlier on average (Table).

Table Early report rate of cyber threats.
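
The comparison of detection time against first publication time can be sketched as follows. This is a hypothetical illustration, not the evaluation procedure used here: it assumes that a "sudden rise" is the first sample at which a textbook MACD histogram turns positive, that samples are two hours apart, and that the counts and the publication time are invented. It does not reproduce the figure reported in the Table.

```python
from datetime import datetime, timedelta


def ema(values, span):
    """Exponential moving average, seeded with the first value."""
    alpha = 2 / (span + 1)
    out = [float(values[0])]
    for v in values[1:]:
        out.append(out[-1] + alpha * (v - out[-1]))
    return out


def first_rise_index(counts, fast=3, slow=6, signal=3):
    """Index of the first sample where the MACD histogram turns positive."""
    macd_line = [f - s for f, s in zip(ema(counts, fast), ema(counts, slow))]
    signal_line = ema(macd_line, signal)
    hist = [m - s for m, s in zip(macd_line, signal_line)]
    for i in range(1, len(hist)):
        if hist[i - 1] <= 0 < hist[i]:
            return i
    return None  # no rise detected


# Invented two-hourly mention counts starting at midnight.
start = datetime(2015, 9, 15, 0, 0)
interval = timedelta(hours=2)
counts = [2, 2, 2, 2, 2, 2, 2, 3, 9, 30, 12, 5]

# Invented time of the earliest public article about the same threat.
first_article = datetime(2015, 9, 16, 9, 0)

idx = first_rise_index(counts)
detected_at = start + idx * interval
lead = first_article - detected_at
print("rise detected at:", detected_at)
print("lead over first article:", lead)
```

Repeating this comparison for each surveyed threat would give a set of lead times from which a summary figure could be calculated.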


In the above, the authors first described the expansion of cyberattacks and the circulation via social media and the deep web of threat information that may lead to cyberattacks. It was observed that the first peak in the number of tweets related to a cyber incident on social media is linked to the number of associated attacks. The authors then proposed a method of extracting threat information with a high potential for damage by using a technical analysis technique from financial engineering. In addition, the authors conducted an evaluation test and demonstrated that the proposed technique can detect threat information 56.1% earlier on average than announcements by public institutions and other sources.

Introducing the proposed procedure makes it possible to detect signs of attacks in a huge amount of threat information and to apply early and accurate measures, so that the period of potential attack and damage can be shortened.

Going forward, the authors intend to promote threat hunting, which takes preventive action before damage from cyberattacks is incurred, and will continue their research with the aim of realizing safe, secure and efficient social infrastructures.

  • * Twitter is a registered trademark or trademark of Twitter, Inc.
  • * All other company names and product names that appear in this paper are trademarks or registered trademarks of their respective companies.

Authors’ Profiles

Senior Researcher
Security Research Laboratories

SHIMA Shigeyoshi
Principal Researcher
Security Research Laboratories