Vol.7 No.2 (September, 2012)
Special Issue on Big Data
Big data processing platforms
KOCHU Daisuke ・MIKUNI Yukinori
With the aim of extracting valuable information in a timely manner, the need for ad hoc analysis of big data is increasing rapidly. This paper describes why the InfoFrame DWH (Data WareHouse) Appliance is attracting attention as a high-speed analysis platform and introduces actual case studies of its use in marketing through ad hoc analysis of website access logs.
IIJIMA Akio ・KUDO Masashi
Big data multiplies and changes every day, and the quantity of data and the required computing/processing capability vary between projects. To process big data efficiently, it is necessary to allocate the required ICT resources dynamically and scalably, in an optimal placement. This paper describes the features and effectiveness of ProgrammableFlow, which optimizes computer and network resources dynamically using OpenFlow/Software Defined Network (SDN) technology.
OOSAWA Hideki ・MIYATA Tsuyoshi
The era of big data is characterized by an increasing need to create new value and new business through the real-time processing of large amounts of data. This processing requires an increase in individual data processing speeds as well as high throughput. This paper introduces the InfoFrame Table Access Method, a memory DB product suitable for real-time big data processing thanks to its high-speed parallel data processing capability.
KAWABATA Terumasa ・HAMADA Mitsuyasu ・TAMURA Minoru ・HAKUBA Tomohiro
The information explosion has recently been creating a variety of opportunities for big data analysis. Meanwhile, advances in hardware have significantly increased the amount of memory that can be installed in a server. The InfoFrame DataBooster meets the need for high-speed processing of big data by using an in-memory data processing technology based on a column store. This paper introduces the differences between a regular RDB and the InfoFrame DataBooster. It goes on to discuss the background and development method of the SQL interface in the latest version, how the InfoFrame DataBooster is used, and its application domains (case studies).
SUKENARI Teruki ・TAMURA Minoru
Existing relational databases experience problems when used with big data because they cannot deal flexibly with increases in the amount of data and the number of accesses (scaling out). On the other hand, the key-value store is an advanced technology, but it lacks both data access with SQL and the transaction processing required for mission-critical operations. The InfoFrame Relational Store (IERS) is scale-out-capable database software suited to big data utilization. It is equipped with 1) SQL interfacing, 2) transaction processing and 3) high reliability, which make it applicable to mission-critical operations. This paper introduces the features of the IERS and its architecture.
Locating the target data as close as possible to the CPU is important for the high-speed processing of large amounts of data. The Express5800/Scalable HA Server series are servers optimized for use as big data processing platforms and can use a large-capacity memory of up to 2 TB. Building highly cost-efficient systems with high-speed PCI Express SSDs has recently been attracting market attention. The Express5800/Scalable HA Server series products are suitable for such systems because they support multiple I/O slots. This paper introduces their advantages and discusses actual examples in which their features can be usefully applied.
TAKAHASHI Chieko ・SERA Naohiko ・TSUKUMOTO Kenji ・OSAKI Hirotatsu
Development of technologies for the processing of “big data” has recently been advanced by network-related enterprises. Apache Hadoop is attracting attention as open source software (OSS) that implements storage and distributed processing of petabyte-class big data by means of scaling out based on the above technologies. NEC has tested the suitability of Apache Hadoop for enterprise use and has built systems that take its characteristics into account. Because sizing an Apache Hadoop system is usually regarded as difficult, NEC has developed a technology for size prediction by means of simulation. This paper introduces these technologies.
Big data processing infrastructure
KAWANABE Masazumi ・YOSHIMURA Shigeru ・UTAKA Junya ・YOSHIOKA Hiroshi ・MIZUMACHI Hiroaki ・KATO Mitsugu
With the rapid increase in corporate data that must be stored long-term, such as backups of management information and archives of e-mails with customers, the need for safe, easy storage of big data is greater than ever. HYDRAstor is a grid storage system that meets these needs. It adopts a revolutionary grid architecture to achieve high performance, scalability and reliability while reducing operation and management effort, which makes it suitable for the storage of big data. This paper outlines the technologies used in HYDRAstor, describes their features and introduces actual cases in which it is used.
Data analysis platforms
MUROI Yasuyuki ・MUKAI Yoshikazu
The information explosion is becoming a real issue, and the amount of information stored on file servers is continuously bloating, making the identification, organization and utilization of information on file servers difficult jobs. The latest version V2.1 of the Information Assessment Tool, a tool for “visualization,” “slimming,” “activation” and “optimization” of file servers, adopts the InfoFrame DataBooster high-speed data processing engine to deal with large-scale file servers and enable interactive analysis based on high-speed search/aggregation.
SU Leiming ・SAKAMOTO Shizuo
The Indian Unique ID (UID) is an extremely large-scale system that attempts to identify India’s 1.2 billion people, about one sixth of the world population, using biometric authentication. As other countries are also studying the introduction of national-scale authentication systems, NEC is currently conducting related R&D for the implementation of such systems. This paper describes a system suited to the processing of extremely large-scale biometric information.
KATO Kiyoshi ・YABUKI Kentaro
NEC’s MasterScope middleware is an integrated operation management software suite. MasterScope collects operational and performance metrics from target IT systems and analyzes them comprehensively in order to detect and locate system failures. The operator is notified of detected events and failures, and workarounds can be applied to recover from failures. There are similarities between this analysis process for operation management and the one required to process big data. For instance, the system performance analysis software “MasterScope Invariant Analyzer” automatically discovers important correlations in a large amount of performance data and proactively detects hidden performance anomalies, thereby avoiding serious system-level damage. This paper describes the analysis technology of MasterScope, which is similar to big data analysis technology, and then introduces experimental applications of the system invariant analysis technology in domains other than operation management.
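The idea of discovering a correlation between metrics and then watching for it to break can be illustrated with a deliberately simplified sketch. It is not the algorithm used by MasterScope Invariant Analyzer, which the abstract does not describe; it assumes a plain linear relation between two metrics fitted by least squares, and the function names, sample metrics and tolerance are all hypothetical.

```python
def fit_invariant(xs, ys):
    """Fit y ~ slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def violates(model, x, y, tolerance):
    """Return True when an observation departs from the learned relation."""
    slope, intercept = model
    return abs(y - (slope * x + intercept)) > tolerance


# Hypothetical history: requests per second versus CPU utilization (%).
requests = [100, 200, 300, 400, 500]
cpu = [12.0, 22.0, 32.0, 42.0, 52.0]
model = fit_invariant(requests, cpu)

# A new sample that follows the relation, and one that breaks it.
print(violates(model, 350, 37.0, tolerance=3.0))
print(violates(model, 350, 80.0, tolerance=3.0))
```

In this toy setting the second sample is flagged even though 80% CPU might not cross a fixed alarm threshold, which is the kind of hidden anomaly a correlation-based check is meant to expose.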
Information collection platforms
M2M technology enables us to control “things” and to collect various kinds of information from “things.” NEC provides the M2M solution CONNEXIVE with the aim of building the next-generation “Ambient Information Society.” This will help realize a safe and secure lifestyle and a revitalized industrial base. While making the best use of big data processing technology in analyzing and studying information acquired from CONNEXIVE, we aim to realize a rich and innovative Smart Society. This will include the imminent Smart City and Smart Communities that will be centered on the city-based social infrastructures of the future.
SASAKI Yasuhiro ・TAKAHASHI Masatake ・AIMOTO Takashi ・GENSHIN Akira
The NEC Group has developed a piezoelectric vibration sensor with about 20 times the sensitivity of previous models. A vibration sensor is a device that corresponds to the auditory and tactile organs of the human body. The real world is flooded with vibration information generated by humans, goods and environments. Our recently developed vibration sensor can collect minute waveform data that was previously undetectable and has therefore not been utilized. The device extracts the frequency components that present anomalies, and their significance is analyzed by means of cloud computing. The aim is a safe and secure society in which accurate identification of situations and circumstances is linked to the prevention of adverse events. This paper introduces the features of the newly developed vibration sensor and discusses efforts being made to develop its applications.
Advanced technologies to support big data processing
Recent years have witnessed a significant increase in the collection and analysis of large amounts of data such as website logs and sensor information from various devices. Relational databases used to be the major system for storing such data, but recently KVS (Key-Value Store) has become more popular as a data storage method because it scales out well as data volume increases. However, KVS supports only simple search functions. This paper introduces “MD-HBase,” which is an extended version of HBase, a KVS-type database. MD-HBase enables efficient multi-dimensional range queries without giving up scalability.
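The abstract above does not say how multi-dimensional range queries are mapped onto a store that is sorted by a single key. One common approach, shown here only as an assumption and not as a description of MD-HBase itself, is to interleave the bits of the coordinates into a Z-order (Morton) key, scan the one-dimensional key range that covers the query box, and filter out false positives. The `ZOrderStore` class and the sample points are hypothetical, and a sorted in-memory list stands in for the key-value store.

```python
import bisect


def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into a single Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


class ZOrderStore:
    """Toy sorted key-value store: Z-order key -> (x, y) point."""

    def __init__(self, points):
        self._rows = sorted((morton_key(x, y), (x, y)) for x, y in points)
        self._keys = [key for key, _ in self._rows]

    def range_query(self, x_min, x_max, y_min, y_max):
        # Every point inside the box has a key between the keys of the
        # box's lower-left and upper-right corners, so one scan suffices.
        lo = morton_key(x_min, y_min)
        hi = morton_key(x_max, y_max)
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_right(self._keys, hi)
        # The scanned key range can also contain points outside the box,
        # so each candidate is checked against the real bounds.
        return [
            point
            for _, point in self._rows[start:end]
            if x_min <= point[0] <= x_max and y_min <= point[1] <= y_max
        ]


store = ZOrderStore([(1, 1), (2, 3), (5, 5), (3, 2), (7, 0), (0, 2)])
print(store.range_query(1, 3, 1, 3))
```

The point (0, 2) falls inside the scanned key range but outside the box, which is why the filtering step is needed. A single scan over the whole corner-to-corner range can read many irrelevant rows for large boxes, so practical designs usually split the query into several tighter key ranges.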
SENDA Shuji ・SHIBATA Takashi ・IKETANI Akihiko
“Super Resolution” (SR) is a technique for restoring a low-resolution image to a clear, high-resolution image. To restore the image correctly, it is necessary to infer the missing high-frequency components. This paper describes a learning-based SR technique that utilizes an example-based algorithm. This technique divides a large volume of training images into small rectangular pieces called “patches” and brings them together in a dictionary as patch pairs of low-resolution and high-resolution images. Experiments show impressive results for specific objects such as text characters and human faces.
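The patch-pair dictionary described above can be sketched in a few lines. This is a minimal illustration and not the authors' method: it assumes non-overlapping patches, block averaging as the low-resolution model, and a nearest-neighbor lookup by squared distance. All function names and the synthetic training image are hypothetical.

```python
def downsample(image, factor=2):
    """Average each factor x factor block to simulate a low-resolution image."""
    rows, cols = len(image) // factor, len(image[0]) // factor
    return [
        [
            sum(image[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) / factor ** 2
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def extract_patches(image, size):
    """Split a 2-D list into non-overlapping size x size patches (flat tuples)."""
    patches = []
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            patches.append(tuple(image[r + i][c + j]
                                 for i in range(size) for j in range(size)))
    return patches


def build_dictionary(high_res_images, low_size=2, factor=2):
    """Pair each low-resolution patch with the high-resolution patch it came from."""
    pairs = []
    for high in high_res_images:
        low = downsample(high, factor)
        pairs.extend(zip(extract_patches(low, low_size),
                         extract_patches(high, low_size * factor)))
    return pairs


def lookup(dictionary, low_patch):
    """Return the high-resolution patch whose low-resolution partner is nearest."""
    def distance(pair):
        return sum((a - b) ** 2 for a, b in zip(pair[0], low_patch))
    return min(dictionary, key=distance)[1]


# Synthetic 8 x 8 training image; a real dictionary would use many photographs.
training = [[r * 8 + c for c in range(8)] for r in range(8)]
dictionary = build_dictionary([training])
restored = lookup(dictionary, (8.0, 11.0, 24.0, 27.0))
print(restored[:4])
```

A full restoration would repeat the lookup for every patch of the input image and blend the returned high-resolution patches, typically with overlap to hide seams.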
TSUCHIDA Masaaki ・ISHIKAWA Kai ・KUSUI Dai ・KUSUMURA Yukitaka ・NAKAO Toshiyasu
A huge amount of the text included in big data is created by humans to communicate information or express intentions to other humans, which makes it an important source of valuable information. NEC is developing technology for extracting “customers’ voices” and “rumors” from large amounts of text data and for utilizing them in marketing, corporate risk management and customer management. This paper introduces some of NEC’s recent research results. These include the recognizing textual entailment technology, which recognizes whether the semantic content of one text is included in that of another, the technology for rumor detection from cyber information, and the semantic search technology for improving the operational efficiency of contact centers.
FUJIMAKI Ryohei ・MORINAGA Satoshi
Recently, the acquisition of knowledge from big data analysis is becoming an essential feature of business efficiency. However, the analysis of big data can be troublesome because it often involves the collection and storage of mixed data based on different patterns or rules (heterogeneous mixture data). This has made the heterogeneous mixture property of data a very important issue. This paper introduces “heterogeneous mixture learning,” which is the most advanced heterogeneous mixture data analysis technology developed by NEC, together with details of some actual applications. The possibility of the utilization of data that has previously been collected without any specific aim is also discussed.
Martin Bauer ・Dan Dobre ・Nuno Santos ・Mischa Schmidt
The explosive growth of the mobile internet calls for scalable infrastructures capable of processing large amounts of mobile data with predictable response times. We have developed a scalable system supporting continuous geo-queries over geo-tagged data streams in the cloud. The experimental results confirm that our system scales, both in the number of supported geo-queries and throughput.
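The abstract above does not describe how incoming positions are matched against standing geo-queries. As an illustration only, the sketch below assumes rectangular queries and a uniform grid index, so that each incoming point is checked only against the queries registered in its own grid cell. The class name, cell size and sample regions are hypothetical.

```python
import math
from collections import defaultdict


class GeoQueryIndex:
    """Grid index over standing rectangular geo-queries."""

    def __init__(self, cell_size=1.0):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # grid cell -> ids of overlapping queries
        self.queries = {}              # query id -> bounding box

    def _cell(self, lat, lon):
        return (math.floor(lat / self.cell_size),
                math.floor(lon / self.cell_size))

    def register(self, query_id, lat_min, lat_max, lon_min, lon_max):
        """Add a standing query and record it in every grid cell it overlaps."""
        self.queries[query_id] = (lat_min, lat_max, lon_min, lon_max)
        r0, c0 = self._cell(lat_min, lon_min)
        r1, c1 = self._cell(lat_max, lon_max)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                self.cells[(r, c)].add(query_id)

    def match(self, lat, lon):
        """Return the ids of all queries whose box contains the point."""
        hits = []
        for query_id in self.cells.get(self._cell(lat, lon), ()):
            lat_min, lat_max, lon_min, lon_max = self.queries[query_id]
            if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
                hits.append(query_id)
        return sorted(hits)


index = GeoQueryIndex(cell_size=1.0)
index.register("city", 35.0, 36.0, 139.0, 140.0)
index.register("region", 34.0, 37.0, 138.0, 141.0)

# Each element of the stream is a geo-tagged (latitude, longitude) update.
for lat, lon in [(35.68, 139.76), (34.5, 138.5), (0.0, 0.0)]:
    print((lat, lon), index.match(lat, lon))
```

Because the work per update depends on the queries in one cell and not on the total number of queries, an index of this kind can be partitioned by cell across machines, which is one plausible route to the scaling behavior the abstract reports.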
Maurizio Dusi ・Nico d'Heureuse ・Felipe Huici ・Andrea di Pietro ・Nicola Bonelli ・Giuseppe Bianchi ・Brian Trammell ・Saverio Niccolini
This paper describes Blockmon, a novel, configurable, software-based Big Data platform for high-performance data stream analytics. Its design allows running applications for a wide range of use cases. When used as a network data processing and monitoring platform, it copes with 10 Gb/s of continuous traffic, up to layer 7 (e.g., DPI), on a single commodity server. When used as a server-log processing platform, a fraud detection algorithm built on top of Blockmon can analyze up to 70,000 cps (~250 million BHCA, enough to cope with Japan’s entire phone traffic) on a single machine. Blockmon will be commercialized and applied to the analysis of operator networks and to other applications such as web analytics and financial market analysis.
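The relation between the two throughput figures quoted above can be checked with a short calculation. It assumes that cps means calls per second and BHCA means busy-hour call attempts, the usual telecom readings of these abbreviations, and that the rate is sustained for a full hour. The function name is hypothetical.

```python
SECONDS_PER_HOUR = 3600


def cps_to_bhca(calls_per_second: int) -> int:
    """Convert a sustained call rate into call attempts per hour."""
    return calls_per_second * SECONDS_PER_HOUR


print(cps_to_bhca(70_000))  # 252000000
```

The result, 252 million call attempts per hour, matches the abstract's rounded figure of about 250 million BHCA.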
OKAYAMA Takaaki ・KAGEYAMA Tatsuya ・OHASHI Atsushi ・MINEGISHI Satoshi ・MURAKAMI Masahiko ・SENDODA Mitsuru
In the disaster-affected areas of the Great East Japan Earthquake, many people are still living in temporary housing shelters. In these areas, people are facing the issue of setting up “community functions,” such as for the transmission and sharing of important information, and for communications among residents, etc. At the same time, they are facing the issue of their local communities collapsing due to depopulation and ageing. This is happening not only in the disaster-affected areas but also in various local municipalities. NEC’s “community development support system” will contribute to establishing new communities in such areas and municipalities. The system employs digital terrestrial television, which is a familiar device to many people, as an information terminal, and broadcasts video content produced by local municipalities, town or autonomous neighborhood associations, NPOs, local residents, etc. in a timely manner.