
Integrated Operation and Management Platform for Efficient Administration by Automating Operations

Vol.9, No.2 June 2015, Special Issue on Future Cloud Platforms for ICT Systems

Providing cloud services at high quality and low price requires not only an advanced system configuration but also a more efficient, automated operation of the system platform itself. This paper describes an integrated operation and management platform that contributes to operational cost reductions and service quality improvements for NEC Cloud IaaS. These improvements are achieved by means of various functions, including the system platform configuration/management function, the access management function, the auto voice escalation function and the normality check function.

1. Introduction

Now that IaaS is being adopted as the infrastructure for corporate information systems, IaaS providers must deliver reliable services while operating efficiently to reduce costs. Operating an IaaS platform requires managing a huge number of components and software products. To achieve high reliability, the operation must also ensure security and internal control and must quickly identify the fault location when a problem occurs.

To solve these issues, we have built a system that improves the quality, security and efficiency of cloud platform operation as the integrated operation and management platform of our cloud platform service “NEC Cloud IaaS” (hereafter NECCI).

2. Tasks in Integrated Operations and Management

Drawing on our accumulated experience in system operation management and service provision, we selected the tasks that can contribute to reducing IaaS service provision costs and improving service quality and reliability. We then systematized these tasks so that they can be automated. The target operations include:

  • Configuration management (floor/racks/components/VM/licenses)
    This task manages the configuration of, and relationships among, floors, racks, components and VMs (Virtual Machines), as well as licensing information and maintenance contracts. This allows the large number of IaaS platform components to be controlled appropriately and efficiently, enabling prompt fault response and ensuring compliance.
  • Change/ID/access management
    To secure internal control of system operations, this task manages system changes as well as access applications, work approvals and work records. It also manages privileged IDs and matches system access applications against login histories in order to control access to the various platform components. Automating these tasks efficiently reduces security risks, including insider attacks.
  • Auto voice escalation/normality check
    These tasks, such as escalation to SEs based on system monitoring alerts and normality check operations, are essential for implementing a 24-hour, 365-day operation system.

    Automating these jobs significantly reduces operation labor costs.

3. Technologies and Operation of the Integrated Operation and Management Platform

The integrated operation and management platform improves the quality and efficiency of the IaaS service provision by systematizing the configuration management, change/ID/access management and auto voice escalation/normality checks. The implementation methods for a practical system are discussed in the following subsections.

3.1 Operational Efficiency Improvement via Configuration Management

(1) Technologies and software employed

  • Configuration management tool
    This tool has a track record of managing about 7,000 nodes at service provider businesses. It integrates information from various components and manages logical and physical configuration information by linking them together.
  • NetCracker
    This tool has already been offered to more than 260 businesses worldwide. NECCI uses it for the configuration information display and license management.

(2) Outline of configuration management

The configuration management tool collects configuration information and distributes it to NetCracker and to the operation portal (NELP). Users can reference and use the configuration information registered in NetCracker and in the configuration management tool. Fig. 1 shows the configuration of the configuration management application.

Fig. 1 Configuration of the configuration management application.

(3) Collection of configuration information

To make the input of configuration information more efficient, the NECCI data held by the various software products and management tools is collected automatically and linked together based on relationship information.

The configuration management tool periodically collects the individual pieces of management information on the tenants, physical resources and virtual resources from the platform management tools, such as the VMware vCenter, nova, NetApp and iStorageManager. It then generates data linking relationships between the physical and virtual resources and those between virtual resources and tenants, and forwards them to the NetCracker.
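As a minimal sketch of the linking step, assume each collector has already normalized its output into simple VM records that name the hosting physical server and the owning tenant. The record fields and sample names below are hypothetical; they are not the tool's actual schema, and the vCenter/nova/NetApp/iStorageManager collection calls themselves are not shown.

```python
from dataclasses import dataclass


# Hypothetical normalized record; real collectors would build these from the
# platform management tools, whose APIs are not shown here.
@dataclass(frozen=True)
class VirtualMachine:
    vm_id: str
    host: str    # physical server the VM runs on
    tenant: str  # tenant that owns the VM


def link_resources(vms):
    """Derive physical->virtual and virtual->tenant relationship tables."""
    host_to_vms = {}
    vm_to_tenant = {}
    for vm in vms:
        host_to_vms.setdefault(vm.host, []).append(vm.vm_id)
        vm_to_tenant[vm.vm_id] = vm.tenant
    return host_to_vms, vm_to_tenant


# Sample data standing in for one periodic collection run.
collected = [
    VirtualMachine("vm-001", "host-a", "tenant-x"),
    VirtualMachine("vm-002", "host-a", "tenant-y"),
    VirtualMachine("vm-003", "host-b", "tenant-x"),
]
host_to_vms, vm_to_tenant = link_resources(collected)
print(host_to_vms)   # {'host-a': ['vm-001', 'vm-002'], 'host-b': ['vm-003']}
print(vm_to_tenant)
```

The two tables are what a downstream viewer would need to walk from a physical server to its tenants and back.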

(4) Utilization of configuration information

Users and administrators of the IaaS and the data centers can reference, on the NetCracker web display, configuration information that links tenants to each component and components to one another. The display also supports searching, listing and file output of hardware and software, so it can be used to check asset inventories and license usage. Because it can also show the number of VMs or racks in use, it also serves for capacity management.
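The inventory and capacity views amount to aggregations over the linked configuration records. The sketch below is illustrative only. The row layout, product names and license counts are hypothetical, and the sketch does not reflect NetCracker's actual data model.

```python
from collections import Counter

# Hypothetical inventory rows: (vm_id, rack, licensed_product or None).
inventory = [
    ("vm-001", "rack-01", "OS-A"),
    ("vm-002", "rack-01", "OS-A"),
    ("vm-003", "rack-02", "DB-B"),
    ("vm-004", "rack-02", None),
]
# Hypothetical number of licenses owned per product.
owned_licenses = {"OS-A": 3, "DB-B": 1}


def license_usage(rows, owned):
    """Return {product: (licenses in use, licenses owned)}."""
    used = Counter(product for _, _, product in rows if product is not None)
    return {product: (used.get(product, 0), total) for product, total in owned.items()}


def vms_per_rack(rows):
    """Count VMs per rack as a simple capacity indicator."""
    return Counter(rack for _, rack, _ in rows)


print(license_usage(inventory, owned_licenses))  # {'OS-A': (2, 3), 'DB-B': (1, 1)}
print(vms_per_rack(inventory))
```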

(5) Instant identification of the affected tenants

To grasp the situation instantly when a fault occurs in a platform component, we have developed a function that creates a list of fault-affected tenants. The list includes the tenants, nodes (virtual servers) and contact persons that may be affected by the fault, and the function e-mails it to administrators. The list is compiled from the relationships between physical servers, virtual servers and tenants that the configuration management tool has collected and generated. A Zabbix monitoring alert triggers the creation and e-mailing of the list.
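A minimal sketch of compiling the affected-tenant list follows. It assumes the Zabbix alert has already been reduced to the name of the failed physical server. The relationship tables, contact addresses and mail format are hypothetical, and actual e-mail delivery is left out.

```python
# Hypothetical relationship tables produced by the configuration management tool.
HOST_TO_VMS = {"host-a": ["vm-001", "vm-002"], "host-b": ["vm-003"]}
VM_TO_TENANT = {"vm-001": "tenant-x", "vm-002": "tenant-y", "vm-003": "tenant-x"}
TENANT_CONTACTS = {"tenant-x": "ops-x@example.com", "tenant-y": "ops-y@example.com"}


def affected_tenant_list(failed_host):
    """Return sorted (tenant, vm, contact) rows for every VM on the failed host."""
    rows = []
    for vm in HOST_TO_VMS.get(failed_host, []):
        tenant = VM_TO_TENANT[vm]
        rows.append((tenant, vm, TENANT_CONTACTS.get(tenant, "unknown")))
    return sorted(rows)


def format_mail_body(failed_host, rows):
    """Render the list as a plain-text e-mail body (sending is not shown)."""
    lines = [f"Fault detected on {failed_host}. Possibly affected tenants:"]
    lines += [f"- {tenant} / {vm} / contact: {contact}" for tenant, vm, contact in rows]
    return "\n".join(lines)


# Assume the monitoring alert has been parsed down to the failed host name.
rows = affected_tenant_list("host-a")
print(format_mail_body("host-a", rows))
```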

3.2 Improvement of Change/ID/Access Management Efficiencies

(1) Technology and software usage

  • NELP
    This is the operation portal for providing high-quality IT services based on ITIL (IT Infrastructure Library). It has a proven record in ITIL-based system operation/management projects and provides the communication management functions required for operations, such as workflow, monthly reporting and document management.
  • IAM (ID & Access Management)
    This technology reduces the labor involved in ID management, work applications, approvals and confirmations, and also provides a mechanism for detecting and preventing unauthorized access.
    ESS REC and ESS AutoAuditor are software products that provide work-trail management and control of workers’ system access. SecureMaster manages, distributes and changes the IDs of platform servers on NECCI.
  • Logstorage
    This software collects server access logs and provides log-related functions such as high-speed collection, search and tamper prevention. In addition to these functions, NECCI has developed a function that automatically matches the access applications registered in NELP against the collected logs. This function detects accesses made without an application, in order to secure internal control.

(2) Outline of change/ID/access management

A workflow system integrates and manages information on change requests and application approvals to make processing this information more efficient. Platform components are accessed through a gateway server using a one-time ID.

Accessing a platform component involves various operations, such as distributing personal IDs, issuing one-time IDs and checking access logs. Automating these operations improves operational efficiency. Fig. 2 shows the configuration of the ID/access management application.

Fig. 2 Configuration of ID/access management application.

(3) Change management

The applicant submits a change request using the NELP workflow function. After the change request is approved, the modification work is carried out, and the change details are reported via NELP once the work is complete. Because change request information is integrated and linked with the access applications (described later), the status of change work can be identified easily.
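The change workflow can be viewed as a small state machine whose records also carry links to access applications. The following is a sketch under that assumption. The state names, transition table and ID formats are hypothetical and do not describe NELP's internal design. For example, a rejection path is omitted for brevity.

```python
from enum import Enum


class ChangeState(Enum):
    REQUESTED = "requested"
    APPROVED = "approved"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"  # change details have been reported


# Hypothetical allowed transitions for the change workflow.
TRANSITIONS = {
    ChangeState.REQUESTED: {ChangeState.APPROVED},
    ChangeState.APPROVED: {ChangeState.IN_PROGRESS},
    ChangeState.IN_PROGRESS: {ChangeState.COMPLETED},
    ChangeState.COMPLETED: set(),
}


class ChangeRequest:
    def __init__(self, change_id):
        self.change_id = change_id
        self.state = ChangeState.REQUESTED
        self.access_application_ids = []  # links to related access applications

    def advance(self, new_state):
        """Move to new_state, rejecting transitions the workflow does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        self.state = new_state


cr = ChangeRequest("CHG-0001")
cr.advance(ChangeState.APPROVED)
cr.access_application_ids.append("ACC-0001")  # hypothetical linked application
cr.advance(ChangeState.IN_PROGRESS)
cr.advance(ChangeState.COMPLETED)
print(cr.change_id, cr.state.value, cr.access_application_ids)
```

Keeping the access-application IDs on the change record is what lets the status of change work be traced from either side.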

(4) ID management

A user who wants to use a platform component creates a public key and a private key for SSH connections and submits an ID registration application to NELP with the public key attached. When the application is approved, NELP and SecureMaster exchange data and distribute the personal ID to all platform components targeted for ID distribution.
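The approval-then-distribute rule can be sketched as follows. Each target component is modeled as a hypothetical in-memory map of users to authorized public keys. The key-format check is only a rough sanity test of the OpenSSH "type base64 [comment]" layout, and nothing here represents SecureMaster's actual interface.

```python
from dataclasses import dataclass, field

# Rough allow-list of OpenSSH public key type prefixes (not exhaustive).
KEY_TYPES = {"ssh-ed25519", "ssh-rsa", "ecdsa-sha2-nistp256"}


@dataclass
class IdRegistration:
    user: str
    public_key: str  # public key attached to the application
    approved: bool = False


@dataclass
class PlatformComponent:
    name: str
    authorized_keys: dict = field(default_factory=dict)  # user -> public key


def looks_like_ssh_public_key(key):
    """Format check only: '<type> <base64-data> [comment]'."""
    parts = key.split()
    return len(parts) >= 2 and parts[0] in KEY_TYPES


def distribute_id(app, targets):
    """After approval, register the user's key on every distribution target."""
    if not app.approved:
        raise PermissionError("application not approved")
    if not looks_like_ssh_public_key(app.public_key):
        raise ValueError("attachment is not an SSH public key")
    for component in targets:
        component.authorized_keys[app.user] = app.public_key
    return [component.name for component in targets]


app = IdRegistration("alice", "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIExample alice@example")
targets = [PlatformComponent("server-01"), PlatformComponent("switch-01")]
app.approved = True  # approval in the portal workflow
print(distribute_id(app, targets))  # ['server-01', 'switch-01']
```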

(5) Access management

The operator submits an access application to NELP, specifying the target platform components, the scheduled work start and end dates and times, etc. When the application is approved, NELP automatically passes the data to ESS, and a one-time ID for logging into the gateway server is issued. The one-time ID becomes valid at the work start date and time given in the application, allowing the worker to access the system.
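The validity rule for the one-time ID can be sketched as a time window plus a revocation flag. The flag models invalidating the ID once the work result is reported. This is a hypothetical model, not ESS's implementation. The token is generated with Python's `secrets` module purely for illustration.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime


@dataclass
class OneTimeId:
    token: str
    valid_from: datetime   # applied work start
    valid_until: datetime  # applied work end
    revoked: bool = False  # set when the work result is reported

    def is_valid(self, now):
        return not self.revoked and self.valid_from <= now <= self.valid_until


def issue_one_time_id(start, end):
    """Issue a random token valid only within the applied work window."""
    if end <= start:
        raise ValueError("work end must be after work start")
    return OneTimeId(secrets.token_urlsafe(16), start, end)


otid = issue_one_time_id(datetime(2015, 6, 1, 22, 0), datetime(2015, 6, 2, 2, 0))
print(otid.is_valid(datetime(2015, 6, 1, 21, 59)))  # False: before work start
print(otid.is_valid(datetime(2015, 6, 1, 23, 0)))   # True: inside the window
otid.revoked = True  # completion has been reported
print(otid.is_valid(datetime(2015, 6, 1, 23, 30)))  # False: revoked
```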

When the person checking the work reports the result through NELP after the work is completed, the one-time ID is invalidated and access to the components is denied. Whether the application contents match the access log is checked automatically (log collation), reducing the security manager’s operational workload.
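Log collation boils down to finding login records that no approved application covers. The sketch below makes a simplifying assumption: "covered" means the same user, the same component and a login time inside the applied work window. The record types and sample data are hypothetical. The nested scan is O(applications × logins), which is adequate for a sketch. A production matcher would index applications by user and component.

```python
from datetime import datetime
from typing import NamedTuple


class AccessApplication(NamedTuple):
    user: str
    component: str
    start: datetime
    end: datetime


class LoginRecord(NamedTuple):
    user: str
    component: str
    time: datetime


def find_unapplied_accesses(applications, logins):
    """Return login records not covered by any approved access application."""
    def covered(rec):
        return any(
            a.user == rec.user and a.component == rec.component and a.start <= rec.time <= a.end
            for a in applications
        )
    return [rec for rec in logins if not covered(rec)]


apps = [AccessApplication("alice", "server-01", datetime(2015, 6, 1, 22, 0), datetime(2015, 6, 2, 2, 0))]
logins = [
    LoginRecord("alice", "server-01", datetime(2015, 6, 1, 23, 5)),   # covered
    LoginRecord("alice", "server-02", datetime(2015, 6, 1, 23, 10)),  # component not applied for
    LoginRecord("bob", "server-01", datetime(2015, 6, 1, 23, 15)),    # no application at all
]
violations = find_unapplied_accesses(apps, logins)
for v in violations:
    print(v.user, v.component, v.time.isoformat())
```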

3.3 Auto Voice Escalation/Normality Check

(1) Technology and software used

  • Premier Voice
    This cloud service makes its voice-call notification function available for automated use through an API. Because the service is provided from the cloud, users do not need their own phone circuits or voice-call control equipment and can control voice calls using an open API (SOAP).
  • Selenium
    This is a web application test tool that performs tests automatically through a web browser. NECCI uses it as a tool for normality checks.

(2) Outline of auto voice escalation/normality check

Auto voice escalation automatically reports abnormalities detected on a monitored server by voice call.

The normality check automates the web service normality check that operators previously performed manually. This has eliminated the need for operators to work 24 hours a day on system operations, thereby reducing operation costs. Fig. 3 shows the configuration of the auto voice escalation and normality checks.

Fig. 3 Configuration of auto voice escalation/normality checks.

(3) Contents of auto voice escalation

Escalation details can be designed on the self-service portal by defining the contact persons, the contact order and the contact time slots (weekdays, holidays, daytime, nighttime, etc.). When an abnormality is detected on a monitored server, the contacts are identified from the registered escalation design, and Premier Voice APIs are called in the defined order to execute the automatic voice notification.
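Contact selection by time slot and ordered calling can be sketched as follows. The slot labels, the 9:00 to 18:00 daytime boundary and the rule of stopping at the first acknowledged call are all assumptions made for illustration. `place_call` is a hypothetical stand-in for the voice-notification API request, since the Premier Voice SOAP interface is not described here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Contact:
    name: str
    phone: str
    time_slots: set  # hypothetical labels: "weekday_day", "weekday_night", "holiday"


def time_slot(now, holidays=frozenset()):
    """Classify a timestamp (assumed daytime: 9:00 to 18:00 on weekdays)."""
    if now.date() in holidays or now.weekday() >= 5:
        return "holiday"
    return "weekday_day" if 9 <= now.hour < 18 else "weekday_night"


def escalate(contacts, now, place_call, holidays=frozenset()):
    """Call eligible contacts in the defined order until one acknowledges.

    place_call(phone) stands in for the voice-notification API and returns
    True if the call was acknowledged. Returns the (name, acknowledged) log.
    """
    slot = time_slot(now, holidays)
    log = []
    for contact in contacts:
        if slot not in contact.time_slots:
            continue
        acknowledged = place_call(contact.phone)
        log.append((contact.name, acknowledged))
        if acknowledged:
            break
    return log


contacts = [
    Contact("primary SE", "000-0001", {"weekday_day", "weekday_night"}),
    Contact("backup SE", "000-0002", {"weekday_day", "holiday"}),
    Contact("manager", "000-0003", {"weekday_day", "weekday_night", "holiday"}),
]
answers = {"000-0002"}  # simulate: only the backup SE answers
print(escalate(contacts, datetime(2015, 6, 1, 10, 30), lambda phone: phone in answers))
```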

When escalation is combined with the normality check, normality can also be checked automatically, and the automatic phone notification is executed or skipped depending on the result. The escalation result is e-mailed to all contacts and is also available on the self-service portal, so that all persons concerned share the same awareness of the situation.
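The gating of phone notification by a normality check can be sketched as below. Two points are assumptions. First, the sketch takes "depending on the result" to mean that the call is suppressed when a re-check finds the service normal. Second, it substitutes a plain HTTP status check (`urllib.request`) for the browser-driven Selenium check, which this sketch does not reproduce. The check, escalation and notification functions are injected so the logic can be exercised without network access or real calls.

```python
import urllib.request


def web_service_is_normal(url, timeout=10.0):
    """HTTP-level stand-in for the browser-driven normality check."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError):
        # URLError/HTTPError are OSError subclasses; ValueError covers bad URLs.
        return False


def handle_alert(url, check, escalate_fn, notify_all):
    """Re-check the service after an alert and place calls only if still abnormal."""
    if check(url):
        result = "service normal on re-check; voice escalation suppressed"
    else:
        result = f"service abnormal; voice escalation executed: {escalate_fn()}"
    notify_all(result)  # e.g. e-mail to all contacts and post to the portal
    return result


# Usage with stubs, so no real HTTP request or phone call is made.
sent = []
print(handle_alert("https://service.example.com/",
                   check=lambda url: False,
                   escalate_fn=lambda: [("primary SE", True)],
                   notify_all=sent.append))
```

In production, `check` would be `web_service_is_normal` or a browser-based equivalent, and `escalate_fn` would drive the ordered voice calls.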

4. Next-Generation Operation Services

To operate cloud-based services, we have an operation service system that covers monitoring, notification and routine operations.

Previously, operation systems, processes and tools were built separately for each customer or service menu according to their requests. Because the tools and processes were not always NEC in-house products, this approach could not achieve significant cost reductions or efficiency improvements.

To resolve these issues in the next-generation operation service, we have reviewed the service system and operation processes so that they make maximum use of the integrated operation and management platform and the self-service portal. We have also considerably reduced human labor input in order to lower operation costs and improve operation speed and quality. As a result, we have reduced operation costs by about 20%, enabling us to provide services at a lower price and with higher quality.

5. Conclusion

The automatic voice escalation and normality check functions are used not only to improve internal operational efficiency within the NECCI group but are also offered as options in the tenant menu so that tenants can use them. The change/ID/access management functions help any business establish internal control. We have therefore packaged these functions into a single server and added them to the tenant menus so that they can be used as NECCI VM images.

The integrated operation and management platform introduced above can serve as the platform not only for NECCI but also for various other services provided by NEC. We anticipate that our customers will be able to improve the efficiency of their IT system operations by making full use of the proven technologies and services that NEC has accumulated.

  • VMware vCenter is a registered trademark or trademark of VMware, Inc. in the U.S. and other countries.
  • NetApp is a trademark or registered trademark of NetApp, Inc. in the U.S. and/or other countries.
  • Zabbix is a registered trademark of Zabbix SIA.
  • ITIL is a registered trademark of AXELOS Limited.
  • ESS REC and ESS AutoAuditor are trademarks or registered trademarks of Encourage Technologies Co., Ltd.
  • Logstorage is a trademark or registered trademark of Infoscience Corporation.
  • All other company and product names that appear in this paper are trademarks or registered trademarks of their respective companies.

Authors' Profiles

Cloud Platform Service Department
Platform Services Division
HIRAI Masaki
Department Manager
Cloud Platform Service Department
Platform Services Division