Data Centers: An Ideal Use Case for Industrial AI
An oft-repeated tenet of management is “That which can be measured can be improved.” From finance to productivity, numerous well-known metrics and key performance indicators, such as revenue per employee, drive business processes and management systems.
Measurement and metrics are even more pronounced in operational systems, where, from quality to efficiency, they often define the processes themselves.
Few operational environments benefit more from process optimization than data centers. These facilities have massive capital expenditures (witness Facebook’s $750M and Google’s $600M new data centers) and equally significant operating costs (they consume roughly 3% of the world’s electricity and have a carbon footprint comparable to that of the worldwide aviation industry). These facilities form the backbone of our daily life, from video streaming to commerce and banking, and their importance will only increase with the expansion of digitization and the continual stream of new services; Uber, a business built entirely on digital services, bears testimony to this.
Yet data centers are experiencing severe disruption. An ever-sharper focus on reducing costs and risks, while increasing flexibility, has placed a new spotlight on these facilities. Importantly, data centers have an added aspect few other industries have: optionality.
There are numerous ready alternatives to the traditional data center, from software in the cloud (such as Microsoft 365) to placing equipment in a shared facility (namely the co-location provider market, with many massive players such as Digital Realty and CyrusOne).
Data Center Metrics
Unsurprisingly, there are many established metrics for measuring and operating data centers, including power efficiency, availability, and space utilization. However, with the shifts in the data center market, are we now even focused on the right metrics? For the existing metrics, are we capturing the right data points? Are there hidden metrics and patterns waiting to be exploited to maximize the utility of these facilities?
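To make those established metrics concrete, here is a minimal sketch using the industry-standard definitions of power usage effectiveness (PUE), availability, and space utilization; the function names and sample figures are illustrative assumptions, not taken from the article or any particular facility.

```python
# Illustrative only: standard data center metric formulas with made-up sample numbers.

def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy.
    A value of 1.0 would mean every watt goes to IT equipment."""
    return total_facility_kwh / it_equipment_kwh

def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of time the facility or service was available."""
    return uptime_hours / (uptime_hours + downtime_hours)

def space_utilization(occupied_rack_units: int, total_rack_units: int) -> float:
    """Fraction of rack space currently in use."""
    return occupied_rack_units / total_rack_units

# Example: a 10 MWh day with 7 MWh delivered to IT gives a PUE of about 1.43.
print(round(pue(10_000, 7_000), 2))                # 1.43
print(round(availability(8755, 5), 5))             # ~0.99943 over a year
print(round(space_utilization(3_400, 4_200), 2))   # 0.81
```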
To address these questions, let’s first consider the data center facilities themselves. Simply put, the scale, complexity and required optimization of these facilities demand “management by AI,” as they increasingly cannot be planned and managed with traditional rules and heuristics. Several factors are driving this trend:
- Efficiency and environmental impact: As mentioned at the beginning of this article, data centers consume an enormous amount of energy, and as a result the industry has faced considerable scrutiny over its energy footprint. Coupled with the costs of that consumption, operators are addressing efficiency in ever more creative and complex ways.
- Data center consolidation: Data centers absolutely benefit from economies of scale, and whether corporate data centers are consolidated or moved to co-location facilities, the result is ever larger facilities with increased density and power usage to match.
- Growth of co-location providers: Co-location providers, such as Equinix and Digital Realty, for whom availability, efficiency and cost reduction are paramount, are growing five times faster than the overall market, as noted in a recent 451 Group report. These providers have massive scale, and their efficiency-driven business models will drive AI adoption, giving them a distinct advantage over competitors.
- Edge computing: The rise of edge data centers, small data centers often geographically dispersed, allows workloads to be optimally placed. Rather than being stand-alone entities, these edge nodes, combined with central data centers or cloud computing, form a larger, cooperative computing fabric. This rich topology provides numerous inputs and controls for optimization and availability, which again are best managed by AI.
The above factors have rightly received greater focus as the data center market has evolved, yet there is another element, perhaps the single most important, which until recently has been overlooked.
Managing Workloads
To explain this factor, let’s consider an illustrative analogy. Why do houses exist? Not for the sake of the structure, but rather to provide shelter and comfort to the inhabitants. Similarly, why do data centers exist? Not for the sake of the many servers and massive power and air conditioning systems, but rather for the applications or workloads which run in the data center.
All of the assets within the data center, from software systems to the facility itself, and the management processes for those assets, exist solely to support the workloads which run atop those assets. This yields the discipline of Workload Asset Management, which is defined as “enabling workload optimization through intelligent insight and comprehensive management of underlying assets.”
How does workload asset management expand the “management by AI” mantra? In many ways, including:
- Optimizing availability, by incorporating predicted virtualization, facility or IT equipment status into workload management. Workloads can be preemptively moved across virtual environments, within or across data centers, or coordinated with cloud alternatives, to optimize application availability;
- Holistically managing workloads by including new factors such as “cost per workload,” both current and anticipated, into placement and management considerations;
- Optimizing energy usage by managing the facility based on workload behavior. Why keep a house fully cooled when no one is home? We don’t. Shouldn’t data centers work the same way? By having insight into the workloads themselves, cooling and IT systems can be throttled up or down based on current and anticipated behavior;
- Improving predictive maintenance and failure scenarios by using multivariate analysis, incorporating all available data points, including those from workloads, to improve outcomes;
- Intelligently managing alarms and alerts by normalizing and rationalizing significant events across an entire ecosystem, from facilities to workload. A common problem in data centers is dealing with chained alerts, which makes it difficult to address the root cause of a problem. The growth of edge data centers, which are often remote and unmanned, greatly exacerbates this problem. AI, when coupled with rate-of-change, deviation or similar algorithms, provides an ideal mechanism to identify and act upon critical alerts (a minimal sketch of this kind of filtering follows this list).
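As an illustration of that last point, the sketch below filters raw alerts using simple rate-of-change and deviation checks against a rolling baseline; the class name, thresholds and temperature readings are hypothetical assumptions, not any vendor’s actual algorithm.

```python
# Hypothetical sketch: escalate an alert only when its underlying metric is
# changing abnormally fast or deviating sharply from its recent baseline.
from collections import deque
from statistics import mean, pstdev

class AlertRationalizer:
    def __init__(self, window: int = 30, rate_limit: float = 2.0, z_limit: float = 3.0):
        self.history = deque(maxlen=window)  # recent metric readings
        self.rate_limit = rate_limit         # max tolerated change per sample
        self.z_limit = z_limit               # max tolerated deviation (in std devs)

    def observe(self, reading: float) -> None:
        self.history.append(reading)

    def should_escalate(self, reading: float) -> bool:
        if len(self.history) < 2:            # not enough history yet; just record
            self.observe(reading)
            return False
        rate = abs(reading - self.history[-1])            # change vs. previous sample
        baseline, spread = mean(self.history), pstdev(self.history)
        z = abs(reading - baseline) / spread if spread else 0.0
        self.observe(reading)
        return rate > self.rate_limit or z > self.z_limit

# Example: inlet temperature readings from a remote, unmanned edge site.
monitor = AlertRationalizer()
for temp in [22.1, 22.3, 22.2, 22.4, 29.8]:  # sudden jump on the last reading
    if monitor.should_escalate(temp):
        print(f"escalate: inlet temperature {temp} deviates from baseline")
```

In practice the same filter would sit downstream of many event sources, so that only the alert describing the root anomaly is escalated rather than every alert chained to it.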
Several important and common trends are emerging in this segment of the industrial AI market:
- Using all inputs to optimize outcomes: Data center operations generate a tremendous amount of useful management information, from critical infrastructure (power, thermal) to security (breaches, anomalies) to IT (server, virtualization). Typically, only a small subset of this information is used, tailored to a specific metric or KPI. A true cognitive system, however, allows all inputs to be considered, as there may be previously unknown yet rich patterns that greatly improve outcomes.
- Shifting from reactive to proactive: Data centers are predominantly reactive: application movement occurs after a failure; thermal systems respond to changes in temperature; vulnerabilities are addressed after a security breach. A well-implemented AI system, with rich data inputs, cognitive analytics and appropriate command-and-control systems, fundamentally shifts a data center from reactive to proactive operation. What if workloads were optimally placed based on future demand and infrastructure state? What if equipment were preemptively repaired based on accurate predictions of failure? What if entry points were preemptively closed based on anticipated security anomalies? This shift from reactive to proactive operations represents the state of the art in the data center market.
- Analysis of alternatives: What-if scenarios are a mainstay of management systems, but they tend to be manual and “off-line.” In contrast, true AI systems perform real-time analysis of all discernible options, using rich inputs to determine outcomes. In an environment with as many variables as a data center, manual what-if scenarios simply cannot provide the required real-time information and deep insight into operational alternatives (a simplified sketch of such option scoring appears after this list).
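As a simplified sketch of such real-time analysis of alternatives, the snippet below scores hypothetical placement options for a workload using predicted cost per workload, failure risk and spare capacity; the site names, weights and figures are illustrative assumptions only, not a prescribed model.

```python
# Hypothetical sketch of scoring workload placement alternatives in real time.
from dataclasses import dataclass

@dataclass
class SiteForecast:
    name: str
    cost_per_workload: float   # predicted $/hour to run the workload here
    failure_risk: float        # predicted probability of an outage in the window
    spare_capacity: float      # predicted fraction of capacity still free

def placement_score(site: SiteForecast,
                    w_cost: float = 1.0,
                    w_risk: float = 50.0,
                    w_capacity: float = 5.0) -> float:
    """Lower is better: weigh predicted cost, failure risk and headroom together,
    rather than reacting after a failure or capacity shortfall."""
    capacity_penalty = 1.0 - site.spare_capacity   # fuller sites score worse
    return (w_cost * site.cost_per_workload
            + w_risk * site.failure_risk
            + w_capacity * capacity_penalty)

candidates = [
    SiteForecast("central-dc", cost_per_workload=0.9, failure_risk=0.02,  spare_capacity=0.15),
    SiteForecast("edge-west",  cost_per_workload=1.4, failure_risk=0.01,  spare_capacity=0.60),
    SiteForecast("cloud",      cost_per_workload=2.2, failure_risk=0.005, spare_capacity=0.95),
]

best = min(candidates, key=placement_score)
print(f"place workload on: {best.name}")
```

A production system would of course draw these forecasts from live telemetry and re-score options continuously, but the core idea is the same: every discernible alternative is evaluated against all available inputs before a workload is placed or moved.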
Data centers present an ideal use case for Industrial AI: complex, energy intensive and critical, with a very large set of inputs and control points that can only be properly managed through an automated system. With ever-evolving innovations in the data center, from application performance management (APM) linked with physical infrastructure to closely coupled virtualization and multi-data-center topologies, the need for and benefit of AI will only increase.
About the author: Enzo Greco is Chief Strategy Officer for Nlyte Software, a data center infrastructure management (DCIM) solution provider that helps organizations automate and optimize the management of their computing infrastructure.