Re-Platforming the Enterprise, Or Putting Data Back at the Center of the Data Center
The data center as we know it is going through tremendous changes. New technologies being deployed today form a new stack, a strategic data center platform that will significantly reduce hardware and system costs, improve the efficiency and productivity of IT resources, and enable flexible application and workload execution. This new stack will also transform an organization’s ability to respond quickly to changing business conditions, competitive threats, and customer opportunities.
Before we go any further, let’s stop and review why we are talking about re-platforming in the first place. Why change what we’ve worked so hard to deploy?
When it comes to enterprise computing, we have many unquestioned assumptions. The first is that compute is separated from enterprise storage. Companies such as EMC came to dominance by separating the storage network from the computing resources to allow for independent scaling and management of storage. This approach initially provided great savings and better efficiency, but today, with storage networks growing ever larger and new data sources continuing to proliferate, scaling and coordinating separate storage and compute networks has become expensive and unwieldy.
The second assumption is that production data is separate from analytics. This physical separation typically means a delay of at least a day before data is available in the analytic systems. It has given rise to customized analytic clusters built for in-depth analysis and specific applications, and to many different data silos that depend on complex data flows, processing, and redundancy.
What Are the Benefits of Re-Platforming?
Organizations must evolve beyond traditional analytic cycles that are heavy with data transformation and schema management. Leading organizations are using new technologies, including advanced Hadoop platforms, to merge business analytics and production data to create the “as-it-happens” business.
When talking about analytics, most people think of in-depth reporting or dashboards to review results with the ability to drill down and answer questions about the various details and dimensions of the business. Impacting business “as it happens” does not refer to backward-looking reports to understand “What happened?” or “How did we do?” The “as-it-happens” business is about leveraging data and inline analytics to make adjustments while business is happening to optimize revenue, mitigate risk or reduce operational costs.
For example, The Rubicon Project uses Hadoop to impact ad auctions as they happen—over 100 billion ad auctions a day. It optimizes ad auctions across the most extensive ad reach in the industry, touching 96% of Internet users in the U.S.
One of the largest financial services companies, with over 100 million cards issued, processes petabytes of information with Hadoop to mitigate risk, personalize offers, and drive efficiency as business is happening.
Machine Zone, creator of Game of War, initially had its operations isolated from analytics, which limited its ability to deliver actionable data. In response, the company deployed a new Hadoop distribution with an integrated real-time, in-Hadoop database, bringing operations and analytics onto the same platform to support over 40 million users and more than 300,000 events per second.
What Does Re-Platforming the Data Center Entail?
First, you need to consider all of the hardware you have in your data centers, not just the hardware dedicated to big data or analytics. Administrators need to be able to manage all resources globally and do so on demand. Re-platforming starts with general-purpose hardware. This is not necessarily a homogeneous “commodity” layer of low-end white boxes; there is still a need for high-speed disks, memory, and network interconnects to support high-end workloads. Rather, the hardware needs to be decoupled from specialized appliances that are locked into specific systems or applications.
Second, the new platform requires the ability to expand linearly. If you need additional capacity, simply add additional servers. This is a big advantage of Hadoop. Expanding this linear scalability to support non-Hadoop workloads provides additional value. Linear scalability is required, but you also need real-time processing and low latency performance to support the “as it happens” business. Real-time doesn’t require just big or just fast in separate clusters. It requires big and fast together, working harmoniously.
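To make the linear-scaling point concrete, here is a minimal back-of-envelope sketch in Python. The per-node figures are hypothetical placeholders, not benchmarks for any particular distribution; the point is simply that when scaling is linear, capacity planning reduces to multiplication.

```python
# Back-of-envelope linear scaling: capacity grows in direct proportion to node count.
# The per-node figures below are hypothetical placeholders, not vendor benchmarks.

PER_NODE = {
    "usable_storage_tb": 48,      # hypothetical usable disk per server
    "events_per_second": 25_000,  # hypothetical stream-processing throughput per server
}

def cluster_capacity(node_count: int) -> dict:
    """With linear scaling, total capacity is per-node capacity times node count."""
    return {metric: value * node_count for metric, value in PER_NODE.items()}

if __name__ == "__main__":
    for nodes in (10, 20, 40):
        print(nodes, "nodes ->", cluster_capacity(nodes))
    # Doubling the node count doubles both storage and throughput; growing the
    # cluster is an incremental purchase, not a re-architecture.
```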
Third, you need the ability to move different applications and workloads across the data center.
This is the focus of the Apache Myriad project—to provide flexible, global resource management. By removing as many barriers as possible, Myriad lets data be exploited for any purpose the business needs, shortening the “data-to-action” cycle.
The point of the Myriad project is to give one cluster the flexibility to run many disparate applications and workloads. This open source project, originally driven by engineers from MapR, Mesosphere, and eBay, created a containerized approach that leverages both Mesos and YARN to move different workloads, including Hadoop, across compute resources. For example, a data center with web servers that are provisioned for peak demand and have relatively long periods of low utilization can now serve as a resource for Hadoop workloads.
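As a rough illustration of the policy Myriad automates, the sketch below (conceptual only; this is not Myriad's API, and the node names and threshold are invented) lends idle web-server capacity to analytics while keeping busy nodes dedicated to web traffic.

```python
# Conceptual sketch of the resource-sharing idea behind Myriad: spare capacity on
# lightly loaded web servers is offered to YARN-style analytics workloads, while
# heavily loaded servers keep their resources for web traffic. This is NOT the
# Myriad API; node names and the threshold are hypothetical.

WEB_UTILIZATION_CEILING = 0.60  # above this, a node keeps its resources for web traffic

def partition_nodes(nodes):
    """Split the fleet into nodes serving web traffic and nodes lendable to analytics."""
    web_only, lendable = [], []
    for node in nodes:
        if node["web_utilization"] > WEB_UTILIZATION_CEILING:
            web_only.append(node)
        else:
            lendable.append(node)
    return web_only, lendable

if __name__ == "__main__":
    fleet = [
        {"name": "web-01", "web_utilization": 0.85},
        {"name": "web-02", "web_utilization": 0.20},
        {"name": "web-03", "web_utilization": 0.10},
    ]
    busy, spare = partition_nodes(fleet)
    print("dedicated to web traffic:", [n["name"] for n in busy])
    print("lendable to Hadoop/YARN:", [n["name"] for n in spare])
```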
Historically, applications have dictated the data format and called on IT resources to extract, transform, and load data into specialized schemas and formats. Re-platforming calls for a new model that puts data at the center. A data-centric approach to infrastructure provides flexible, real-time data access, collapses data silos, and automates the data-to-action cycle for immediate operational benefit.
The new data center requires data agility. Data agility calls for a platform that supports a variety of data sources and doesn’t require an administrator to define or enforce a schema. You have broad flexibility in the analytical and programming capabilities you can employ, and you can change the level of granularity of your work. Unlike in a data warehouse environment, making any of these analytic changes does not require downtime to modify the system or redefine tables.
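To show what schema-on-read data agility looks like in miniature, here is a minimal sketch using only the Python standard library. The event records and field names are invented for illustration; the same principle applies at cluster scale, where SQL-on-Hadoop engines discover structure at query time.

```python
# Schema-on-read in miniature: fields are discovered when the data is read,
# so new questions and new granularities need no table redefinition or downtime.
# The records and field names below are invented for illustration.

import json
from collections import Counter

RAW_EVENTS = """
{"ts": "2015-06-01T10:00:00Z", "user": "a1", "action": "click", "campaign": "spring"}
{"ts": "2015-06-01T10:00:01Z", "user": "b7", "action": "purchase", "campaign": "spring", "amount": 42.5}
{"ts": "2015-06-01T10:00:02Z", "user": "a1", "action": "click"}
"""

def events(raw: str):
    """Parse newline-delimited JSON; no schema is declared up front."""
    for line in filter(None, (l.strip() for l in raw.splitlines())):
        yield json.loads(line)

# Granularity is a query-time choice: group by campaign today, by action tomorrow.
by_campaign = Counter(e.get("campaign", "unknown") for e in events(RAW_EVENTS))
by_action = Counter(e["action"] for e in events(RAW_EVENTS))

print(by_campaign)  # Counter({'spring': 2, 'unknown': 1})
print(by_action)    # Counter({'click': 2, 'purchase': 1})
```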
Re-platforming the enterprise will drive cost savings and efficiency, while also providing a new platform for innovation within and across global data centers. The new data center platform will scale linearly, provide data agility, and handle big and fast data together for operational and analytic workloads. Understanding the future of your data center will help you take the right strategic steps today. Because even a small step with your first Hadoop use case can be a big step towards transforming your data center and your organization’s competitive position.
About the author: Jack Norris is the Chief Marketing Officer at MapR Technologies. Jack has over 20 years of enterprise software marketing experience. He has demonstrated success from defining new markets for small companies to increasing sales of new products for large public companies. Jack’s broad experience includes launching and establishing analytic, virtualization, and storage companies and leading marketing and business development for an early-stage cloud storage software provider. Jack has also held senior executive roles with EMC, Rainfinity (now EMC), Brio Technology, SQRIBE, and Bain and Company.