Setting the Stage
Growing up in the enterprise data and analytics marketplace, I’ve had the good fortune to see a number of game-changing technologies born and rise in corporate adoption. In a subset of cases, I’ve seen the same technology collapse just as quickly as it rose. Teradata Database is not one such technology. While I was designing and building Kimball dimensional data warehouses, and in other cases Inmon Corporate Information Factories, on a variety of database technologies, Teradata was ever-present and “reserved”. As it turned out, Teradata was usually reserved because of the high cost of adding workloads to it.
Today, serving as a field technical lead for Pivotal Data, I have the good fortune to share with you an elegant, software-driven, de-risked migration approach for Teradata customers who are tired of cutting the proverbial check and want to modernize their data platform.
Teradata Database: Costly & Proprietary
Teradata Database, the proprietary Massively Parallel Processing (MPP) database engine that was born as an appliance in the 1980s and that powers some of the world’s largest data warehouse deployments, continues to lock in customers with high annual spends. It is not uncommon for Teradata customers to spend $10M+ annually with the database vendor. Additionally, today’s enterprise seeks modern data warehousing products that are open-source based, can run literally anywhere, and have a rich set of analytics available out of the box. While Teradata Database has no doubt been a fantastic data warehousing system for customers in decades past, today’s analytics-driven enterprise is no longer content.
Teradata Customer Objective #1: Reduce Annual Spend
Teradata customers all have one objective in common: the desire to gain control of their annual spend. In my work with a collection of customers across a variety of vertical markets, this is the one constant I’ve witnessed. Some customers simply wish to curb their annual spend with Teradata, while others are looking for aggressive reduction or complete elimination.
Teradata Customer Objective #2: Modernize the Data Platform
The second objective I consistently see from Teradata customers is modernization. Teradata is proprietary (closed-source) MPP database software. Teradata customers can only obtain and run the database software directly from the commercial vendor, and they cannot submit code back to the vendor for potential product inclusion. Additionally, they can only run Teradata Database in environments the company has already certified. Last but certainly not least, Teradata customers are seeking database platforms that are constantly innovating, both in terms of available analytical methods and data federation.
Greenplum Database: Competitive & Open-Source
Greenplum Database (Greenplum), conversely, was born as a software-only MPP database system in 2003. Based on a parallel architecture that leverages the open-source PostgreSQL database, Greenplum (originally called Bizgres) enabled customers to quickly and easily spin up MPP clusters wherever they were required, with an optional reference architecture based on Sun Microsystems hardware. It is important to note that Greenplum was originally a closed-source MPP database as well. However, in October of 2015 Greenplum transitioned to an open-source core. Additionally, like most open-source based software products, the commercial distribution of Greenplum has always been priced competitively.
Since the open-source announcement, Greenplum has continued to receive a number of game-changing features that empower the next-generation data platform. Greenplum’s engineering backlog has been prioritized around a set of key principles that the leadership team felt the market was no longer just asking about but downright demanding. These key principles are listed below.
Greenplum Database Key Principles
- Run Anywhere – The ability to run wherever customers desire including all clouds (private and public)
- Persist Anywhere – The ability to enable federated data access across an ever-growing list of supported backing data repositories
- Open Everywhere – Having an open-source based parallel database engine core with a surrounding ecosystem fostering rapid innovation
- Integrated Analytics Everywhere – The ability to enable various types of analysis leveraging an integrated database core
One interesting thing to note is that, as of late, the folks at Teradata have started to discuss the very same core principles, as noted in this press release. After several years of engineering to these core principles, we have the next-generation data platform in hand. Below, you will find our next-generation data platform diagram powered by Greenplum Database.
The Problem: Teradata Migrations are High Risk
The challenge, historically speaking, for the collective customer base has been the migration itself. Teradata Database was, admittedly, the data lake before the data lake. There are usually a great many workloads within any present-day Teradata deployment. And in some cases, the owning enterprise does not know all of the inter-dependencies between its Teradata Database and external systems.
Based on several customers’ prior research, it can (literally) require years of labor to migrate a complete Teradata deployment to a competing platform. And it is not the data and schema that are the bulk of the problem; it is all the connected applications (ETL, OLAP cubes, reports, etc.) that require a rewrite. This is, without question, one of Teradata’s assets, and it materializes as vendor lock-in on top of the fact that the software is proprietary in the first place. Viewed from the perspective of the CIO, the result is, more often than not, to continue paying Teradata while leveraging competing products for net new workloads.
The Solution: De-Risk the Teradata Migration
As of late, however, an elegant new method has emerged for the efficient migration of existing Teradata workloads into Greenplum Database. Adaptive Database Virtualization allows databases to be “virtualized”, thereby enabling the enterprise to seamlessly redirect connected applications to different backing stores in real time. As a result, years of manual migration work now become weeks. Millions of dollars in labor expense now become thousands. And the number of risk factors involved is sharply reduced.
The Datometry Hyper-Q for Greenplum Database solution enables Greenplum to emulate Teradata Database workloads. As a result, all you need to do is point your existing Teradata Database applications to a running Datometry Hyper-Q for Greenplum Database instance and migrate the backing data and schema using tools we provide. Below is an image showing how the solution works in detail.
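To make the “point your existing applications” step a little more concrete, here is a minimal Python sketch of what that redirection might look like from the application’s side. It is not Datometry’s or Greenplum’s API. The `ConnectionSettings` class, the `redirect_to_gateway` helper, and both hostnames are hypothetical names invented for illustration, and the sketch assumes the gateway accepts connections on the same port the application already uses. The point it tries to show is that only the endpoint changes, while the database name, credentials, and application code stay as they were.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ConnectionSettings:
    """Hypothetical connection settings an existing application might hold."""
    host: str
    port: int
    database: str
    user: str


def redirect_to_gateway(
    settings: ConnectionSettings,
    gateway_host: str,
    gateway_port: Optional[int] = None,
) -> ConnectionSettings:
    """Return a copy of the settings that targets the virtualization gateway.

    Only the endpoint changes. The port is kept unless one is supplied,
    which is an assumption of this sketch, not a documented behavior.
    """
    new_port = settings.port if gateway_port is None else gateway_port
    return replace(settings, host=gateway_host, port=new_port)


# Illustrative values only: both hostnames are made up for this example.
teradata = ConnectionSettings(
    host="teradata-prod.example.com",
    port=1025,
    database="sales_dw",
    user="etl_service",
)
redirected = redirect_to_gateway(teradata, "hyperq-gateway.example.com")
print(redirected)
```

In practice the change would more likely live in a connection string or a DSN entry than in code, but the idea is the same: the application keeps issuing the queries it always has, and the endpoint behind them is what moves.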
Perhaps the best part about the Teradata Database migration approach I’ve described in this post is the minimal commitment required to get started. By simply running a few scripts we provide on your Teradata Database system, we can give you an initial workload compatibility analysis in as little as two days! The point of the initial workload analysis is to give you confidence early on that your unique Teradata Database workloads are highly compatible with Datometry Hyper-Q for Greenplum Database. Additionally, we provide assistance when the workloads in Teradata are not well defined or known. Customers can optionally choose to execute a full proof of concept after the initial analysis as well. For more information or to get started, contact us.
About the Author
Derek Comingore is a passionate data engineering and analytics change agent with 16 years of experience in the field, four of which were spent building and selling an MPP systems integrator firm. Derek leads Pivotal’s central data engineering team and is the technical lead for Teradata Takeout. When not working in high-tech data, Derek enjoys spending time with his wife, family, and friends in Wisconsin.