Self-Healing Greenplum – The Doctor Is Always In

Analytics On IaaS Must Be Approached Differently Than Its On-Premises Implementations

We have always maintained that having a portable data platform is not only one of the key differentiators of Greenplum, but should also be a core functional requirement on anyone’s roadmap for how best to architect for their needs.  But portability should never mean a straight port of what runs on-premises over to infrastructure in the cloud.  Instead, an understanding of how our users leverage the data platform, combined with the power of the cloud, should lead us to an alternate, more advanced architecture.  One such innovation that has recently become available is the notion of self-healing Greenplum.

First of all, let’s think about what self-healing means.  A quick definition would be a system that detects, diagnoses, and repairs performance problems and hardware / software faults automatically.  Much like a doctor, hence the title:

But to really appreciate what this does for you, it’s important to understand the typical on-premises deployment for high availability.

On-Premises High Availability For Greenplum

First, it’s worth going over the high availability strategy of Greenplum in a typical on-premises environment.  At the most basic level, Greenplum is a massively parallel data platform that spreads your data across a cluster of machines.  The construct for how the data is spread is the segment.  Each segment is either a primary segment or a mirror segment and contains a portion of the total data in your cluster.  There can be any number of primary segments in your cluster, usually multiple per host, and multiple hosts per cluster.  To handle any kind of infrastructure failure, each primary segment (and the data within it) is mirrored somewhere else in the cluster, as shown in the picture below:
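To make that layout concrete, here is a minimal sketch (in Python, using psycopg2) of how you could list primary and mirror segment placement from the master by querying Greenplum’s gp_segment_configuration catalog.  The connection details (host, database, user) are placeholders for your own environment.

```python
# Minimal sketch: list primary/mirror segment placement from the master.
# Assumes psycopg2 is installed; host/dbname/user are placeholders.
import psycopg2

conn = psycopg2.connect(host="mdw", dbname="postgres", user="gpadmin")
with conn.cursor() as cur:
    # gp_segment_configuration is Greenplum's catalog of segments:
    # content = segment number (-1 is the master), role 'p' = primary,
    # 'm' = mirror, status 'u' = up, 'd' = down.
    cur.execute("""
        SELECT content, role, status, hostname, port
        FROM gp_segment_configuration
        WHERE content >= 0
        ORDER BY content, role
    """)
    for content, role, status, hostname, port in cur.fetchall():
        kind = "primary" if role == "p" else "mirror"
        print(f"segment {content}: {kind} on {hostname}:{port} (status={status})")
conn.close()
```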

Here you see a master host, which contains a special segment that houses the global catalog (this is also mirrored to a standby master for high availability), as well as a depiction of the simplest configuration, where each host (segment host, in proper nomenclature) has exactly one primary segment and one mirror segment.  What you can glean from this diagram is that in the event of a catastrophic error on segment host 1, you will lose primary segment 1 and mirror segment 2.  Greenplum will automatically detect this failure, switch to the mirror of primary segment 1, which resides on host n, and mark the mirror of primary segment 2 as down.  This gives you the time to contact your vendor, get a new host 1 or repair that host, and then recover everything back to normal.
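The same catalog also records segment status, so you can see this detection at work.  The sketch below, again assuming psycopg2 and a reachable master, simply flags any segments marked down; in the on-premises model the actual repair (replacing the host, then running gprecoverseg, and gprecoverseg -r to rebalance) still happens by hand.

```python
# Minimal sketch: flag segments that Greenplum has marked down.
# Assumes psycopg2; host/dbname/user are placeholders.
import psycopg2

conn = psycopg2.connect(host="mdw", dbname="postgres", user="gpadmin")
with conn.cursor() as cur:
    cur.execute("""
        SELECT content, role, hostname, port
        FROM gp_segment_configuration
        WHERE status = 'd'
    """)
    down = cur.fetchall()
conn.close()

if down:
    for content, role, hostname, port in down:
        print(f"segment {content} ({role}) on {hostname}:{port} is down")
    # Repair is manual in this model: replace or fix the host, then run
    # gprecoverseg (and gprecoverseg -r to rebalance primaries and mirrors).
else:
    print("All segments are up")
```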

This is a standard configuration that has been in use with Greenplum installations since 2006.

What If You Had Access to Instant Infrastructure?

In 2017, Pivotal first published Greenplum on AWS, in both bring-your-own-license (BYOL) and pay-as-you-go (on-demand) offerings, and has been iterating on the offering consistently every couple of weeks.

The latest release, version 2.1 on AWS, brings with it a number of enhancements, the biggest being full use of Auto Scaling group functionality: when Amazon detects a malfunctioning host, the cluster will automatically bring up another host and bring it safely back into the cluster, with zero interaction from your platform administrator, end users, or database administrator.  This is best demonstrated in the following video:
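Under the hood this leans on the standard Auto Scaling contract: the group holds its desired capacity constant, so an unhealthy instance is terminated and replaced.  The boto3 sketch below illustrates the idea only; the group name is hypothetical, since the actual Greenplum on AWS deployment creates and manages its own groups.

```python
# Minimal sketch of the Auto Scaling behaviour described above, using boto3.
# The group name "greenplum-segment-hosts" is hypothetical.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Inspect the segment-host group: desired capacity is held constant, so any
# instance that fails its health check is terminated and replaced.
groups = autoscaling.describe_auto_scaling_groups(
    AutoScalingGroupNames=["greenplum-segment-hosts"]
)
for group in groups["AutoScalingGroups"]:
    healthy = sum(1 for i in group["Instances"] if i["HealthStatus"] == "Healthy")
    print(group["AutoScalingGroupName"],
          "desired:", group["DesiredCapacity"],
          "healthy instances:", healthy)

# Marking an instance unhealthy (e.g., after a hardware fault is detected)
# causes the group to launch a replacement automatically:
# autoscaling.set_instance_health(InstanceId="i-0123456789abcdef0",
#                                 HealthStatus="Unhealthy")
```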

This Changes Everything

Many Greenplum users love the portability, open source, raw speed, ANSI SQL compliance, innovative roadmap, and flexible analytics – what they sometimes lack is the bandwidth to manage the platform itself.  This latest innovation lets our users enjoy all the power of Greenplum with one less thing to worry about.  Check out why AWS and Greenplum are like tequila and margaritas, and continue to look for more innovations to enhance your experience as part of our regular releases.