Greenplum Summit Preview: Run Greenplum Your Way

Why is Greenplum so popular? A few factors leap to mind:

  1. It’s backed by a thriving open source community
  2. It delivers massively parallel data analytics performance
  3. It offers multi-cloud, infrastructure-native support

This third item is the focus of our first set of talks at Greenplum Summit (sign up here). We wanted to preview this track and discuss what multi-cloud really means to the Greenplum community.

Some folks think multi-cloud is simply an abstraction layer to make it easier for software to not care about the underlying infrastructure. This can work well, but certainly limits the amount of advantage you gain from the native capabilities of the infrastructure. For the Greenplum community, multi-cloud means optimizing the deployment for a given infrastructure target. In other words, Greenplum aims to provide a unique, differentiated experience no matter where you choose to deploy.

The talks at the Summit offer up useful examples, so let’s consider a few of them.

To truly understand the future of data, Divya Bhargov, Engineering Director, educates us on the value of the software-defined data center:

Expanding the Software Defined Data Center to Data

VMware is known for creating the software-defined data center. Greenplum is a scale-out, shared-nothing, massively parallel data platform. By adding Greenplum to a vSphere-based private cloud, you can create a data-centric private cloud and extend the concept of the software-defined data center to include the data architecture. The benefits include:

  • Elimination of Proprietary Hardware Dependencies – Negate the need for expensive training on highly specialized vendor hardware, because all hardware is commoditized.
  • Simplification of Technology Management – Data can be monitored, systems can be updated, and storage resources can be allocated from a single pane of glass.
  • Automation and Orchestration – Because software is more reliable than manual intervention, the SDDC is more agile and responsive in every measure.

Oz Basarir, Principal Product Manager, then shows how the principles learned from the SDDC can be applied with Greenplum running in Kubernetes:

Six Steps to Deploy Greenplum on any Kubernetes

This session covers a detailed view of the artifacts and flows involved in deploying Greenplum on Kubernetes. This approach is popular for ephemeral, non-24×7 use cases such as development environments, CI/CD pipeline automation, and dynamically scaling compute and storage. You will see how images are placed into container registries such as Docker Hub, Harbor, GCR, and ECR, and how they are then accessed by Kubernetes to create the Greenplum Operator and other containers. Then, Oz will take you on a tour of the deployment options to create a Greenplum Workbench with components such as MADlib, PXF, and GPText. Kubernetes is famously “unopinionated,” so you will learn about the choices you have to make when it comes to topologies and storage configurations. Attend this session, and you will walk away equipped to deploy Greenplum on Kubernetes.
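To give a flavor of the operator-driven flow described above, here is a minimal sketch of what a cluster manifest might look like. The field names follow the style of the Greenplum Operator’s custom resource, but the values (and some field names) are illustrative assumptions, not the exact schema the session will cover:

```yaml
# Hypothetical GreenplumCluster manifest (illustrative only --
# consult the session or the official docs for the exact schema).
apiVersion: "greenplum.pivotal.io/v1"
kind: GreenplumCluster
metadata:
  name: my-greenplum        # assumed cluster name
spec:
  masterAndStandby:
    memory: "800Mi"
    cpu: "0.5"
    storageClassName: standard
    storage: 1G
  segments:
    primarySegmentCount: 3  # number of primary segments
    memory: "800Mi"
    cpu: "0.5"
    storageClassName: standard
    storage: 2G
```

With the operator installed, applying a manifest like this with `kubectl apply -f my-greenplum.yaml` would ask the operator to create the master and segment pods with the requested topology and storage.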

Jon Roberts, Lead Engineer, then shows us the infrastructure-native capabilities of running in the public cloud marketplaces:

Embrace the Public Cloud for cost savings, cloud-native features, & multi-cloud ability

Perhaps the easiest benefit to reap in the public cloud is pay-as-you-go pricing. Deploy Greenplum on the public cloud and you can:

  • Pay only for the CPU and memory consumed per hour, with the ability to scale up and down with a simple command line utility.
  • Pay only for the storage you need. As your data needs grow, so can your storage; learn how a simple command line utility can grow your storage with no impact to users.
  • Pause and resume your cluster to save on IaaS costs with a simple command line utility.

George Billman, Executive Director, speaks to easy and compliant security when it comes to data and Greenplum:

Securing Data in the Cloud with SecuPI

Cloud computing has enabled self-service, on-demand access to data and infrastructure resources for everyone. But this comes with a downside: enterprise IT leaders now need to watch for the additional attack vectors that come with this frictionless access. The SecuPI solution is a seamless addition to a Greenplum deployment, providing fine-grained, additive security that enables cloud convenience without compromising data security. Data security has never been more important.

Ji Lim, Senior Data Engineer, speaks to leveraging the cloud for an incredibly robust disaster recovery solution:

On-Demand Disaster Recovery Solution

Traditionally, executing a Disaster Recovery strategy is an expensive endeavor. Once again, the public cloud changes the game. When deploying Greenplum on any public cloud, you can have your cake and eat it too!
Learn how a simple command line utility can transform how you take backups, and how you can easily and simply create an inexpensive disaster recovery solution.

Darryl Smith, Chief Data Platform Architect & Distinguished Engineer, and Praveen Gorthy, Greenplum Platform Lead Administrator, speak about their experiences providing a mission-critical data lake with Greenplum:

Greenplum Community Spotlight: Dell IT

If you’re Dell IT, how do you store, query, and process all of your enterprise data? With VMware Tanzu Greenplum, of course! Learn first-hand from Dell IT practitioners why VMware Tanzu Greenplum was chosen five years ago as the basis for their data lake, and how it quickly grew to over 340 terabytes. Presenters will share details about a recent upgrade, and how new NVMe Dell hardware delivered a 15X performance boost for a system that was already quite fast.

Jason Vigil, Software Engineer, talks about how scaling resources saves not just the pocketbook, but also the environment:

Going Green: Recycling Elastic Compute Resources with Greenplum for Kubernetes

Running Greenplum on Kubernetes provides numerous operational benefits, including the capability to dynamically reallocate compute resources. But how? This session will demonstrate how to scale up CPU / memory allocations for important, time-critical workloads, and how to scale back down for the steady-state. There’s also a third scenario we’ll explore: when there’s no need to run queries at all! We will demonstrate “scaling to zero,” a technique that frees up the compute resources for other workloads while still preserving the cluster state and data. Finally, you will learn how the separation of compute and storage helps you easily scale in response to demand from your users.
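The scale-up/scale-down flow above can be sketched as editing the segment resources in the cluster manifest and reapplying it. As in any operator-managed setup, the data lives on persistent volume claims, so it survives these compute changes. The field names below follow the style of the Greenplum Operator’s custom resource but are illustrative assumptions, not the exact schema the session demonstrates:

```yaml
# Hypothetical "burst" configuration for a time-critical workload:
# more segments and more CPU/memory per segment. Reapplying the
# original steady-state manifest afterwards scales back down; the
# persistent volumes (and the data on them) remain untouched.
spec:
  segments:
    primarySegmentCount: 6   # e.g. scaled up from 3
    cpu: "2.0"
    memory: "3Gi"
```

“Scaling to zero” follows the same logic: because the cluster’s state lives on persistent volumes rather than in the pods, releasing the compute pods entirely frees those resources for other workloads, and recreating the cluster from the same manifest picks up where it left off.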