Greenplum Summit 2020: 6 Highlights from Week 1 of the Digital Series

The first two sessions of the virtual Greenplum Summit are in the books. To whet your appetite for the next session (“Data Warehouse Modernization,” happening August 26; sign up here), we wanted to recap some of our takeaways from Week 1. (We’ll have a recap of Week 2 in the coming days.)

You can watch all the sessions from the first week on-demand in the VMware Learning Zone.

1. Greenplum has a refreshing value proposition: more capabilities at a lower cost.

This shouldn’t feel unique, but it does. The “more for less” mantra is certainly appealing to those looking to move away from proprietary systems like Oracle, Teradata, and Snowflake.

What about the many new entrants in the market? Jacque says there’s an explosion of bespoke tools that aim to solve niche use cases. Each may have its role to play as a point solution. But for a single platform that can handle a wide range of enterprise scenarios, Greenplum’s “more for less” value stands apart.

Greenplum’s Postgres heritage serves the project well; so too have community bets on parallel processing, extensions for Python and R, as well as federated query capabilities. Add in support for many different infrastructure targets, and it’s easy to see why the project continues to thrive.

2. Deploying Greenplum on-premises can be a simple way to start your modern analytics journey.

If you’re looking to grow your analytics capabilities, where should you start? For most organizations, the bulk of their data already lives on-premises, so deploying Greenplum there is the logical first step. Divya Bhargov from VMware notes that Greenplum on vSphere simplifies operational tasks with VM templating and easy provisioning. Plus, we have refined three common hardware configurations to help you size your environment.

3. Use Greenplum atop Kubernetes for on-demand and ephemeral use cases.

With so many open-source projects exploring Kubernetes, it’s only natural that Greenplum engineers would as well. Oz Basarir notes that the Greenplum team found Kubernetes to be an incredibly fast (and repeatable) way to deploy Greenplum and related components.

When would you want to use Greenplum for Kubernetes? Oz cites one real-world example: self-service clusters for your data scientists. Got an experiment you want to run? Spin up a Greenplum cluster quickly. Conduct your experiment, then blow the cluster away when complete. This injects useful agility into your analytics workflows.
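If you script against the Kubernetes API, that whole lifecycle fits in a couple of calls. Here’s a minimal sketch using the official Kubernetes Python client; the GreenplumCluster resource group, version, and spec fields are assumptions drawn from the Greenplum for Kubernetes documentation and may differ across releases.

```python
# Ephemeral Greenplum cluster on Kubernetes: create, experiment, delete.
# Sketch only -- the GreenplumCluster group/version/field names below are
# assumptions based on the Greenplum for Kubernetes CRD and may vary by release.
from kubernetes import client, config

config.load_kube_config()  # use the current kubectl context
api = client.CustomObjectsApi()

GROUP, VERSION, PLURAL, NS = "greenplum.pivotal.io", "v1", "greenplumclusters", "default"

cluster = {
    "apiVersion": f"{GROUP}/{VERSION}",
    "kind": "GreenplumCluster",
    "metadata": {"name": "scratch-gp"},  # hypothetical cluster name
    "spec": {
        "masterAndStandby": {"memory": "800Mi", "cpu": "0.5",
                             "storageClassName": "standard", "storage": "1G"},
        "segments": {"primarySegmentCount": 2, "memory": "800Mi",
                     "cpu": "0.5", "storageClassName": "standard", "storage": "2G"},
    },
}

# Spin up the scratch cluster for the experiment...
api.create_namespaced_custom_object(GROUP, VERSION, NS, PLURAL, cluster)

# ...and tear it down when the experiment is done.
api.delete_namespaced_custom_object(GROUP, VERSION, NS, PLURAL, "scratch-gp")
```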

Interested in deploying Greenplum for Kubernetes? Download an evaluation copy from the Tanzu Network here.

Related note: Jason Vigil gave a talk that showed how easy it is to perform common scaling commands with Greenplum for Kubernetes. His guidance will come in handy for these scenarios.
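For instance, scaling out could be as simple as patching the segment count on the running cluster resource. Again, this is a sketch under the same CRD assumptions as above; check the release documentation for the actual expansion procedure.

```python
# Sketch: scale out a running cluster by patching its segment count.
# Group/version/field names are assumptions based on the Greenplum for
# Kubernetes CRD and may differ across releases.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

patch = {"spec": {"segments": {"primarySegmentCount": 4}}}  # grow to 4 segments
api.patch_namespaced_custom_object(
    "greenplum.pivotal.io", "v1", "default", "greenplumclusters",
    "scratch-gp", patch)
```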

4. It’s easy to save money when you deploy Greenplum in the public cloud.

Many Greenplum users extend their footprint into the public cloud to achieve greater speed, resiliency, and scalability. Don’t forget about the other big benefit: cost savings.

Jon Roberts explains how the team has exploited the flexibility of Greenplum in its cloud-native implementation to minimize expense in any public cloud. Pay only for what you use, and know how much you’ll spend with predictable monthly costs. And there are many levers you can pull to reduce your public cloud bill.

On-demand billing and elastic scaling of compute or storage are the most well-known methods. But you can also take advantage of these more advanced capabilities:

  • Automated Fault Tolerance. Yes, public clouds have self-healing capabilities at the IaaS layer. But Greenplum also continuously monitors the health of the cluster, automatically re-replicating data from failed drives and replacing nodes as necessary, without any user or DBA intervention. This saves you money on infrastructure and operations.
  • Automated Backups. Backups are critical to a healthy database strategy, but they can be expensive. The public cloud “snapshot” features are a terrific, affordable option (a minimal sketch follows below).
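To make the snapshot idea concrete, here’s a minimal sketch using boto3 on AWS; the region, tag names, and tag values are hypothetical placeholders, and Azure and Google Cloud offer equivalent snapshot APIs. A production backup would also quiesce the database (or use a tool like gpbackup) so the snapshots are consistent.

```python
# Sketch: snapshot the EBS volumes backing a Greenplum cluster on AWS.
# The region, tag key, and tag value are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Find the volumes tagged as belonging to our (hypothetical) cluster.
volumes = ec2.describe_volumes(
    Filters=[{"Name": "tag:cluster", "Values": ["greenplum-prod"]}])

for vol in volumes["Volumes"]:
    snap = ec2.create_snapshot(
        VolumeId=vol["VolumeId"],
        Description="Nightly Greenplum backup")
    print("Started snapshot", snap["SnapshotId"], "for", vol["VolumeId"])
```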

Ready to try Greenplum in the public cloud? Check out the Greenplum templates in the AWS, Azure, and Google Cloud marketplaces.

5. You can build a cost-effective disaster recovery solution in the public cloud. (Yes, really.)

Executing a credible disaster recovery (DR) strategy can seem aspirational, something attainable only by the largest organizations. As it happens, public clouds offer a low-cost, effective solution. Jon touched on this point in his talk, but Jin Lim’s talk goes much deeper.

The core idea is simple. If you’re in the public cloud already, take regular snapshots of your primary Greenplum cluster. Place these snapshots in object storage. When disaster strikes, spin up an on-demand Greenplum cluster. Then copy the latest snapshot over, and restore it. You’re back online!
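Here’s a rough sketch of the restore leg of that flow in Python, assuming backups were written to S3 with gpbackup and its S3 storage plugin; the bucket name, key layout, and plugin config path are all hypothetical.

```python
# Sketch of the DR restore step: find the newest gpbackup set in object
# storage, then restore it onto the freshly provisioned cluster.
# Bucket name, prefix layout, and plugin config path are hypothetical.
import subprocess
import boto3

s3 = boto3.client("s3")
BUCKET, PREFIX = "dr-backups", "greenplum/backups/"

# Each gpbackup run is identified by a timestamp; take the newest object.
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
latest_key = max(resp["Contents"], key=lambda o: o["LastModified"])["Key"]
timestamp = latest_key.split("/")[-2]  # assumes .../backups/<timestamp>/... layout

# Hand off to gprestore, which pulls the backup set down via its S3 plugin.
subprocess.run(
    ["gprestore", "--timestamp", timestamp,
     "--plugin-config", "/home/gpadmin/s3_config.yaml",
     "--create-db"],
    check=True)
```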

Running on-prem? No problem! The concept remains the same. You’ll just need to use a data transfer appliance to get a complete copy of your database uploaded to your chosen cloud, and you can follow the same process for incremental backups. Once your data is available in object storage, you can create a new Greenplum cluster just as in the public cloud scenario.

6. Greenplum and cloud providers give you a strong security posture. But it may not be enough.

Any enterprise database is going to have useful security provisions woven into the system. Greenplum includes authentication and role-based access controls to govern who has access and what they can do once they’re logged in. What about protecting all the data in your database? VMware, AWS, Azure, and Google all offer data encryption at rest and in transit. So far, so good!
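Because Greenplum speaks the standard Postgres wire protocol, enforcing encryption in transit from the client side looks just like it does with Postgres. A minimal sketch with psycopg2, using a hypothetical host and credentials:

```python
# Sketch: enforce TLS from the client when connecting to Greenplum, which
# speaks the Postgres wire protocol. Host, database, and credentials are
# hypothetical placeholders.
import psycopg2

conn = psycopg2.connect(
    host="gp-master.example.com",
    dbname="analytics",
    user="report_user",
    password="********",
    sslmode="verify-full",            # require TLS and verify the server cert
    sslrootcert="/etc/ssl/gp-ca.pem",
)

with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")  # sanity query over the encrypted session
    print(cur.fetchone())
```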

But as George Billman of SecuPi explains, there are many organizations that require a more aggressive data security posture. What does George advise these firms to do? He offers a four-point plan:

  • Anonymize sensitive data before putting it into the cloud. An important issue: WHO holds the key used to anonymize or decrypt the data. (A minimal sketch of this idea follows the list.)
  • Implement finer-grained data access policies with access control parameters such as time, geolocation, or business state.
  • Generate “virtual views” in real time from user roles and the aforementioned access control parameters. This approach can improve “time to data” and lower the cost of coding static views.
  • Monitor user data-access activity in real time. This is a proactive way to head off threats before they manifest.
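To illustrate the first point, here’s a minimal, standard-library-only sketch of keyed pseudonymization applied before data leaves your premises. It’s a stand-in for the idea, not SecuPi’s actual mechanism; the file and column names are hypothetical, and the crux is exactly who holds the key.

```python
# Sketch: keyed pseudonymization of a sensitive column before upload, using
# only the Python standard library. Illustrative only -- not SecuPi's actual
# mechanism. Whoever holds SECRET_KEY controls re-identification.
import csv
import hmac
import hashlib

SECRET_KEY = b"keep-this-on-prem"  # hypothetical; in practice, use a KMS/HSM

def pseudonymize(value: str) -> str:
    """Deterministically tokenize a value so joins still work downstream."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

with open("customers.csv", newline="") as src, \
     open("customers_anon.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row["email"] = pseudonymize(row["email"])  # hypothetical column name
        writer.writerow(row)
```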

If you have unique data security requirements, or want to get ahead of upcoming data privacy regulations, SecuPi is worth a look.

We’re Just Getting Warmed Up: Join Us for the Next 3 Sessions of Greenplum Summit

Like what you’ve seen so far? We’ve got three more sessions of amazing content in the works. Register for any of the upcoming sessions and join the conversation!

  • Aug 26: Data Warehouse Modernization (register)
  • Sept 9: Parallel Postgres (register)
  • Sept 23: AI, Neural Networks, and the Future of Analytics (register)