Launch! Greenplum Hardware Goes Open Source

Today, we unveil the first massively parallel Postgres data warehouse to go open source across both software and hardware. In 2015, we launched the world’s first open source MPP data platform. I am happy to announce that we are now also sharing Greenplum’s hardware infrastructure. In 2015 we said the future was open, and that trend continues today in 2020.

For the past two years, I have worked with a small group of folks designing, testing and supporting our valued customers with advanced-commodity hardware. Partnering with Dell OEM, we created a unique and compelling experience: dedicated storage devices, a high-speed Interconnect and a portfolio of compute-to-storage configurations. The goal was to provide a modern, appliance-like data warehouse experience for Greenplum customers.

Dell PowerEdge BOMs

We are happy to share the Dell PowerEdge Bills of Materials (BOMs). The Dell BOMs are provided in addition to our platform requirements documentation, and they go a step beyond an MPP hardware guide: hardware BOMs are line-item-specific configurations. The Dell BOMs are used for some of our customers’ most challenging and performance-intensive workloads, and they include SAS SSDs, NVMe HHHL cards, BOSS cards and a 40GbE Interconnect. Please see below for more information and a link to the Dell BOMs.

Openness Empowers Community Success

By sharing our Greenplum reference architectures, the community will be even more successful when it comes to deployments. You can use the Dell BOMs as a starting point for a hardware discussion. You can purchase the exact gear found in the Dell BOMs. And you can certainly modify the Dell BOMs to fit your workload’s unique requirements.

Community Engagement Drives Knowledge

When it comes to open source software, the engagement process is fairly standard: download the source code, compile it and run the database; find an issue in the project you care about, write the code and submit a pull request. The same principles apply to hardware BOMs. If you find a specific BOM that has worked well for your unique workloads, slack us!

How to Use Dell PowerEdge BOMs

Once you download the Excel file (.xlsx), open it with Microsoft Excel and note the tabs at the bottom. Each tab in the spreadsheet represents a particular configuration (BOM); we call these configurations “blocks”. Every block comprises two nodes (servers), excluding the deployment kit. Hardware disk encryption (Self-Encrypting Drive) options are available for the master, dense and balanced block types.
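 
If you prefer to script against the spreadsheet rather than click through it, a minimal Python sketch like the one below lists the block tabs. It assumes the openpyxl package is installed, and the filename is hypothetical; substitute the actual file you downloaded.

    # List the block (tab) names in the BOM spreadsheet without opening Excel.
    # The filename is hypothetical -- use the actual .xlsx file you downloaded.
    from openpyxl import load_workbook

    wb = load_workbook("greenplum_dell_boms.xlsx", read_only=True)
    for name in wb.sheetnames:
        print(name)  # each sheet is one configuration ("block")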
 
For any cluster built using the Dell PowerEdge BOMs, you must have one deployment kit. The deployment kit BOM provides a rack, PDUs (power) and all of the switches required to operate the MPP cluster. Next, you can choose to include a master block if you wish to have dedicated master nodes. For larger clusters, I recommend dedicated masters, and thus a master block BOM. Smaller clusters, including those with lower high-availability requirements, can have the masters colocated on the segment hosts (worker nodes).
 
Finally, you need to select a segment host block. For the segment block you can choose from “fast”, “balanced”, “dense” or “super dense” block types; the only difference, as you will see, is the density of the disks! Select a segment block type and multiply it by the number of blocks required for your unique workload (compute-driven, storage-driven or both). The maximum number of segment blocks per physical rack is seven (14 segment nodes).
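 
To make those composition rules concrete, here is a rough back-of-the-envelope sizing sketch in Python. The function name and output layout are my own illustration, not part of the BOMs; the constants come straight from the rules above.

    # One deployment kit per cluster, an optional master block, two nodes
    # per block, and at most seven segment blocks per physical rack.
    # Anything beyond one rack requires modifying the BOMs (see below).
    MAX_SEGMENT_BLOCKS_PER_RACK = 7
    NODES_PER_BLOCK = 2

    def size_cluster(segment_blocks: int, dedicated_masters: bool = True) -> dict:
        """Return a rough bill-of-blocks for a cluster (illustrative only)."""
        racks = -(-segment_blocks // MAX_SEGMENT_BLOCKS_PER_RACK)  # ceiling division
        return {
            "deployment_kits": 1,
            "master_blocks": 1 if dedicated_masters else 0,
            "segment_blocks": segment_blocks,
            "segment_nodes": segment_blocks * NODES_PER_BLOCK,
            "racks": racks,
        }

    # A four-block cluster fits comfortably in a single rack:
    print(size_cluster(segment_blocks=4))
    # {'deployment_kits': 1, 'master_blocks': 1, 'segment_blocks': 4,
    #  'segment_nodes': 8, 'racks': 1}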
 
In terms of system configuration, you get to decide how to configure your Dell BOM cluster for your unique workloads; this is one of the benefits of us opening these up. There are many details and decisions involved in cluster configuration, so to get you started, I will share how we have configured these systems. We start with a four-to-one physical-core-to-primary-segment ratio, depending on workload. We use the HHHL NVMe card for the temp filespace (spill files), install Red Hat on the BOSS card, place primary segments on twelve SAS SSDs in a RAID 5 plus hot spare (5+1) configuration, and place mirror segments on the second set of twelve SAS SSDs in the same RAID 5+1 configuration. The Dell BOMs were designed for single-rack clusters, as we can pack a lot of both compute and storage into the Dell PowerEdge R740xd chassis. You can modify the Dell BOMs to facilitate multi-rack clusters, for example by using aggregation switch(es). Ideally, you have two 60-amp 3-phase power drops for power; your hardware vendor can assist with power requirements as well.
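 
As a rough illustration of that four-to-one ratio and disk layout, the sketch below derives a per-host segment layout. The mount paths and function are hypothetical placeholders; adjust everything for your actual hardware and workload.

    # Derive a per-host Greenplum segment layout from the physical core
    # count, using the four-to-one core-to-primary-segment starting point
    # described above. Paths are illustrative placeholders.
    def segment_layout(physical_cores_per_host: int,
                       cores_per_primary: int = 4) -> dict:
        primaries = physical_cores_per_host // cores_per_primary
        return {
            "primaries_per_host": primaries,
            "mirrors_per_host": primaries,   # one mirror per primary
            "primary_volume": "/data1",      # first 12-SSD RAID 5+1 volume
            "mirror_volume": "/data2",       # second 12-SSD RAID 5+1 volume
            "temp_spill": "/nvme",           # HHHL NVMe card for spill files
        }

    # e.g. a host with 40 physical cores starts at 10 primaries + 10 mirrors
    print(segment_layout(40))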
 
Go Greenplum!
 

Learn More:

  • Platform Requirements can be found here
  • The Dell OEM website is here
  • The Greenplum Slack community is here
  • The Greenplum GitHub home is here