Blog
Relationship and difference between Greenplum and PostgreSQL
Greenplum is an open-source massively parallel database used for reporting, analytics, machine learning, artificial intelligence, and high-concurrency SQL. Greenplum Database is described as a big data technology based on MPP architecture and the open-source PostgreSQL database technology. PostgreSQL is a popular free and open-source relational
Greenplum Hackday 2021
Come hack Greenplum and win prizes! On Friday, April 16th, we are holding a Greenplum hackday around the world. The theme is anything related to Greenplum: use Greenplum, market Greenplum, break Greenplum, hack Greenplum, or do anything else related to Greenplum. When: Friday, April 16th, 2021
Faster Optimization of Join Queries in ORCA
Author: Hans Zeller Optimizing joins is the core part of any query optimizer. It consists of picking a good join order, the right join algorithms (hash join, nested loop join, etc.), and various other things. The number of possible options grows extremely fast and requires a method called Dynamic
Introduction to Greenplum Architecture
This is the first article in the Greenplum Kernel series. The series consists of ten articles that explain the different modules of Greenplum in depth. Today I'm going to explain the Greenplum architecture in more detail. Before we talk about Greenplum's architecture, let's
Greenplum Summit Week 5: AI, Neural Networks, and the Future of Analytics
Author: Jared Ruckle Every enterprise is refining its AI strategy. So it's only fitting that the final installment of Greenplum Summit 2020 focused on how artificial intelligence and neural networks will shape the future of analytics. Let's get right to the highlights! (You can watch all Greenplum Summit
Greenplum Summit Week 4: Parallel Postgres
Author: Bob Glithero For over 15 years, Greenplum has solved the problem of parallelizing Postgres for high-performance querying and analysis of data at massive scale. In Week 4 of the Greenplum Summit, after a brief interlude for a discussion with Heimdall Data, we shift gears a bit to
Greenplum Database Upgrade
Early last year, the Greenplum Database team started building gpupgrade, an in-place major version upgrade tool. The driving force behind developing faster, less space-consuming upgrades was to offer customers an easy upgrade path. This tool will enable customers to quickly and confidently upgrade to
Relocatable Postgres Builds
As engineers on the Greenplum Release Engineering team, we recently had the opportunity to do an in-depth exploration of Postgres’ build system. Greenplum Server is based on Postgres and has inherited the upstream build system. Our team was working on producing a relocatable build of Greenplum Server which
Greenplum Summit Week 3: How to Get Started with a Modern Data Warehouse
The idea of a data warehouse isn’t new. Many enterprises have used them for years. What is new: the data landscape in 2020. The amount of data is exploding, and the use cases requiring real-time data analysis are growing just as fast. Talks from Week 3 cover two
Talkin’ Federated Analytics: Recapping Week 2 of Greenplum Summit
By Bob Glithero Greenplum Summit rolls on: three sessions down, two to go! Week 2 was all about federated analytics, the art of analyzing data from multiple sources to solve business challenges. Here’s our recap. (You can watch all the sessions from Week 1 and Week 2 on-demand
Greenplum Summit 2020: 6 Highlights from Week 1 of the Digital Series
The first two sessions of the virtual Greenplum Summit are in the books. To whet your appetite for the next session ("Data Warehouse Modernization," happening August 26; sign up here), we wanted to recap some of our takeaways from Week 1. (We'll have a recap from Week 2 in
Greenplum Summit Preview: Run Greenplum Your Way
Why is Greenplum so popular? A few factors leap to mind: it's backed by a thriving open source community; it delivers massively parallel data analytics performance; and it offers multi-cloud, infrastructure-native support. This third item is the focus of our first set of talks at Greenplum Summit (sign up here). We wanted to preview
Platform Extension Framework (PXF): Enabling Parallel Query Processing Over Heterogeneous Data Sources In Greenplum
Authors: Venkatesh Raghavan, Alexander Denissov, Francisco Guerrero, Oliver Albertini, Divya Bhargov, Lisa Owen, Shivram Mani, Lav Jain Abstract: With the explosion of data stores and cloud services, data now resides across many disparate systems and in a variety of formats. When multiple data sets exist in external systems,
Image Classification in Greenplum Database Using Deep Learning
Authors: Oliver Albertini, Divya Bhargov, Alexander Denissov, Francisco Guerrero, Nandish Jayaram, Nikhil Kak, Ekta Khanna, Orhan Kislal, Arun Kumar, Frank McQuillan, Lisa Owen, Venkatesh Raghavan, Domino Valdano, Yuhao Zhang Abstract: Artificial neural networks can be used to create highly accurate models in domains such as language processing and
Bottom-up Join Enumeration in a Top-down Optimizer
Authors: Bhuvnesh Chaudhary, Hans Zeller, Sambitesh Dash, Venkatesh Raghavan, MAPBU, VMware, Palo Alto, CA, USA Abstract: Greenplum Database is a massively parallel processing (MPP) analytics database that adopts a shared-nothing architecture with multiple cooperating processors. A query submitted to the Greenplum master is optimized by the Orca query
Multi-temperature data querying from heterogeneous data stores with Greenplum and PXF
In many businesses, it is hard to fit all data into a single store. Data that is old and rarely accessed (cold data) is generally archived in long-term stores such as a data lake or an S3 object store. Data that is recent and prone to