Gpfdist support both readable external table and writable external table. This blog will introduce how writable gpfdist external table works. Read More
Hello, my name is Dmitry Dorofeev, I’m a software architect working for Luxms Group. We are a team of creative programmers touching technology which moves faster than we can imagine these days. This blog post is about building a small streaming analytics pipeline which is minimalistic, but can be adapted for bigger projects easily. It can be started on a notebook (Yes, I tried that), and quickly deployed to the cloud if the need arises. Read More
Greenplum Database is built for advanced Data Warehouse and Analytic workloads at scale. Whether the data set is five terabytes on a handful of servers, or over a petabyte in size on a hundred-plus nodes, the architecture of Greenplum allows it to easily grow to meet the data management and concurrent user access requirements of the platform. To manage very large tables, easily measured in billions of rows organized in logical partitions, Open Source Greenplum provides a number of table types and compression options that the architect can employ to store data in the most efficient way possible. Read More
Conquering Your Database Workloads
Howard Goldberg – Executive Director, Morgan Stanley, Head of Greenplum engineering
Everyone has been in some type of traffic delay, usually at the worst possible time. These traffic jams result from an unexpected accident, volume on the roadway, or lane closures forcing a merge from multiple lanes into a single lane. These congestion events lead to unpredictable travel times and frustrated motorists.
Databases also have traffic jams or periods when database activity outpaces the resources (CPU/Disk IO/Network) supporting it. These database logjams cause a cascade of events leading to poor response times and unhappy clients. To manage a database’s workload, Greenplum (4.3+) utilizes resource queues and the Greenplum Workload Manager (1.8+). Together these capabilities control the use of the critical database resources and allow databases to operate at maximum efficiency. This article will describe these workload manager capabilities and offer best practices where applicable. Read More
A common question that is frequently asked when performing maintenance on Greenplum tables is “Why does my ALTER TABLE add column DDL statement take so long to run?” Although it appears to be a simple command and the expectations are that it will execute in minutes this is not true depending on the table organization (heap, AO columnar compressed, AO row compressed), number of range partitions and the options used in the alter table command.
Depending on the size of the table a rewrite table operation triggered by an alter table/column DDL command could take from minutes to multiple hours. During this time the table will hold the access exclusive locks and may cause cascading effects on other ETL processing. While this rewrite operation is occurring there is no easy way to predict its completion time. Please note that since Greenplum supports polymorphic tables a range partitioned table can contain all three table organizations within a single parent table, this implies that some child partitions can trigger a rewrite while others may be altered quickly. However, all operations on a range partitioned table must complete before the DDL operation is completed.
Setting the Stage
Growing up in the enterprise data and analytics marketplace, I’ve had the good fortune to see a number of game-changing technologies born and rise in corporate adoption. In a subset of cases, I’ve seen the same technology collapse just as quickly as it rose. However, Teradata Database is not one such technology. While I was designing-and-building Kimball Dimensional Data Warehouses and in other cases Inmon Corporate Information Factories, leveraging a variety of database technologies, Teradata was ever-present and “reserved”. It turned out, Teradata was usually reserved due to the high cost of incorporating additional workloads.
Present day, serving as a field technical lead for Pivotal Data, I have the good fortune to share with you an elegant, software-driven, de-risked migration approach for Teradata customers tired of cutting the proverbial check and desiring data platform modernization.
One use case organizations face is the need to bulk load data into Gemfire Regions where regions in GemFire are similar to the table concept in a database. Unlike a database, bulk-loading data into GemFire is more of a programming exercise than encountered with traditional bulk loading capabilities of a modern database product. If the data sources and formats are relatively static, than a GemFire data loader will work for repeated loads of the source data types and formats. As we all know, data sources, formats and types can be a moving target.
By Cyrille Lintz, Dino Bukvic, Gianluca Rossetti
You may have heard or read that Pivotal Greenplum is not suitable for small query processing or low latency lookups, but like any data platform, your mileage may vary depending on the use case and how you architect it. This post explains how to tune Pivotal Greenplum for an unusual workload: a “warm” layer below an in-memory key value store. We will explain how to tune Pivotal Greenplum to achieve a millisecond-range answer on key values access by using data populated by using a native JSON datatype store in the “key” column. Read More
What are the Top 5 reasons that Greenplum is gaining in popularity and is the world’s next generation data platform? Read More
Vacuuming your home is a laborious task that you would rather not do. However, vacuuming your home is an essential chore that must be done. The same is true for vacuuming the catalog in a Pivotal Greenplum database (“Greenplum”). The proper maintenance and care is required for the Greenplum catalog to keep the database functioning at its peak efficiency. Read More