01 Nov

Using The Greenplum Connector To Load Data Into GemFire

One use case organizations face is the need to bulk load data into GemFire regions, where a region in GemFire is similar to a table in a database. Unlike in a database, bulk-loading data into GemFire is more of a programming exercise than it is with the traditional bulk-loading capabilities of a modern database product. If the data sources and formats are relatively static, then a GemFire data loader will work for repeated loads of the source data types and formats. As we all know, however, data sources, formats and types can be a moving target.

Read More

02 Oct

High Concurrency, Low Latency Index Lookups with Pivotal Greenplum Database

By Cyrille Lintz, Dino Bukvic, Gianluca Rossetti

You may have heard or read that Pivotal Greenplum is not suitable for small query processing or low latency lookups, but like any data platform, your mileage may vary depending on the use case and how you architect it. This post explains how to tune Pivotal Greenplum for an unusual workload: a “warm” layer below an in-memory key-value store. We will explain how to tune Pivotal Greenplum to achieve millisecond-range response times for key-value access, with the data stored using a native JSON datatype in the “key” column. Read More

14 Sep

Pivotal Greenplum: Life in a Vacuum by Howard Goldberg

Vacuuming your home is a laborious task that you would rather not do. However, it is an essential chore that must be done. The same is true for vacuuming the catalog in a Pivotal Greenplum database (“Greenplum”). Proper maintenance and care are required for the Greenplum catalog to keep the database functioning at peak efficiency. Read More

13 Sep

Introduction of Readable External Protocol of gpfdist

Because gpfdist is the foundation of all ETL operations in Greenplum, it is worth explaining its details a little more to understand why it is faster than other tools and how we could improve it in the future.

This blog focuses on the details of the communication between the gpfdist server and Greenplum for readable external tables, and introduces the traffic flow and protocol of gpfdist external tables. Read More

06 Sep

Introduction to Greenplum ETL tool – Overview

Why ETL is important for Greenplum

As a data warehouse product of the future, Greenplum is able to process huge data sets, often at the petabyte level, but Greenplum can’t generate that much data by itself. Data is often generated by millions of users or embedded devices. Ideally, all data sources would populate data into Greenplum directly, but this is impossible in reality because data is the core asset of a company, and Greenplum is only one of many tools that can be used to create value from that asset. One common solution is to use an intermediate system to store all the data. Read More

05 Sep

On-Demand Machine Learning

Achieving Machine Learning Nirvana
By Shailesh Doshi

Recently, I have been in multiple discussions with clients who want to achieve consistent operationalized data science and machine learning pipelines while the business demands more ‘on-demand’ capability.

Often the ‘on-demand’ conversation starts with ‘Apache Spark’ type usage for analytics use cases but eventually leads to a desire for an enterprise framework with the following characteristics:

  • On-demand resource allocation (spin up/recycle)
  • Data as a service (micro service)
  • Cloud native approach/platform
  • Open Source technology/Open Integration approach
  • Ease of development
  • Agile Deployment
  • Efficient data engineering (minimal movement)
  • Multi-tenancy (resource sharing)
  • Containerization (isolation & security)

Given the complex enterprise landscape, the solution is to look at People, Process and Technology, combined to achieve Machine Learning ‘nirvana’. Read More