Blog

Pivotal Greenplum: Life in a Vacuum by Howard Goldberg

Vacuuming your home is a laborious task that you would rather not do. However, it is an essential chore that must be done. The same is true for vacuuming the catalog in a Pivotal Greenplum database (“Greenplum”). Proper maintenance and care are required for the Greenplum catalog to keep the database functioning…

Introduction of Readable External Protocol of gpfdist

Because gpfdist is the foundation of all ETL operations in Greenplum, it is worth explaining its details a little more to understand why it is faster than other tools and how we could improve it in the future. This blog will focus on the details of the communication between the gpfdist server and Greenplum for readable external tables, and…

Graphing Orlando IoT Temperature Sensor Readings

I wondered what temperatures in Orlando have done over this last week. You see, I just happen to have a set of IoT devices that are streaming data, which I persist into an archive. One of those sensors is on a covered patio in Orlando, so it would be interesting to see what kind of…

Introduction to Greenplum ETL tool – Overview

Why ETL is important for Greenplum: As a data warehouse product of the future, Greenplum is able to process huge data sets, usually at the petabyte level, but Greenplum can’t generate that much data by itself. Data is often generated by millions of users or embedded devices. Ideally, all data sources populate data…

On-Demand Machine Learning

Achieving Machine Learning Nirvana, by Shailesh Doshi. Recently, I have been in multiple discussions with clients who want to achieve consistent, operationalized data science and machine learning pipelines while the business demands more ‘on-demand’ capability. Often the ‘on-demand’ conversation starts with ‘Apache Spark’ type usage for analytics use cases but then eventually leads to a…

Meetup: Introducing Greenplum 5.0

Wednesday, September 20th, 2017, 6:00 PM PST. Pivotal, 875 Howard St., 5th Floor, San Francisco, CA. Greenplum 5.0 is a commercially available and open source data warehouse. This is the next milestone for the Greenplum community since Greenplum was officially open sourced in October of 2015.

Data-Driven Automation in Spring

Data-Driven Software Automation, by Kyle Dunn. Most of us don’t give much thought to elevator rides and the data-driven nature of them. A set of sensors informs precise motor control for acceleration and deceleration, providing a comfortable ride and an accurate stop at your desired floor. Too much acceleration brings the roller coaster experience to…

Short-circuiting the Java stack trace search

PCF Application Log Analytics, by Kyle Dunn. Many developers agree Java stack traces are a source of headaches and needless screen scrolling. Occasionally the verbosity is warranted and essential for debugging, although more often the overwhelming detail is just that: overwhelming. In the spirit of better developer productivity and shorter debugging cycles, this post will…

Some Bits on PXF Plugins

“Occasionally it becomes desirable and necessary…to make real what currently is merely imaginary.” By Kyle Dunn. If you’ve not heard already, Pivotal eXtensible Framework, or PXF (for those of you with leftover letters in your alphabet soup), is a unified (and parallel) means of accessing a variety of data formats stored in HDFS, via a…

Going Beyond Structured Data with Pivotal Greenplum

When you think about data in a relational data management system, you think of a structured data model organized in rows and columns that fit neatly into a table. While relational databases excel at managing structured data, their rigidity often causes headaches for organizations with diverse forms of data.