06 Sep

Introduction to Greenplum ETL tool – Overview

Why ETL is important for Greenplum

As a data warehouse product of future, Greenplum is able to process huge set of data which is usually in petabyte level, but Greenplum can’t generate such number of data by itself. Data is often generated by millions of users or embedded devices. Ideally, all data sources populate data to Greenplum directly  but it is impossible in reality because data is the core asset of a company and Greenplum is only one of many tools that can be used to create value with data asset. One common solution is to use an intermediate system to store all the data.  Read More

05 Sep

On-Demand Machine Learning

Achieving Machine Learning Nirvana
By Shailesh Doshi

Recently, I have been in multiple discussions with clients who want to achieve consistent operationalized data science and machine learning pipelines while the business demands more ‘on-demand’ capability.

Often the ‘on-demand’ conversation starts with ‘Apache Spark’ type usage for analytics use cases but then eventually lead to a desire for an enterprise framework with following characteristics:

  • On-demand resource allocation (spin up/recycle)
  • Data as a service (micro service)
  • Cloud native approach/platform
  • Open Source technology/Open Integration approach
  • Ease of development
  • Agile Deployment
  • Efficient data engineering (minimal movement)
  • Multi–tenancy (resource sharing)
  • Containerization (isolation & security)

Given the complex enterprise landscape, the solution is to look at People, Process and Technology, combined to achieve Machine Learning ‘nirvana’. Read More