Greenplum Database Tutorials

For use with the Greenplum Database Sandbox

Create Tables

The CREATE TABLE SQL statement creates a table in the database.

About the distribution policy

The definition of a table includes the distribution policy for the data, which has great bearing on system performance. The goals for the distribution policy are to:

The distribution policy determines how data is distributed among the segments. Defining an effective distribution policy requires an understanding of the data’s characteristics, the kinds of queries that will be run once the data is loaded into the database, and what distribution strategies best utilize the parallel execution capacity of the segments.

Use the DISTRIBUTED clause of the CREATE TABLE statement to define the distribution policy for a table. Ideally, each segment will store an equal volume of data and perform an equal share of work when processing queries. There are two kinds of distribution policies:

When different tables are joined on the same columns that comprise the distribution key, the join can be accomplished at the segments, which is much faster than joining rows across segments. The random distribution policy makes this impossible, so it is best practice to define a distribution key that will optimize joins.

Exercises

Execute the CREATE TABLE script in psql

The CREATE TABLE statements for the faa database are in the faa create_dim_tables.sql script.

  1. Change to the directory containing the FAA data and scripts:

    $ cd ~/gpdb-sandbox-tutorials/faa

  2. Open the script in a text editor to see the text of the commands that will be executed when you run the script.

    gpadmin@gpdb-sandbox faa]$ more create_dim_tables.sql
    create table faa.d_airports (airport_code text, airport_desc text) distributed  by (airport_code);
    create table faa.d_wac (wac smallint, area_desc text) distributed by (wac);
    create table faa.d_airlines (airlineid integer, airline_desc text) distributed   by (airlineid);
    create table faa.d_cancellation_codes (cancel_code text, cancel_desc text)   distributed by (cancel_code);
    create table faa.d_delay_groups (delay_group_code text, delay_group_desc text)   distributed by (delay_group_code);
    create table faa.d_distance_groups (distance_group_code text,   distance_group_desc text) distributed by (distance_group_code)
    
  3. Execute the create_dim_tables.sql script. The psql \i command executes a script:

    $ psql -U user1 tutorial

    tutorial=# \i create_dim_tables.sql
    
  4. List the tables that were created, using the psql \dt command.

    tutorial=# \dt

  5. Exit the psql shell:

    tutorial=# \q

Project maintained by greenplum-db · Hosted on GitHub Pages — Theme by mattgraham