Auditing

Auditing

Greenplum Database is capable of auditing a variety of events, including startup and shutdown of the system, segment database failures, SQL statements that result in an error, and all connection attempts and disconnections. Greenplum Database also logs SQL statements and information regarding SQL statements, and can be configured in a variety of ways to record audit information with more or less detail. The log_error_verbosity configuration parameter controls the amount of detail written in the server log for each message that is logged.  Similarly, the log_min_error_statement parameter allows administrators to configure the level of detail recorded specifically for SQL statements, and the log_statement parameter determines the kind of SQL statements that are audited. Greenplum Database records the username for all auditable events, when the event is initiated by a subject outside the Greenplum Database.

Greenplum Database prevents unauthorized modification and deletion of audit records by only allowing administrators with an appropriate role to perform any operations on log files.  Logs are stored in a proprietary format using comma-separated values (CSV).  Each segment and the master stores its own log files, although these can be accessed remotely by an administrator.  Greenplum Database also authorizes overwriting of old log files via the log_truncate_on_rotation parameter.  This is a local parameter and must be set on each segment and master configuration file.

Greenplum provides an administrative schema called gp_toolkit that you can use to query log files, as well as system catalogs and operating enviroment for system status information. For more information, including usage, refer to The gp_tookit Administrative Schema appendix in the Greenplum Database Reference Guide.

Viewing the Database Server Log Files

Every database instance in Greenplum Database (master and segments) is a running PostgreSQL database server with its own server log file. Daily log files are created in the pg_log directory of the master and each segment data directory.

The server log files are written in comma-separated values (CSV) format. Not all log entries will have values for all of the log fields. For example, only log entries associated with a query worker process will have the slice_id populated. Related log entries of a particular query can be identified by its session identifier (gp_session_id) and command identifier (gp_command_count).

#

Field Name

Data Type

Description

1

event_time

timestamp with time zone

Time that the log entry was written to the log

2

user_name

varchar(100)

The database user name

3

database_name

varchar(100)

The database name

4

process_id

varchar(10)

The system process id (prefixed with "p")

5

thread_id

varchar(50)

The thread count (prefixed with "th")

6

remote_host

varchar(100)

On the master, the hostname/address of the client machine. On the segment, the hostname/address of the master.

7

remote_port

varchar(10)

The segment or master port number

8

session_start_time

timestamp with time zone

Time session connection was opened

9

transaction_id

int

Top-level transaction ID on the master. This ID is the parent of any subtransactions.

10

gp_session_id

text

Session identifier number (prefixed with "con")

11

gp_command_count

text

The command number within a session (prefixed with "cmd")

12

gp_segment

text

The segment content identifier (prefixed with "seg" for primaries or "mir" for mirrors). The master always has a content id of -1.

13

slice_id

text

The slice id (portion of the query plan being executed)

14

distr_tranx_id

text

Distributed transaction ID

15

local_tranx_id

text

Local transaction ID

16

sub_tranx_id

text

Subtransaction ID

17

event_severity

varchar(10)

Values include: LOG, ERROR, FATAL, PANIC, DEBUG1, DEBUG2

18

sql_state_code

varchar(10)

SQL state code associated with the log message

19

event_message

text

Log or error message text

20

event_detail

text

Detail message text associated with an error or warning message

21

event_hint

text

Hint message text associated with an error or warning message

22

internal_query

text

The internally-generated query text

23

internal_query_pos

int

The cursor index into the internally-generated query text

24

event_context

text

The context in which this message gets generated

25

debug_query_string

text

User-supplied query string with full detail for debugging. This string can be modified for internal use.

26

error_cursor_pos

int

The cursor index into the query string

27

func_name

text

The function in which this message is generated

28

file_name

text

The internal code file where the message originated

29

file_line

int

The line of the code file where the message originated

30

stack_trace

text

Stack trace text associated with this message

Greenplum provides a utility called gplogfilter that can be used to search through a Greenplum Database log file for entries matching the specified criteria. By default, this utility searches through the Greenplum master log file in the default logging location. For example, to display the last three lines of the master log file:
$ gplogfilter -n 3
You can also use gplogfilter to search through all segment log files at once by running it through the gpssh utility. For example, to display the last three lines of each segment log file:
$ gpssh -f seg_host_file
  => source /usr/local/greenplum-db/greenplum_path.sh
  => gplogfilter -n 3 /gpdata/gp*/pg_log/gpdb*.csv

The following are the Greenplum security-related audit (or logging) server configuration parameters that are set in the postgresql.conf configuration file:

Field Name

Value Range

Default

Description

log_connections

Boolean

off

This outputs a line to the server log detailing each successful connection. Some client programs, like psql, attempt to connect twice while determining if a password is required, so duplicate “connection received” messages do not always indicate a problem.

log_disconnections

Boolean

off

This outputs a line in the server log at termination of a client session, and includes the duration of the session.

log_statement

NONE

DDL

MOD

ALL

ALL

Controls which SQL statements are logged. DDL logs all data definition commands like CREATE, ALTER, and DROP commands. MOD logs all DDL statements, plus INSERT, UPDATE, DELETE, TRUNCATE, and COPY FROM. PREPARE and EXPLAIN ANALYZE statements are also logged if their contained command is of an appropriate type.

log_hostname

Boolean

off

By default, connection log messages only show the IP address of the connecting host. Turning on this option causes logging of the host name as well. Note that depending on your host name resolution setup this might impose a non-negligible performance penalty.

log_duration

Boolean

off

Causes the duration of every completed statement which satisfies log_statement to be logged.

log_error_verbosity

TERSE

DEFAULT

VERBOSE

DEFAULT

Controls the amount of detail written in the server log for each message that is logged.

log_min_duration_statement

number of milliseconds, 0, -1

-1

Logs the statement and its duration on a single log line if its duration is greater than or equal to the specified number of milliseconds. Setting this to 0 will print all statements and their durations. -1 disables the feature. For example, if you set it to 250 then all SQL statements that run 250ms or longer will be logged. Enabling this option can be useful in tracking down unoptimized queries in your applications.

log_min_messages

DEBUG5

DEBUG4

DEBUG3

DEBUG2

DEBUG1

INFO

NOTICE

WARNING

ERROR

LOG

FATAL

PANIC

NOTICE

Controls which message levels are written to the server log. Each level includes all the levels that follow it. The later the level, the fewer messages are sent to the log.

log_rotation_age

Any valid time expression (number and unit)

1d

Determines the maximum lifetime of an individual log file. After this time has elapsed, a new log file will be created. Set to zero to disable time-based creation of new log files.

log_statement_stats

Boolean

off

For each query, write total performance statistics of the query parser, planner, and executor to the server log. This is a crude profiling instrument.

log_truncate_on_rotation

Boolean

off

Truncates (overwrites), rather than appends to, any existing log file of the same name. Truncation will occur only when a new file is being opened due to time-based rotation. For example, using this setting in combination with a log_filename such as gpseg#-%H.log would result in generating twenty-four hourly log files and then cyclically overwriting them. When off, pre-existing files will be appended to in all cases.