Greenplum PL/Perl Language Extension

Greenplum PL/Perl Language Extension

About Greenplum PL/Perl

With the Greenplum Database PL/Perl extension, you can write user-defined functions in Perl that take advantage of its advanced string manipulation operators and functions. PL/Perl provides both trusted and untrusted variants of the language.

PL/Perl is embedded in your Greenplum Database distribution. Greenplum Database PL/Perl requires Perl to be installed on the system of each database host.

Refer to the PostgreSQL PL/Perl documentation for additional information.

Greenplum Database PL/Perl Limitations

Limitations of the Greenplum Database PL/Perl language include:
  • Greenplum Database does not support PL/Perl triggers.
  • PL/Perl functions cannot call each other directly.
  • SPI is not yet fully implemented.
  • If you fetch very large data sets using spi_exec_query(), you should be aware that these will all go into memory. You can avoid this problem by using spi_query()/spi_fetchrow(). A similar problem occurs if a set-returning function passes a large set of rows back to Greenplum Database via a return statement. Use return_next for each row returned to avoid this problem.
  • When a session ends normally, not due to a fatal error, PL/Perl executes any END blocks that you have defined. No other actions are currently performed. (File handles are not automatically flushed and objects are not automatically destroyed.)

Trusted/Untrusted Language

PL/Perl includes trusted and untrusted language variants.

The PL/Perl trusted language is named plperl. The trusted PL/Perl language restricts file system operations, as well as require, use, and other statements that could potentially interact with the operating system or database server process. With these restrictions in place, any Greenplum Database user can create and execute functions in the trusted plperl language.

The PL/Perl untrusted language is named plperlu. You cannot restrict the operation of functions you create with the plperlu untrusted language. Only database superusers have privileges to create untrusted PL/Perl user-defined functions. And only database superusers and other database users that are explicitly granted the permissions can execute untrusted PL/Perl user-defined functions.

PL/Perl has limitations with respect to communication between interpreters and the number of interpreters running in a single process. Refer to the PostgreSQL Trusted and Untrusted PL/Perl documentation for additional information.

Enabling and Removing PL/Perl Support

You must register the PL/Perl language with a database before you can create and execute a PL/Perl user-defined function within that database. To remove PL/Perl support, you must explicitly remove the extension from each database in which it was registered. You must be a database superuser or owner to register or remove trusted languages in Greenplum databases.

Note: Only database superusers may register or remove support for the untrusted PL/Perl language plperlu.
Before you enable or remove PL/Perl support in a database, ensure that:
  • Your Greenplum Database is running.
  • You have sourced greenplum_path.sh.
  • You have set the $MASTER_DATA_DIRECTORY and $GPHOME environment variables.

Enabling PL/Perl Support

For each database in which you want to enable PL/Perl, register the language using the SQL CREATE LANGUAGE command or the Greenplum Database createlang utility. For example, run the following command as the gpadmin user to register the trusted PL/Perl language for the database named testdb:

$ createlang plperl -d testdb

Removing PL/Perl Support

To remove support for PL/Perl from a database, run the SQL DROP LANGUAGE command or the Greenplum Database droplang utility. For example, run the following command as the gpadmin user to remove support for the trusted PL/Perl language from the database named testdb:

$ psql -d testdb
psql (8.3.23)
Type "help" for help.

testdb=# DROP LANGUAGE plperl;

The default DROP LANGUAGE behavior is RESTRICT; the command fails if any existing objects (such as functions) depend on the language. Specify the CASCADE clause to also drop all dependent objects, including functions that you created with PL/Perl.

Developing Functions with PL/Perl

You define a PL/Perl function using the standard SQL CREATE FUNCTION syntax. The body of a PL/Perl user-defined function is ordinary Perl code. The PL/Perl interpreter wraps this code inside a Perl subroutine.

You can also create an anonymous code block with PL/Perl. An anonymous code block, called with the SQL DO command, receives no arguments, and whatever value it might return is discarded. Otherwise, a PL/Perl anonymous code block behaves just like a function. Only database superusers create an anonymous code block with the untrusted plperlu language.

The syntax of the CREATE FUNCTION command requires that you write the PL/Perl function body as a string constant. While it is more convenient to use dollar-quoting, you can choose to use escape string syntax (E'') provided that you double any single quote marks and backslashes used in the body of the function.

PL/Perl arguments and results are handled as they are in Perl. Arguments you pass in to a PL/Perl function are accessed via the @_ array. You return a result value with the return statement, or as the last expression evaluated in the function. A PL/Perl function cannot directly return a non-scalar type because you call it in a scalar context. You can return non-scalar types such as arrays, records, and sets in a PL/Perl function by returning a reference.

PL/Perl treats null argument values as "undefined". Adding the STRICT keyword to the LANGUAGE subclause instructs Greenplum Database to immediately return null when any of the input arguments are null. When created as STRICT, the function itself need not perform null checks.

The following PL/Perl function utilizes the STRICT keyword to return the greater of two integers, or null if any of the inputs are null:

CREATE FUNCTION perl_max (integer, integer) RETURNS integer AS $$
    if ($_[0] > $_[1]) { return $_[0]; }
    return $_[1];
$$ LANGUAGE plperl STRICT;

SELECT perl_max( 1, 3 );
 perl_max
----------
        3
(1 row)

SELECT perl_max( 1, null );
 perl_max
----------

(1 row)

PL/Perl considers anything in a function argument that is not a reference to be a string, the standard Greenplum Database external text representation. The argument values supplied to a PL/Perl function are simply the input arguments converted to text form (just as if they had been displayed by a SELECT statement). In cases where the function argument is not an ordinary numeric or text type, you must convert the Greenplum Database type to a form that is more usable by Perl. Conversely, the return and return_next statements accept any string that is an acceptable input format for the function's declared return type.

Refer to the PostgreSQL PL/Perl Functions and Arguments documentation for additional information, including composite type and result set manipulation.

Built-in PL/Perl Functions

PL/Perl includes built-in functions to access the database, including those to prepare and perform queries and manipulate query results. The language also includes utility functions for error logging and string manipulation.

The following example creates a simple table with an integer and a text column. It creates a PL/Perl user-defined function that takes an input string argument and invokes the spi_exec_query() built-in function to select all columns and rows of the table. The function returns all rows in the query results where the v column includes the function input string.

CREATE TABLE test (
    i int,
    v varchar
);
INSERT INTO test (i, v) VALUES (1, 'first line');
INSERT INTO test (i, v) VALUES (2, 'line2');
INSERT INTO test (i, v) VALUES (3, '3rd line');
INSERT INTO test (i, v) VALUES (4, 'different');

CREATE OR REPLACE FUNCTION return_match(varchar) RETURNS SETOF test AS $$
    # store the input argument
    $ss = $_[0];

    # run the query
    my $rv = spi_exec_query('select i, v from test;');

    # retrieve the query status
    my $status = $rv->{status};

    # retrieve the number of rows returned in the query
    my $nrows = $rv->{processed};

    # loop through all rows, comparing column v value with input argument
    foreach my $rn (0 .. $nrows - 1) {
        my $row = $rv->{rows}[$rn];
        my $textstr = $row->{v};
        if( index($textstr, $ss) != -1 ) {
            # match!  return the row.
            return_next($row);
        }
    }
    return undef;
$$ LANGUAGE plperl;

SELECT return_match( 'iff' );
 return_match
---------------
 (4,different)
(1 row)

Refer to the PostgreSQL PL/Perl Built-in Functions documentation for a detailed discussion of available functions.

Global Values in PL/Perl

You can use the global hash map %_SHARED to share data, including code references, between PL/Perl function calls for the lifetime of the current session.

The following example uses %_SHARED to share data between the user-defined set_var() and get_var() PL/Perl functions:

CREATE OR REPLACE FUNCTION set_var(name text, val text) RETURNS text AS $$
    if ($_SHARED{$_[0]} = $_[1]) {
        return 'ok';
    } else {
        return "cannot set shared variable $_[0] to $_[1]";
    }
$$ LANGUAGE plperl;

CREATE OR REPLACE FUNCTION get_var(name text) RETURNS text AS $$
    return $_SHARED{$_[0]};
$$ LANGUAGE plperl;

SELECT set_var('key1', 'value1');
 set_var
---------
 ok
(1 row)

SELECT get_var('key1');
 get_var
---------
 value1
(1 row)

For security reasons, PL/Perl creates a separate Perl interpreter for each role. This prevents accidental or malicious interference by one user with the behavior of another user's PL/Perl functions. Each such interpreter retains its own value of the %_SHARED variable and other global state. Two PL/Perl functions share the same value of %_SHARED if and only if they are executed by the same SQL role.

There are situations where you must take explicit steps to ensure that PL/Perl functions can share data in %_SHARED. For example, if an application executes under multiple SQL roles (via SECURITY DEFINER functions, use of SET ROLE, etc.) in a single session, make sure that functions that need to communicate are owned by the same user, and mark these functions as SECURITY DEFINER.

Notes

Additional considerations when developing PL/Perl functions:
  • PL/Perl internally utilizes the UTF-8 encoding. It converts any arguments provided in other encodings to UTF-8, and converts return values from UTF-8 back to the original encoding.
  • Nesting named PL/Perl subroutines retains the same dangers as in Perl.
  • Only the untrusted PL/Perl language variant supports module import. Use plperlu with care.
  • Any module that you use in a plperlu function must be available from the same location on all Greenplum Database hosts.