How to Play Python3 with GPDB6

We recently released GreenplumPython, a Python library that allows users to interact with Greenplum or PostgreSQL in a Pythonic way. GreenplumPython provides a pandas-like table API that is familiar and intuitive to Python users. It is well suited to complex analyses such as statistical analysis with UDFs and UDAs. Compared to writing SQL directly, it encapsulates common best practices and helps avoid common pitfalls in Greenplum.

However, most users are still running gpdb6 with plpython2 and will need to install plpython3 before they can use it. This article describes how to install plpython3 on gpdb6.

For commercial Greenplum users, plpython3 has shipped with the Greenplum Database package since 6.22.0. Please refer to PL/Python Language for how to use it.

Prerequisites

Greenplum Database (GPDB) is an open-source data warehousing and analytics platform based on PostgreSQL. It supports several procedural languages, including SQL, C, Perl, and Python, among others. It uses Python2 by default for the Python procedural language.

Follow this section to build and install plpython3 for your Greenplum server.

Before we start, make sure you have the following software components installed on your system:

  • GPDB6 source code: You can download the latest gpdb6 source using git clone https://github.com/greenplum-db/gpdb.git then git checkout 6X_STABLE;
  • Python3: You should have Python 3 installed on your system, and it must be accessible from the PATH environment variable. Run python3 --version or python --version to check that the version is 3.9 or higher;
    If your system does not provide Python 3.9 by default, install it using the following steps.

For CentOS 7/8 users:

sudo yum install gcc openssl-devel bzip2-devel libffi-devel zlib-devel # requirements 
wget https://www.python.org/ftp/python/3.9.13/Python-3.9.13.tgz
tar -xvf Python-3.9.13.tgz
cd Python-3.9.13
./configure --enable-optimizations
make && sudo make install
python3.9 --version

For Ubuntu users:

sudo apt update
sudo apt install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt install python3.9-dev
python3.9 --version
  • You may also need to install the python-devel package in your environment;
  • GCC and Make: You will need to have the GCC compiler and the Make utility installed in order to compile the PL/Python3 source.
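The checklist above can also be verified with a short script. This is only a sketch: the function name check_prerequisites and the default tool list are illustrative choices, not part of Greenplum or GreenplumPython. It checks the version of the interpreter that runs it and looks for the build tools on PATH.

```python
import shutil
import sys

MIN_PYTHON = (3, 9)  # minimum version this article targets


def check_prerequisites(tools=("gcc", "make", "git")):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    # Compare only (major, minor) of the interpreter running this script.
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(
            "Python %d.%d found, need %d.%d or higher"
            % (sys.version_info[0], sys.version_info[1], MIN_PYTHON[0], MIN_PYTHON[1])
        )
    # shutil.which returns None when a command is not reachable through PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append("%s not found on PATH" % tool)
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

Run it with the same interpreter you plan to pass as PYTHON in the build steps, for example python3.9 check.py (the file name is up to you).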

Build and install PL/Python3 with Greenplum Database 6

  1. Configure GPDB6 with the --with-python option and set the PYTHON environment variable to the path of the python3 executable.
    PYTHON=/path/to/python3 ./configure --with-perl --with-python --with-libxml --prefix=gpdb6_path
    
  2. make -j8
  3. make install

Install PL/Python3 to an existing Greenplum Database 6 cluster

# The gpdb source tree must already be configured with the same --prefix as the existing installation.
cd gpdb/src/pl/plpython
source path_to_installed_greenplum/greenplum_path.sh
PYTHON=/path/to/python3 make -j8 && make install

Testing

Once you have compiled and installed PL/Python3, please follow these steps to use it in GPDB:

Connect to GPDB: Connect to GPDB as a superuser, open the database where you want to install PL/Python3, and create the language.

psql postgres
CREATE LANGUAGE plpython3u;

Then validate the installation: You can run a simple test to validate the installation of PL/Python3 by creating a function that returns the current Python version. For example:

psql postgres
CREATE FUNCTION test_plpython3_version() 
RETURNS text AS $$
  import sys
  return str(sys.version_info)
$$ language plpython3u;

select test_plpython3_version();

This should return the version of Python used by PL/Python3. If it does, you have successfully installed and tested PL/Python3 with Greenplum Database 6.
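If you want to run the same check from a script, for example after every rebuild, a small wrapper around psql is one option. The sketch below assumes psql is on PATH and that the test_plpython3_version function from this article already exists in the target database; the helper names psql_command and plpython3_version are hypothetical.

```python
import subprocess


def psql_command(sql, database="postgres"):
    """Build a psql invocation that prints only the query result."""
    # -A: unaligned output, -t: rows only, -c: run one command and exit
    return ["psql", "-d", database, "-A", "-t", "-c", sql]


def plpython3_version(database="postgres"):
    """Call the article's test function and return its text result."""
    completed = subprocess.run(
        psql_command("SELECT test_plpython3_version();", database),
        check=True,           # raise CalledProcessError if psql exits non-zero
        capture_output=True,
        text=True,
    )
    return completed.stdout.strip()
```

Calling plpython3_version() on a working installation should give the same string that the select statement prints in psql.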

Frequently Asked Questions

Q: ERROR: could not access file "$libdir/plpython3": No such file or directory
A: Pass the --prefix argument to ./configure so that it points at your Greenplum installation directory ($GPHOME); it specifies where make install copies files. plpython3u requires the following files to be installed under $GPHOME:

├── lib
│   └── postgresql
│       └── plpython3.so
└── share
    └── postgresql
        └── extension
            ├── plpython3u--1.0.sql
            ├── plpython3u.control
            └── plpython3u--unpackaged--1.0.sql
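A quick way to see which of these files are missing is to test each path under $GPHOME. The following is a minimal sketch; the function name missing_plpython3_files is made up for illustration, and the file list is copied from the tree above.

```python
from pathlib import Path

# Paths relative to $GPHOME, taken from the tree above.
REQUIRED_FILES = (
    "lib/postgresql/plpython3.so",
    "share/postgresql/extension/plpython3u--1.0.sql",
    "share/postgresql/extension/plpython3u.control",
    "share/postgresql/extension/plpython3u--unpackaged--1.0.sql",
)


def missing_plpython3_files(gphome):
    """Return the required files that do not exist under the given GPHOME."""
    root = Path(gphome)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
```

For example, missing_plpython3_files(os.environ["GPHOME"]) (after import os and sourcing greenplum_path.sh) returns an empty list when everything is in place.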

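Q: Error: could not load library "/usr/local/greenplum-db-6.23.0/lib/postgresql/plpython3.so": libpython3.9.so.1.0: cannot open shared object file: No such file or directory (dfmgr.c:240)
A: Find the directory that contains libpython3.9.so, then add it to LD_LIBRARY_PATH (and the Python bin directory to PATH) in the environment from which Greenplum is started: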

export LD_LIBRARY_PATH="path_to_libpython3.9.so:$LD_LIBRARY_PATH"
export PATH="path_to_python3.9/bin:$PATH"
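If you are not sure where libpython3.9.so lives, the interpreter itself can usually tell you. This sketch reads CPython's build configuration through the standard sysconfig module; the function name libpython_location is illustrative. The reported directory is where the build expected the library to be installed, and some distributions place it elsewhere, so treat the result as a hint and check the exists field.

```python
import os
import sysconfig


def libpython_location():
    """Report where this interpreter's libpython is expected to live."""
    libdir = sysconfig.get_config_var("LIBDIR")
    # On a shared Linux build this is a name such as libpython3.9.so.1.0;
    # on a static build it is typically a .a archive instead.
    soname = sysconfig.get_config_var("INSTSONAME")
    shared = bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))
    path = os.path.join(libdir, soname) if libdir and soname else None
    return {
        "shared_build": shared,
        "libdir": libdir,
        "library": path,
        "exists": bool(path) and os.path.exists(path),
    }


if __name__ == "__main__":
    for key, value in libpython_location().items():
        print(key, "=", value)
```

Run it with the interpreter you passed as PYTHON when building. If shared_build is False, that interpreter was not built with a shared libpython, which is a likely reason the .so file cannot be found.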

Q: I installed some packages such as numpy with pip, but importing them in plpython3u raises this error:

postgres=# CREATE OR REPLACE FUNCTION test_import()
RETURNS text AS $$
  import sys
  import numpy
  return str(numpy.__file__)
$$ language plpython3u;

select test_import();

ERROR:  ModuleNotFoundError: No module named 'numpy' (plpy_elog.c:121)
CONTEXT:  Traceback (most recent call last):
  PL/Python function "test_import", line 3, in <module>
    import numpy
PL/Python function "test_import"

A: Set the plpython3.python_path GUC to the custom location where the package is installed, so that PL/Python3 can import it:

python -m pip install --prefix=/home/gpadmin/my_python dill

$ psql -d testdb
testdb=# load 'plpython3';
testdb=# SET plpython3.python_path='/home/gpadmin/my_python';

Ensure that you configure plpython3.python_path before you create or call plpython3 functions in a session. If you set or change the parameter after plpython3u is initialized, you receive the error:

ERROR: SET PYTHONPATH failed, the GUC value can only be changed before initializing the python interpreter.

To set a default value for the configuration parameter, use gpconfig instead:

gpconfig -c plpython3.python_path \
    -v "'/home/gpadmin/my_python'" \
    --skipvalidation
gpstop -u
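When an import still fails, it helps to see where the interpreter is actually looking. The sketch below uses only the standard library; the helper names locate_module and import_report are hypothetical. It reports the file a module would be loaded from together with the current search path.

```python
import importlib.util
import sys


def locate_module(name):
    """Return where a top-level module would be loaded from, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


def import_report(name):
    """Collect the module location and the interpreter's search path for debugging."""
    return {
        "module": name,
        "location": locate_module(name),
        "sys_path": list(sys.path),
    }
```

Running import_report("numpy") from a shell shows what your command-line Python sees. The same few lines could be pasted into the body of a plpython3u function, returning str(...) of the result, to compare against what the interpreter inside Greenplum sees.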

Q: Can I use both Python 2 and Python 3 with PL/Python in the same database?
A: Yes, plpython2 and plpython3 can co-exist, but they cannot be used in the same session.

Conclusion

In this article, we have shown you how to compile and install the PL/Python3 language extension for Greenplum Database 6. PL/Python3 is a powerful procedural language that allows you to write complex analytics functions in Python and execute them within the Greenplum Database. With plpython3 installed, you can now use GreenplumPython.

If you encounter any other problems, please post an issue on GreenplumPython and we will get back to you as soon as possible.