As engineers on the Greenplum Release Engineering team, we recently had the opportunity to do an in-depth exploration of Postgres’ build system. Greenplum Server is based on Postgres and has inherited the upstream build system. Our team was working on producing a relocatable build of Greenplum Server, which led us to look into how to do relocatable builds of Postgres.
Why were we looking at creating a relocatable build of Greenplum Server? The first beta version of gpupgrade was recently released. This utility allows users to upgrade their Greenplum cluster from one major version to the next, in-place. In order for the gpupgrade utility to successfully upgrade a cluster, it needs to execute code from both major versions, potentially at the same time. This means that the binaries and libraries from the different Greenplum installations need to be able to load their associated shared objects at runtime without accidentally loading a shared object from the other version. And they need to be able to do this independently of where the Greenplum software has been installed on disk.
Rather than getting bogged down in the particulars of our team’s use case, let us consider the following hypothetical scenario that highlights the key ideas and findings.
Building Postgres 11.8
Our story begins with Cardenia, a graduate student in a research lab at university. She would like to use Postgres to help with her data analysis but she doesn’t have root access on the lab’s shared infrastructure. Not wanting to bother the lab’s resident gray-beard, she decides that she’ll build, install, and run Postgres in her home directory. Her data analysis makes use of some old Perl modules so she makes sure to configure Postgres to build with PL/Perl.
mkdir -p ~/.local/src
cd ~/.local/src
curl https://ftp.postgresql.org/pub/source/v11.8/postgresql-11.8.tar.bz2 | \
    tar -xj
cd ./postgresql-11.8
./configure \
    --prefix=${HOME}/.local/postgres \
    --with-perl
make -j4
make install
Success! She’s built and installed Postgres — she thinks. She decides to run a quick sanity check:
[cardenia@hub postgresql-11.8]$ cd ~
[cardenia@hub ~]$ export PATH="${HOME}/.local/postgres/bin:${PATH}"
[cardenia@hub ~]$ initdb ~/test
...
[cardenia@hub ~]$ pg_ctl -D ~/test -l logfile start
waiting for server to start.... done
server started
[cardenia@hub ~]$ createdb
[cardenia@hub ~]$ psql -c 'SELECT version()'
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 11.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
(1 row)
It looks like the core server is up and running, but she also wants to make sure that PL/Perl will work too; she launches psql and creates a PL/Perl function:
-- taken from https://www.postgresql.org/docs/11/plperl-funcs.html
CREATE EXTENSION plperl;
CREATE FUNCTION perl_max (integer, integer) RETURNS integer AS $$
    if ($_[0] > $_[1]) { return $_[0]; }
    return $_[1];
$$ LANGUAGE plperl;
She tests that PL/Perl is working by calling the function:
cardenia=# SELECT perl_max(1, 2);
 perl_max
----------
        2
(1 row)
It works!
Relocate Postgres 11: Failed Attempt
After Cardenia presents her progress at her lab’s weekly research review, her fellow student Marce decides that he wants to use Postgres and PL/Perl to help with his data analysis. Marce is not as comfortable with Linux as she is, so Cardenia helps him out by archiving her installation directory and sharing it with him:
[cardenia@hub ~]$ tar -czf /tmp/postgres11.8.binary.tar.gz -C ~/.local/postgres .
Marce makes a directory for his installation, extracts the archive, and follows Cardenia’s instructions to start the new Postgres instance:
[marce@hub ~]$ mkdir -p ~/.local/postgres
[marce@hub ~]$ tar -xzf /tmp/postgres11.8.binary.tar.gz -C ~/.local/postgres
[marce@hub ~]$ export PATH="${HOME}/.local/postgres/bin:${PATH}"
[marce@hub ~]$ initdb ~/test
initdb: error while loading shared libraries: libpq.so.5: cannot open shared object file: No such file or directory
Marce reports the error back to Cardenia and she joins Marce in debugging the issue.
Cardenia knows that the reusable parts of an application are packaged into libraries, which are also known as shared objects. These shared objects need to be located when running a binary that links against them, and this is the job of the dynamic linker, ld-linux.so.
Cardenia begins her investigation with ldd, which will (recursively) print all of the shared object dependencies and where the dynamic linker has located them.
[marce@hub ~]$ ldd ~/.local/postgres/bin/initdb
    linux-vdso.so.1 => (0x00007ffede149000)
    libpq.so.5 => not found
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f4dd4a0b000)
    librt.so.1 => /lib64/librt.so.1 (0x00007f4dd4803000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f4dd4435000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f4dd4c27000)
NOTE: Running ldd on untrusted binaries is dangerous since it may execute the program being probed; see ldd arbitrary code execution for a proof-of-concept exploit. When working with binaries of an unknown origin, a better alternative may be readelf from the binutils package.
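As a rough illustration of the readelf route, the sketch below pulls the direct dependencies out of the dynamic section. It assumes the output format of GNU readelf, and list_needed is a hypothetical helper name. Unlike ldd, it neither resolves the libraries to paths nor follows them recursively.

```shell
#!/bin/sh
# list_needed: hypothetical helper that reads `readelf -d` output on stdin
# and prints the name of each shared library the ELF file asks for.
# Nothing is loaded or executed; this only parses text.
list_needed() {
    sed -n 's/.*(NEEDED).*\[\(.*\)\]/\1/p'
}

# Sample input in the format GNU readelf prints for dynamic entries.
sample=' 0x0000000000000001 (NEEDED)             Shared library: [libpq.so.5]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000001d (RUNPATH)            Library runpath: [/opt/example/lib]'

printf '%s\n' "$sample" | list_needed
```

Against a real file this would be used as: readelf -d ~/.local/postgres/bin/initdb | list_needed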
The output of ldd confirms what Cardenia already knows: the dynamic linker is unable to locate libpq.so.5. She checks to make sure that libpq.so.5 has not been accidentally deleted:
[marce@hub ~]$ ls -l ~/.local/postgres/lib/libpq.so*
lrwxrwxrwx 1 marce marce     13 Aug  4 02:25 /home/marce/.local/postgres/lib/libpq.so -> libpq.so.5.11
lrwxrwxrwx 1 marce marce     13 Aug  4 02:25 /home/marce/.local/postgres/lib/libpq.so.5 -> libpq.so.5.11
-rwxr-xr-x 1 marce marce 298384 Aug  4 02:25 /home/marce/.local/postgres/lib/libpq.so.5.11
Cardenia decides to read the manual for the dynamic linker (man 8 ld-linux.so). She learns that the linker will look for a shared object in a cache of candidate objects (/etc/ld.so.cache) and that this cache is managed by ldconfig. Unfortunately, Cardenia still doesn’t have root access and cannot add ~cardenia/.local/postgres/lib as a location to search for candidate shared objects. Even if she could, this configuration would apply to the whole OS and not just her user.
She also sees that she can specify a list of directories to search for shared objects with the LD_LIBRARY_PATH environment variable, but this setting would apply to all programs she runs in that environment. The manual for the dynamic linker mentions the DT_RPATH and DT_RUNPATH dynamic section attributes of binaries; if the executable being loaded contains these attributes, the linker will search the given directories for required shared objects. It sounds like Cardenia should be able to (re-)build her binaries so that they contain a reference to the location of their required shared objects. She suspects that this is the reason why initdb worked before she shared the tarball with Marce. She uses readelf to check if initdb has RUNPATH set and, if so, what it is set to:
[marce@hub ~]$ readelf --dynamic ~/.local/postgres/bin/initdb
...
 0x000000000000001d (RUNPATH)            Library runpath: [/home/cardenia/.local/postgres/lib]
...
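When checking many files, it can help to reduce the dynamic section to just the embedded search path. The following is a minimal sketch, assuming GNU readelf output; show_search_path is a hypothetical helper name.

```shell
#!/bin/sh
# show_search_path: hypothetical helper; reads `readelf -d` output on stdin
# and prints "<TAG> <path list>" for any RPATH or RUNPATH entry it finds.
show_search_path() {
    sed -n -E 's/.*\((RPATH|RUNPATH)\).*\[(.*)\]/\1 \2/p'
}

# Sample input modeled on the readelf output for initdb.
sample=' 0x0000000000000001 (NEEDED)             Shared library: [libpq.so.5]
 0x000000000000001d (RUNPATH)            Library runpath: [/home/cardenia/.local/postgres/lib]'

printf '%s\n' "$sample" | show_search_path
```

An absolute path under another user's home directory in this output is a hint that the build will not survive being moved.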
Even though the path in the RUNPATH entry is correct and points to a directory containing libpq.so.5, this directory is not readable by other users.
[marce@hub ~]$ ls -l /home/cardenia/.local/postgres/lib
ls: cannot access /home/cardenia/.local/postgres/lib: Permission denied
Cardenia sees that the original build includes an absolute path in the DT_RUNPATH attribute. She could have Marce follow the same steps that she did, so that he builds binaries containing an absolute path that points inside his home directory. But that means if Cardenia updates her Postgres installation and wants to share it with Marce, he will have to rebuild as well. She would really prefer to build it once and then share an archive with Marce.
Cardenia also read about $ORIGIN in the manual for the dynamic linker; the linker expands it to the directory containing the program or shared object being loaded. If her understanding is correct, that should allow her to use a relative path for RUNPATH. Plus, this seems like a good enough reason to put off starting her research proposal; this counts as work, right?
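To make the idea concrete, here is a small text-only sketch of the substitution. expand_origin is a hypothetical helper, not something the linker provides, and it ignores details such as symlink resolution that the real dynamic linker has to deal with.

```shell
#!/bin/sh
# expand_origin: hypothetical helper that mimics how the dynamic linker
# replaces $ORIGIN with the directory containing the object being loaded.
# Arguments: <path to binary> <search path entry>
expand_origin() {
    origin=$(dirname "$1")
    # [$] matches a literal dollar sign. '|' is the sed delimiter, so this
    # sketch assumes the directory name contains no '|', '&' or backslash.
    printf '%s\n' "$2" | sed 's|[$]ORIGIN|'"$origin"'|g'
}

# The same embedded entry resolves differently for each installation.
expand_origin /home/cardenia/.local/postgres/bin/initdb '$ORIGIN/../lib'
expand_origin /home/marce/.local/postgres/bin/initdb '$ORIGIN/../lib'
```

The point of the sketch is that one embedded string yields a different directory for each copy of the binary, which is what a relocatable build needs.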
Cardenia returns to her desk to try and build relocatable Postgres binaries.
Making Postgres Relocatable – The First Attempt
“Surely I am not the first person to want to build a relocatable Postgres installation,” Cardenia thinks. One internet search later, she finds Relocatable PostgreSQL Builds, which looks to do exactly what she wants. Since she wants to make sure the installation is relocatable, she still sets --prefix to point to her home directory.
cd ~/.local/src/postgresql-11.8/
make clean
./configure \
    --prefix=/home/cardenia/.local/postgres \
    --with-perl \
    --disable-rpath
LD_RUN_PATH='$ORIGIN/../lib' make -j4
make install
The default setting for Postgres’ build system is to embed a shared library search path in executables. This default behavior is why her original build already included an embedded search path. Since Cardenia wants to control the embedded search path herself, through the LD_RUN_PATH environment variable that the linker ld honors, she explicitly disables that functionality with --disable-rpath.
Now that it is built and installed, Cardenia builds a tarball again and gives it to Marce:
[cardenia@hub postgresql-11.8]$ tar -czf /tmp/postgres11.8.relocatable.binary.tar.gz -C ~/.local/postgres .
Marce unpacks the new build of Postgres
[marce@hub ~]$ tar -xzf /tmp/postgres11.8.relocatable.binary.tar.gz -C ~/.local/postgres
and checks if the dynamic linker can find all of initdb’s dependencies:
[marce@hub ~]$ ldd ~/.local/postgres/bin/initdb
    linux-vdso.so.1 => (0x00007ffc94185000)
    libpq.so.5 => /home/marce/.local/postgres/bin/../lib/libpq.so.5 (0x00007f4d75d86000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f4d75b6a000)
    librt.so.1 => /lib64/librt.so.1 (0x00007f4d75962000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f4d75594000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f4d75fc9000)
This looks promising. Time for Marce to test his database.
[marce@hub ~]$ export PATH="${HOME}/.local/postgres/bin:${PATH}"
[marce@hub ~]$ mkdir -p ~/test
[marce@hub ~]$ initdb ~/test
[marce@hub ~]$ pg_ctl -D ~/test -l logfile start
waiting for server to start.... done
server started
[marce@hub ~]$ createdb
[marce@hub ~]$ psql -c 'SELECT version()'
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 11.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
(1 row)
Now Marce tries to follow the PL/Perl function example as Cardenia demoed:
[marce@hub ~]$ psql -c 'CREATE EXTENSION plperl'
ERROR:  could not load library "/home/marce/.local/postgres/lib/plperl.so": libperl.so: cannot open shared object file: No such file or directory
Not again?!? Marce calls Cardenia over for some more pair-debugging. Cardenia breaks out her trusty tool, ldd.
[marce@hub ~]$ ldd ~/.local/postgres/lib/plperl.so
    linux-vdso.so.1 => (0x00007fffffd91000)
    libperl.so => not found
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f09e0289000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f09dfebb000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f09e06bb000)
This was working before, so libperl.so must be on the system, and Cardenia hasn’t made any changes to /etc/ld.so.cache. Why can the linker not find the required library? The lab’s main server (hub) is running CentOS 7, and it turns out that on RHEL 7 platforms, libperl.so is in a location that is not searched by ld-linux.so by default.
[marce@hub ~]$ find /usr -name 'libperl.so'
/usr/lib64/perl5/CORE/libperl.so
[marce@hub ~]$ readelf -d ~/.local/postgres/lib/plperl.so
...
 0x000000000000000f (RPATH)              Library rpath: [$ORIGIN/../lib]
...
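Cardenia cannot add this path to /etc/ld.so.cache, and using a single search path for all binaries obviously isn’t going to work: LD_RUN_PATH gave plperl.so the same $ORIGIN/../lib entry as everything else, and that entry says nothing about /usr/lib64/perl5/CORE.
What Cardenia really wants is to set the value of RUNPATH for each binary individually. She returns to her desk to debug further.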
A Dive into Postgres’ Build System
Let’s take a look at how RUNPATH is handled in Postgres’ build system and see if we can come up with a solution for Cardenia and Marce.
Configure
The configure option --enable-rpath comes from configure.in#L177-L182:
#
# '-rpath'-like feature can be disabled
#
PGAC_ARG_BOOL(enable, rpath, yes,
              [do not embed shared library search path in executables])
AC_SUBST(enable_rpath)
- PGAC_ARG_BOOL is defined in general.m4#L77-L102 and defines an option that takes either yes or no
  - this macro is used for both --enable-<option> and --with-<option> options
- AC_SUBST makes an output variable from a shell variable ($enable_rpath) which is expanded in files (@enable_rpath@)
  - in this case, the value it expands to will be either yes or no
Makefile
$(enable_rpath)
The only file with a reference to @enable_rpath@ is src/Makefile.global.in#L199:
enable_rpath = @enable_rpath@
which sets a Make variable equal to the value of the enable_rpath shell variable from the configure script; for example:
[cardenia@hub postgresql-11.8]$ ./configure --enable-rpath
...
[cardenia@hub postgresql-11.8]$ grep '^enable_rpath' src/Makefile.global
enable_rpath = yes
[cardenia@hub postgresql-11.8]$ ./configure --disable-rpath
...
[cardenia@hub postgresql-11.8]$ grep '^enable_rpath' src/Makefile.global
enable_rpath = no
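Conceptually, the step that turns src/Makefile.global.in into src/Makefile.global is a text substitution performed by the config.status script that configure generates. The sketch below is a much simplified stand-in for that step; substitute is a hypothetical helper, and the real script handles many variables and edge cases that this ignores.

```shell
#!/bin/sh
# substitute: hypothetical, simplified stand-in for AC_SUBST handling.
# Replaces every @name@ in the template read from stdin with a value.
# Arguments: <name> <value>
substitute() {
    sed "s|@$1@|$2|g"
}

# The one line of src/Makefile.global.in that mentions enable_rpath.
template='enable_rpath = @enable_rpath@'

printf '%s\n' "$template" | substitute enable_rpath yes   # ./configure --enable-rpath
printf '%s\n' "$template" | substitute enable_rpath no    # ./configure --disable-rpath
```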
The $(enable_rpath) Make variable is referenced in two locations: src/Makefile.shlib and src/Makefile.global.in.
The usage in src/Makefile.shlib only applies to HP-UX, and we can ignore it since we are focusing on Linux. The usage in src/Makefile.global.in is what we need to understand; if $(enable_rpath) is yes, then we add some linking flags to LDFLAGS.
# Set up rpath if enabled. By default it will point to our libdir,
# but individual Makefiles can force other rpath paths if needed.
rpathdir = $(libdir)

ifeq ($(enable_rpath), yes)
LDFLAGS += $(rpath)
endif
$(rpath)
We saw in the previous section that if $(enable_rpath) is set to yes, then the contents of the $(rpath) Make variable are appended to the LDFLAGS variable. We can find all the locations where this variable is assigned:
$ grep --recursive '^rpath ='
src/backend/utils/mb/conversion_procs/proc.mk:rpath =
src/backend/snowball/Makefile:rpath =
src/pl/plpgsql/src/Makefile:rpath =
src/makefiles/Makefile.freebsd:rpath = -Wl,-R'$(rpathdir)'
src/makefiles/Makefile.openbsd:rpath = -Wl,-R'$(rpathdir)'
src/makefiles/Makefile.netbsd:rpath = -Wl,-R'$(rpathdir)'
src/makefiles/Makefile.netbsd:rpath = -Wl,-R'$(rpathdir)'
src/makefiles/Makefile.solaris:rpath = -Wl,-rpath,'$(rpathdir)'
src/makefiles/Makefile.solaris:rpath = -Wl,-R'$(rpathdir)'
src/makefiles/Makefile.linux:rpath = -Wl,-rpath,'$(rpathdir)',--enable-new-dtags
We observe:
- some parts of the build system explicitly blank out the value of $(rpath)
- the default value is set via platform-specific Makefiles (e.g., Makefile.netbsd)
- in most cases, the actual value of RPATH/RUNPATH is set to the value of $(rpathdir)
$(rpathdir)
We saw in the previous section that the actual value of RPATH/RUNPATH is set via the Make variable $(rpathdir), and we can easily find all locations where it is assigned:
$ grep --recursive '^rpathdir ='
contrib/jsonb_plperl/Makefile:rpathdir = $(perl_archlibexp)/CORE
contrib/hstore_plpython/Makefile:rpathdir = $(python_libdir)
contrib/jsonb_plpython/Makefile:rpathdir = $(python_libdir)
contrib/ltree_plpython/Makefile:rpathdir = $(python_libdir)
contrib/hstore_plperl/Makefile:rpathdir = $(perl_archlibexp)/CORE
src/Makefile.global.in:rpathdir = $(libdir)
src/pl/plperl/GNUmakefile:rpathdir = $(perl_archlibexp)/CORE
src/pl/plpython/Makefile:rpathdir = $(python_libdir)
We observe:
- the default value for $(rpathdir) is $(libdir)
- different parts of the build system override the default by assigning $(rpathdir) to a custom value
For example, the Makefile for PL/Perl explicitly overrides the default RUNPATH value with a path obtained from the Perl runtime:
rpathdir = $(perl_archlibexp)/CORE
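To see how $(enable_rpath), $(rpath) and $(rpathdir) fit together on Linux, the following sketch renders the Make logic as a shell function. linux_rpath_flags is a hypothetical name used only for illustration, and the two directories are assumptions taken from Cardenia's prefix and from where libperl.so was found on the lab server.

```shell
#!/bin/sh
# linux_rpath_flags: hypothetical shell rendering of the Make logic above.
# Arguments: <enable_rpath: yes|no> <rpathdir>
# Prints what would be appended to LDFLAGS on Linux, or nothing if disabled.
linux_rpath_flags() {
    if [ "$1" = yes ]; then
        # Mirrors Makefile.linux: rpath = -Wl,-rpath,'$(rpathdir)',--enable-new-dtags
        printf '%s\n' "-Wl,-rpath,'$2',--enable-new-dtags"
    fi
}

# Default for most targets: rpathdir = $(libdir)
linux_rpath_flags yes /home/cardenia/.local/postgres/lib
# PL/Perl override: rpathdir = $(perl_archlibexp)/CORE
linux_rpath_flags yes /usr/lib64/perl5/CORE
# --disable-rpath: nothing is added, for any target
linux_rpath_flags no /home/cardenia/.local/postgres/lib
```

Read this way, --disable-rpath removes the per-target overrides along with the default, which is consistent with plperl.so losing its path to libperl.so in the first attempt.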
A Second Attempt at Making Postgres Relocatable
If we wanted to set a default RPATH/RUNPATH to $ORIGIN/../lib, we could accomplish this with the following small patch:
diff -urN postgresql-11.8.orig/src/Makefile.global.in postgresql-11.8/src/Makefile.global.in
--- postgresql-11.8.orig/src/Makefile.global.in 2020-05-11 21:10:48.000000000 +0000
+++ postgresql-11.8/src/Makefile.global.in 2020-07-23 22:18:56.589659501 +0000
@@ -525,7 +525,7 @@
 
 # Set up rpath if enabled. By default it will point to our libdir,
 # but individual Makefiles can force other rpath paths if needed.
-rpathdir = $(libdir)
+rpathdir = $$ORIGIN/../lib
 
 ifeq ($(enable_rpath), yes)
 LDFLAGS += $(rpath)
NOTE: This patch assumes that $(bindir) and $(libdir) are located in a common parent directory.
The above approach does not modify the RPATH/RUNPATH for parts of the system where it is explicitly overridden. This is an improvement over the first attempt, where LD_RUN_PATH was used to override all the RPATH/RUNPATH settings.
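The patch writes $$ORIGIN rather than $ORIGIN because make collapses $$ into a single $ when it expands the variable. The result then reaches the shell inside the single quotes that Makefile.linux puts around $(rpathdir), so the shell passes it on untouched. The snippet below is only an illustration of that quoting behavior; the variable names are hypothetical.

```shell
#!/bin/sh
# An ordinary, empty shell variable that happens to share the name.
# The linker's $ORIGIN token has nothing to do with it.
ORIGIN=''

single_quoted='$ORIGIN/../lib'   # single quotes: the text is passed through as-is
double_quoted="$ORIGIN/../lib"   # double quotes: the shell expands the variable

echo "single-quoted: $single_quoted"
echo "double-quoted: $double_quoted"
```

Only the single-quoted form delivers the literal $ORIGIN token to the linker; in the double-quoted form the shell consumes it before the linker ever sees it.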
Equipped with our proposed patch and the insights we’ve shared with her, Cardenia patches and re-builds her Postgres 11 installation.
cd ~/.local/src/postgresql-11.8
make clean
./configure \
    --prefix=${HOME}/.local/postgres \
    --with-perl \
    --enable-rpath
make -j4
make install
Finally, Cardenia has a relocatable installation of Postgres 11 with PL/Perl properly configured. She gives it to Marce and asks him to try again.
[cardenia@hub src]$ tar -czf /tmp/postgres11.8.final.tar.gz -C ~/.local/postgres .
Marce unpacks the installation and runs through the basic test steps.
[marce@hub ~]$ tar -xzf /tmp/postgres11.8.final.tar.gz -C ~/.local/postgres
[marce@hub ~]$ export PATH="${HOME}/.local/postgres/bin:${PATH}"
[marce@hub ~]$ pg_ctl -D ~/test -l logfile start
waiting for server to start.... done
server started
[marce@hub ~]$ psql -c 'SELECT version()'
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 11.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
(1 row)
[marce@hub ~]$ psql -c 'CREATE EXTENSION plperl'
CREATE EXTENSION
Marce and Cardenia high-five one another to celebrate their success! With a better understanding of executables, shared objects, and Postgres’ build system, Cardenia can share her Postgres 11 installation with Marce and other labmates through a simple archive.
Conclusion
Being able to build a customized, relocatable installation of Postgres might seem like an obvious and desirable feature for the build system to have. If we were able to accomplish this with a single line change, one might wonder why it hasn’t been done before.
Here are some of the important simplifying assumptions we made that allowed us to do this:
- We ignored non-Linux platforms like BSD or HP-UX
- We assumed that $(bindir) and $(libdir) had a common parent, but ./configure allows you to customize these paths independently
- We didn’t try running any of the included tests (e.g., installcheck or installcheck-world)
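The common-parent assumption could in principle be relaxed by computing the relative path from $(bindir) to $(libdir) rather than hard-coding ../lib. The sketch below is one hedged way to do that; it is not part of our patch, origin_relative is a hypothetical helper, the example directories are made up, and it assumes GNU coreutils realpath (for -m and --relative-to).

```shell
#!/bin/sh
# origin_relative: hypothetical helper that derives an $ORIGIN-based search
# path entry for an arbitrary pair of directories.
# Arguments: <bindir> <libdir>
# Assumes GNU coreutils realpath; -m lets it work on paths that do not exist.
origin_relative() {
    rel=$(realpath -m --relative-to="$1" "$2")
    printf '$ORIGIN/%s\n' "$rel"
}

# The layout our patch assumes.
origin_relative /opt/example/bin /opt/example/lib
# A layout where libdir was customized independently of bindir.
origin_relative /opt/example/bin /opt/example/lib64/pgsql
```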
In fact, adding this feature to Postgres’ build system has been brought up before.
Postgres Hackers Mailing list
We found one (1) thread on pgsql-hackers in which a patch for building with a relative RPATH/RUNPATH was submitted. A couple of issues were identified with this approach:
- check-world tests would fail because test executables are not installed
  - it’s possible that this should be installcheck-world, because check-world runs successfully with the patch above applied while installcheck-world does not
  - see the output of readelf -d ~/.local/src/postgresql-11.8/src/test/isolation/isolationtester: the RUNPATH also points to $ORIGIN/../lib, but the binary is linked against libpq.so.5, which cannot be found under that directory. It needs an absolute path for the RUNPATH rather than a relative one.
- running built binaries from the source tree (instead of the install location) wouldn’t work
  - the patch submitted to hackers makes the same assumption that our patch does: that binaries and libraries are located in directories (i.e. $(bindir) and $(libdir) respectively) that share a common parent
- $ORIGIN/../lib doesn’t work for libraries that are located in subdirectories of $(libdir)
  - out-of-tree modules built with pgxs
The discussion on this topic didn’t move past this point.