Introduction to Writable External protocol of gpfdist
Gpfdist support both readable external table and writable external table. This blog will introduce how writable gpfdist external table works.
Purpose of writable external table is to offload data from GPDB to remote server in text or CSV format with optional compression in parallel from segment. All segments export its own data directly to external gpfdist server without passing them to master first.
Writable external table share similar HTTP infrastructure with readable external table. GPDB segment works as an HTTP client which posts data to the HTTP server (gpfdist). However, the communication protocol of writable external table is different with readable external table. It doesn’t finish everything within single HTTP connection but in a sequence of HTTP requests (minimal number is three). Writable external table only support Protocol version 0, which means the data is transferred in raw format and every metadata is sent within the header fields. Additional control information is sent through the initial and teardown connection.
HTTP header explanation
All header fields used by writable external table is explained in following tables.
|X-GP-LINE-DELIM-LENGTH||Request||N||length of line ending|
|X-GP-PROTO||Request / Response||Y||protocol version, only 0 for writable external table|
|X-GP-SEQ||Request||Y||Sequence number of data request|
|X-GP-DONE||Request||N||Indicates the full write operation is done on this segment|
|X-GP-DATABASE||Request||N||current database name|
Most fields have same meaning as readable external table. We just explain the most important parameter for writable external table here.
As mentioned earlier, each segment of GPDB would send at least 3 HTTP requests to finish a complete upload operation. There should be one initial request, several data requests and one teardown request. Each segment of GPDB to gpfdist use X-GP-SEQ to count the number of current connection. Each segment use its own separated sequence number.
This field indicates the end of entire upload operation. It stands for the teardown request. Gpfdist need to wait for all teardown message of each segments before it close the session.
HTTP requests type
There 3 kinds of HTTP requests: initial request, data request and teardown request. Gpfdist use HTTP 1.0 protocol so that each HTTP connection can only handle one request. Each query on writable external table will include one initial request, one or more data requests and one teardown request. Let’s explain them one by one now.
Each segment start uploading data with an initial request. Initial request post an empty data package (Content-Length equal to 0) with X-GP-SEQ set to 1. Gpfdist record this session and reply with an empty response. Initial request initialize the data transfer process for the segment. Below is the sample initial request.
POST /lineite_wtite HTTP/1.1 Host: 127.0.0.1:8080 Accept: */* X-GP-XID: 1502765779-0000000095 X-GP-CID: 0 X-GP-SN: 0 X-GP-SEGMENT-ID: 0 X-GP-SEGMENT-COUNT: 3 X-GP-LINE-DELIM-LENGTH: -1 X-GP-PROTO: 0 X-GP-SEQ: 1 Content-Type: text/xml Content-Length: 0 HTTP/1.0 200 ok Content-type: text/plain Expires: 0 X-GPFDIST-VERSION: 5.0.0 X-GP-PROTO: 0 Cache-Control: no-cache Connection: close
Data request is a POST request to same URL as in initial request. There might be more than one data requests if the size of content is bigger than writable_external_table_bufsize, which limit the max package size in each POST request. The data request contain an “Expect: 100-continue” header. If gpfdist server is ready to receive the data, it will then reply the “HTTP/1.1 100 Continue” response. After receiving the “continue” response from gpfdist, GPDB segment will start to POST the data. Example of data request is below:
POST /lineite_wtite HTTP/1.1 Host: 127.0.0.1:8080 Accept: */* X-GP-XID: 1502765779-0000000095 X-GP-CID: 0 X-GP-SN: 0 X-GP-SEGMENT-ID: 0 X-GP-SEGMENT-COUNT: 3 X-GP-LINE-DELIM-LENGTH: -1 X-GP-PROTO: 0 X-GP-SEQ: 606 Content-Type: text/xml Content-Length: 65507 Expect: 100-continue HTTP/1.1 100 Continue 33|605187|55200|2|32.00|34948.80|0.02|0.05|A|F|1993-12-09|1994-01-04|1993-12-28|COLLECT COD |MAIL |gular theodolites 33|1374686|99700|3|5.00|8803.10|0.05|0.03|A|F|1993-12-09|1993-12-25|1993-12-23|TAKE BACK RETURN |AIR |. stealthily bold exc 33|339175|39176|4|41.00|49780.56|0.09|0.00|R|F|1993-11-09|1994-01-24|1993-11-11|TAKE BACK RETURN |MAIL |unusual packages doubt caref ...
Teardown request is the final Post request after all data requests. It contain an empty HTTP data and the GP-DONE header with value equal to 1. When gpfdist receive teardown request, it will also reply with empty response and clean up the connection resource.
Following are the example of teardown request.
POST /lineite_wtite HTTP/1.1 Host: 127.0.0.1:8080 Accept: */* X-GP-XID: 1502765779-0000000095 X-GP-CID: 0 X-GP-SN: 0 X-GP-SEGMENT-ID: 0 X-GP-SEGMENT-COUNT: 3 X-GP-LINE-DELIM-LENGTH: -1 X-GP-PROTO: 0 X-GP-SEQ: 608 Content-Type: text/xml X-GP-DONE: 1 Content-Length: 0 HTTP/1.0 200 ok Content-type: text/plain Expires: 0 X-GPFDIST-VERSION: 5.0.0 X-GP-PROTO: 0 Cache-Control: no-cache Connection: close
How writable external table work
Similar as readable external table, gpfdist server group the requests from segments of Greenplum into sessions, according to their <transaction id, command id, scan count> parameters. Gpfdist will write the data belong to same session to same file.
Writable external use “Protocol 0” to exchange control message and data between Greenplum segment and gpfdist. Segment start its own writing process by an initial request following by a sequence of data requests. Segment write the data belong to itself through and when it finishes , it sends the teardown to indicate there is no more data from it. Following diagram illustrates the sequence of three types of requests for individual segment.
When all segments finish the data transfer, gpfdist will close the session and finish writing the target file.
GUC for writable external table
There are also some GUC parameter that control the behavior of writable external table.
This value control how many data should be sent in each Data request. This is the max buffer size. Enlarging this value would help to improve the transfer efficiency if the network is good.
By now, we introduce both readable external table and writable external table. It would be helpful to diagnose the transfer failure or design your new server. Besides, there are a lot of aspect to improve for current gpfdist protocol. For example, we could use compression for data transfer or use latest HTTP 2.0 protocol. As open source project (https://github.com/greenplum-db/gpdb), any comments and feedback are welcome.