Postgres, like many database applications, has a robust dynamic trace capability. Combined with a highly configurable log facility, it’s quite possible to track database activity. But as with most attempts at observability, it isn’t quite that simple.
Logs provide high-level views into a Postgres service, emitting details describing errors, connections, checkpoints, and duration. A trace capability allows an external utility to be called at specific points in the code, enabling execution trace. This is based on probes inserted into Postgres source code. The standard external utility supported is Dtrace. Supporting other dynamic tracing utilities is theoretically possible.
One of the first issues you encounter when combining Postgres probes with a trace capability is that the probes are not configured by default. There are several reasons for this, including tracing’s potential to impose a non-trivial performance penalty. It’s generally something that is enabled only in a development environment. From the Postgres docs: By default the probes are not compiled into PostgreSQL; the user needs to explicitly tell the configure script to make the probes available.
AppScope has the ability to track all SQL activity associated with a Postgres service. This type of observability is not intended as a development tool. It should be useful to administrators and SREs without the need to modify Postgres source code. The observability detail can be readily enabled and disabled, in dynamic fashion, by updating AppScope config. More on dynamic config in a different post.
It’s relatively easy to enable. The biggest limitation is that the database must be restarted to allow it to be “scoped.” We expect to alleviate this limitation in future releases.
The Postgres service is “scoped” by causing the libscope library to be loaded when the service starts. The example below uses the libscope library directly. It’s also possible to use the ldscope
application in lieu of a direct library load.
You can enable payload activity by using a config file, or by means of environment variables. The example below points to a config file, scope.yml
, in /etc/scope
.
$ sudo -u postgres LD_PRELOAD=/usr/lib/libscope.so SCOPE_HOME=/etc/scope /usr/lib/postgresql/10/bin/postgres -D /var/lib/postgresql/10/main -c config_file=/etc/postgresql/10/main/postgresql.conf
Connect to the Postgres service and perform whatever activity is needed.
$ sudo -u postgres psql -h 127.0.0.1
Commands sent to the Postgres service will be emitted in the configured payload dir (/tmp
by default) with the filename suffix *.in
. The filename uses the pattern pid_localIP:port_remoteIP:port.in
. Where IP addresses are not available, asin some SSL instances, the filename pattern replaces IP:port
with tlsrx
or netrx
to indicate how the payload was extracted.
Given a binary data format, it’s convenient to use hexdump to observe the detail.
$ hd -v 27270_tlsrx:0_tlsrx:0.in
00000000 51 00 00 00 20 53 45 4c 45 43 54 20 2a 20 46 52 |
|Q... SELECT * FR| |
00000010 4f 4d 20 70 67 5f 6e 61 6d 65 73 70 61 63 65 3b |
|OM pg_namespace;| |
00000020 00 |
|.| |
Responses sent from a Postgres service are emitted as *.out
files. As described above, they are written to the configured payload dir, and utilize the same filename pattern.
$ hd -v 27270_tlstx:0_tlstx:0.out
00000000 54 00 00 00 54 00 03 6e 73 70 6e 61 6d 65 00 00 |
|T...T..nspname..| |
00000010 00 0a 37 00 01 00 00 00 13 00 40 ff ff ff ff 00 |
|..7.......@.....| |
00000020 00 6e 73 70 6f 77 6e 65 72 00 00 00 0a 37 00 02 |
|.nspowner....7..| |
00000030 00 00 00 1a 00 04 ff ff ff ff 00 00 6e 73 70 61 |
|............nspa| |
00000040 63 6c 00 00 00 0a 37 00 03 00 00 04 0a ff ff ff |
|cl....7.........| |
00000050 ff ff ff 00 00 44 00 00 00 1c 00 03 00 00 00 08 |
|.....D..........| |
00000060 70 67 5f 74 6f 61 73 74 00 00 00 02 31 30 ff ff |
|pg_toast....10..| |
00000070 ff ff 44 00 00 00 1d 00 03 00 00 00 09 70 67 5f |
|..D..........pg_| |
00000080 74 65 6d 70 5f 31 00 00 00 02 31 30 ff ff ff ff |
|temp_1....10....| |
00000090 44 00 00 00 23 00 03 00 00 00 0f 70 67 5f 74 6f |
|D...#......pg_t| |
000000a0 61 73 74 5f 74 65 6d 70 5f 31 00 00 00 02 31 30 |
|ast_temp_1....10| |
000000b0 ff ff ff ff 44 00 00 00 40 00 03 00 00 00 0a 70 |
|....D...@......p| |
000000c0 67 5f 63 61 74 61 6c 6f 67 00 00 00 02 31 30 00 |
|g_catalog....10.| |
000000d0 00 00 22 7b 70 6f 73 74 67 72 65 73 3d 55 43 2f |
|.."{postgres=UC/| |
000000e0 70 6f 73 74 67 72 65 73 2c 3d 55 2f 70 6f 73 74 |
|postgres,=U/post| |
000000f0 67 72 65 73 7d 44 00 00 00 3d 00 03 00 00 00 06 |
|gres}D...=......| |
00000100 70 75 62 6c 69 63 00 00 00 02 31 30 00 00 00 23 |
|public....10...#| |
00000110 7b 70 6f 73 74 67 72 65 73 3d 55 43 2f 70 6f 73 |
|{postgres=UC/pos| |
00000120 74 67 72 65 73 2c 3d 55 43 2f 70 6f 73 74 67 72 |
|tgres,=UC/postgr| |
00000130 65 73 7d 44 00 00 00 48 00 03 00 00 00 12 69 6e |
|es}D...H......in| |
00000140 66 6f 72 6d 61 74 69 6f 6e 5f 73 63 68 65 6d 61 |
|formation_schema| |
00000150 00 00 00 02 31 30 00 00 00 22 7b 70 6f 73 74 67 |
|....10..."{postg| |
00000160 72 65 73 3d 55 43 2f 70 6f 73 74 67 72 65 73 2c |
|res=UC/postgres,| |
00000170 3d 55 2f 70 6f 73 74 67 72 65 73 7d 43 00 00 00 |
|=U/postgres}C...| |
00000180 0d 53 45 4c 45 43 54 20 36 00 5a 00 00 00 05 49 |
|.SELECT 6.Z....I| |
While AppScope makes no attempt to provide an execution trace for Postgres, you can use AppScope to track all payloads sent and received by a Postgres service. Normally, payload capture of this sort is not very useful, due to encryption. However, AppScope can access payloads before they are encrypted, and after they are decrypted. This means that you can readily observe every command received, and every response returned. There is no need to enable probes, modify source code, or incur a severe performance penalty to track all SQL activity.
The ability to observe all SQL activity creates a unique concern. There can be details that qualify as PII (Personally Identifiable Information) in the SQL statements and/or responses. Recognizable PII will usually need to be redacted. The details about performing such redaction are beyond the scope of this post. Check out Cribl LogStream for reference.
While the prospect of observing any and all SQL traffic may not be new or unique, this prospect offers the ability to answer a number of difficult questions. Questions such as: when did that admin have access to the DB; was a specific table modified; what tables have been accessed; which remote clients have had access, and to which databases? You get the idea.
AppScope is available today. Check out the website, and get started by downloading it. Try it yourself by playing in our AppScope Fundamentals sandbox, which will walk you through how to use AppScope in an interactive shell right from your browser. Star us on GitHub and share what you’re doing with AppScope in our #appscope channel on the Cribl Community Slack!
Experience a full version of Cribl Stream and Cribl Edge in the cloud with pre-made sources and destinations.