
Monitoring Transaction Logs in PostgreSQL

An important part of PostgreSQL monitoring is keeping track of how many transaction log, or WAL (Write Ahead Log), files are produced. WAL is a core part of PostgreSQL's durability mechanism and works alongside MVCC (MultiVersion Concurrency Control): it ensures that changes made to database objects (e.g., tables and indexes) are preserved if the database crashes. To achieve this, any change to a database object is recorded in the WAL before it is applied to the database files.

If archiving is turned off, WAL files are still generated, but the oldest ones are recycled in turn, so there's no need to monitor the space they use. However, with archiving off, the database can't be recovered after a media crash (there are no archived WAL files to replay), so a production system shouldn't run with archiving disabled. To maintain continuous archiving, the archiver daemon should be running with the appropriate parameters, or streaming replication should be configured.

Note: To turn on the archiving daemon, the “archive_mode” parameter needs to be set to “on” and the database cluster restarted.
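
As a rough sketch of what that can look like (the archive command below is only a placeholder; a tested tool such as pgBackRest is normally used in production), archiving can be enabled with ALTER SYSTEM and then checked through the “pg_stat_archiver” view once the cluster has been restarted:

-- archive_mode needs a cluster restart to take effect; archive_command does not.
ALTER SYSTEM SET archive_mode = 'on';
ALTER SYSTEM SET archive_command = 'cp %p /pg_archive/%f';   -- /pg_archive is a hypothetical path

-- After the restart, confirm that WAL segments are being archived:
SELECT archived_count,
       last_archived_wal,
       last_archived_time,
       failed_count
FROM   pg_stat_archiver;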

These WAL files are vital to database operations, so we need to know how much space is being used.  

Note: The terms “transaction log” and WAL can be used interchangeably.

Usually, no one worries about WAL file generation, as the WAL directory (pg_xlog on PG versions below 10, pg_wal on PG 10 and above) is a sub-directory under the PGDATA home directory. So, if the PGDATA directory isn't full, why should we worry about the WAL directory?

There are several reasons why WAL space is important:

  • On cloud-based Database as a Service (DaaS), using more disk space for WALs than necessary can be expensive. Knowing the requirement is important for estimating the database footprint.
  • WALs play a large part in backup times, so knowing how many WALs are expected will help determine how much space for backups is needed and how long restoring might take.
  • If you have replication (logical or physical), you should know how many days' worth of WAL the system can hold before the subscriber or standby becomes unrecoverable via normal WAL file transfer. This can be an issue with logical replication, because a long-running transaction can cause lag, which in turn causes issues with WAL file retention on the primary, as the files won't be removed until the transaction completes (a query for checking how much WAL a replication slot is holding back follows below).
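
For replication that uses replication slots, a rough way to see how much WAL the primary is holding back for each slot is to compare the current WAL position with the slot's restart_lsn (a sketch; run it on the primary, since pg_current_wal_lsn() can't be used during recovery):

-- WAL retained on the primary for each replication slot.
SELECT slot_name,
       slot_type,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM   pg_replication_slots;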

How do we monitor transaction log usage in PostgreSQL?

By default, the WAL directory is located under the PGDATA home directory. Therefore, unless “pg_wal” is given its own mount point (by creating a symbolic link from the PGDATA home directory to the new mount point), the WALs are just another set of files under the larger PGDATA home.

There are several reasons why having the WAL files located on a separate mount point is a good idea.

  • Writing WAL is often an I/O bottleneck. High-write systems, such as OnLine Transaction Processing (OLTP) systems, need to write to the filesystem very quickly, so the WAL mount point can be placed on fast (and more expensive) Solid State Disks (SSDs) while ordinary disks are used for the PGDATA home. That saves money and increases performance.
  • We can monitor WAL disk space usage directly, rather than monitoring the size of the PGDATA directory and subtracting the pg_wal directory size to find out how much of the overall space is occupied by WAL files. This makes it easier to see when there's an issue with WAL file retention.

To monitor WAL disk space usage, we can use OS programs like du or df. From inside the database, we can use SQL commands based on a function called “pg_ls_dir”. Here are a few examples that could be used in basic monitoring tools:

Linux Operating System checks:

As part of a combined filesystem:

$ du $PGDATA/pg_wal -m

1      /var/lib/postgresql/14/main/pg_wal/archive_status

881    /var/lib/postgresql/14/main/pg_wal

On a separate mountpoint:

$ df -hP /pg_wal

Filesystem                 Size Used Avail Use% Mounted on

/dev/mapper/vgbackup_wal   2G 881M  1167M  44% /pg_wal

Using SQL:

$ psql -d db01
SELECT ( a.wals::int * b.wal_seg_size::int / 1024 / 1024 ) AS "WAL_Space_Used_MB"
FROM   (
              SELECT count(*) wals
              FROM   pg_ls_dir('pg_wal')
              WHERE  pg_ls_dir ~ '^[0-9A-F]{24}'
       ) a,
       (
              SELECT  setting wal_seg_size
              FROM    pg_settings
              WHERE   name = 'wal_segment_size'
        ) b;
 WAL_Space_Used_MB
-------------------
               896

Note: On cloud-based DaaS offerings, the “pg_ls_dir” function is generally not available, because superuser access is not granted.
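
On self-managed PostgreSQL 10 and above, a simpler alternative is the built-in pg_ls_waldir() function, which lists the contents of the WAL directory directly. By default it is restricted to superusers and members of the pg_monitor role, and it may likewise be unavailable on managed services:

-- Total number and size of the files currently in pg_wal (PostgreSQL 10+).
SELECT count(*)                  AS wal_files,
       pg_size_pretty(sum(size)) AS wal_space_used
FROM   pg_ls_waldir();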

pg_stat_wal

From PostgreSQL 14 onwards, there's another option for monitoring WAL statistics: the “pg_stat_wal” view. This view returns a single row of data in which the statistics collector accumulates WAL-related activity from the time the instance was started or the statistics were last reset.

So, if we want to trace WAL-related statistics over time, we just have to add a timestamp component. The SQL results can be written out to a file or stored in a wal_stats table for later analysis.

SQL output of the “pg_stat_wal” view:

/*      wal_records      -- Number of records inserted into the transaction log
        wal_fpi          -- Number of Full Page Images written to the transaction log
        wal_bytes        -- Total amount of WAL generated, in bytes
        wal_buffers_full -- Number of times WAL data was written to disk because WAL buffers became full
        wal_write        -- Number of times WAL buffers were written out to disk via XLogWrite request.
        wal_write_time   -- Total amount of time (ms) spent writing WAL buffers to disk via XLogWrite 
        wal_sync_time    -- Total amount of time (ms) spent syncing WAL files to disk via issue_xlog_fsync 
        stats_reset      -- Time at which these statistics were last reset
*/
SELECT now() wal_capture_time,
        wal_records,
        wal_fpi,
        wal_bytes,
        wal_buffers_full,
        wal_write,
        wal_write_time,
        wal_sync_time,
        stats_reset
FROM   pg_stat_wal;

-[ RECORD 1 ]----+------------------------------
wal_capture_time | 2022-05-18 18:21:11.999589+00
wal_records      | 0
wal_fpi          | 0
wal_bytes        | 0
wal_buffers_full | 0
wal_write        | 1
wal_write_time   | 0
wal_sync_time    | 0
stats_reset      | 2022-05-18 18:06:20.203876+00

The most important data items are “wal_records” and “wal_bytes”, together with a timestamp (added via SQL, as above, or by the calling program). With those, the total amount of WAL generated can be tracked over time.
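
One simple way to do that is a small snapshot table (the name wal_stats below is just an example) populated on a schedule, for instance from cron or pg_cron; the difference between consecutive rows then gives the WAL generation rate:

-- Create an empty snapshot table with the same columns as pg_stat_wal,
-- plus a capture timestamp.
CREATE TABLE IF NOT EXISTS wal_stats AS
SELECT now() AS wal_capture_time, * FROM pg_stat_wal WITH NO DATA;

-- Run this on a schedule (cron, pg_cron, etc.):
INSERT INTO wal_stats SELECT now(), * FROM pg_stat_wal;

-- WAL generated between consecutive snapshots:
SELECT wal_capture_time,
       wal_bytes - lag(wal_bytes) OVER (ORDER BY wal_capture_time) AS wal_bytes_delta
FROM   wal_stats
ORDER  BY wal_capture_time;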

The real advantage of this method is that it also works with cloud-based databases. It was tested successfully against Amazon's Relational Database Service (RDS).

Here's an example of this working with a PostgreSQL 14 instance on Amazon RDS:

$ psql -h database-1.***.amazonaws.com -p 5432 --dbname=db01 --user=postgres
db01=> SELECT wal_records, wal_fpi, wal_bytes, wal_buffers_full FROM pg_stat_wal ;

 wal_records | wal_fpi | wal_bytes | wal_buffers_full
-------------+---------+-----------+------------------
         887 |     165 |    401865 |            43323

db01=> CREATE TABLE t1 AS SELECT  md5(random()::text) id FROM generate_series(1, 100000);
db01=>  SELECT wal_records, wal_fpi, wal_bytes, wal_buffers_full FROM pg_stat_wal ;

 wal_records | wal_fpi | wal_bytes | wal_buffers_full
-------------+---------+-----------+------------------
      406040 |     218 |  34888334 |            51148
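
The difference between the two snapshots shows how much WAL the CREATE TABLE generated; using the wal_bytes values from the output above:

-- Roughly 33 MB of WAL for the 100,000-row table.
SELECT pg_size_pretty(34888334::bigint - 401865) AS wal_generated_by_load;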

What parameters affect the transaction logs?

There are certain PostgreSQL parameters that can affect WAL file generation and performance. Here are the main ones:

  • archive_mode (on | off): Enables archiving of completed WAL files using the archive command; also starts the archiver daemon.
  • archive_command (an OS command; examples are cp, pgbackrest, scp): Used in conjunction with “archive_mode”. The command runs each time a completed WAL segment is ready to be archived. If it is left blank while “archive_mode” is on, WAL segments accumulate; setting it to “/bin/true” works (segments won't accumulate), but nothing is actually archived.
  • checkpoint_timeout (seconds): Sets the maximum time between automatic WAL checkpoints.
  • checkpoint_completion_target (fraction of the checkpoint interval; default 0.9): Time spent flushing dirty buffers during a checkpoint, as a fraction of the checkpoint interval.
  • min_wal_size (size, e.g. MB or GB): Sets the minimum size that the WAL directory can shrink to.
  • max_wal_size (size, e.g. MB or GB): Sets the amount of WAL that triggers a checkpoint.
  • wal_keep_segments (number of WAL files): Sets the number of WAL files held for standby servers. Removed in PG 13 in favour of “wal_keep_size”.
  • wal_keep_size (size, e.g. MB or GB): Specifies the minimum amount of past WAL to keep in the pg_wal directory for replication purposes. New in PG 13.
  • wal_level (minimal | replica | logical): Sets the level of information written to the WAL.
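
The current values and units of these parameters can be checked from the pg_settings view, for example:

-- Current WAL-related settings and their units:
SELECT name, setting, unit, context
FROM   pg_settings
WHERE  name IN ('archive_mode', 'archive_command', 'checkpoint_timeout',
                'checkpoint_completion_target', 'min_wal_size', 'max_wal_size',
                'wal_keep_size', 'wal_level')
ORDER  BY name;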

Note: The terms “WAL segment” and “WAL file” can be used interchangeably.

There are some additional things to note for parameter settings and WAL file performance.

Checkpoints can have an impact on WAL file production. To save disk space, the WAL files need to be recycled occasionally. This is where the checkpoint process comes into play.

When data is written to the database, the change is recorded in the WAL file first and then applied to the page in the shared buffer pool, making that buffer “dirty”. The dirty buffers have to be written to disk at some point, either by the background writer or during a checkpoint. Once all the dirty buffers covering a WAL file have been written to disk, that WAL file can be deleted or recycled. So, checkpoints have a direct impact on how many WAL files are kept on disk.

For example, spreading checkpoints further apart reduces the amount of WAL generated. When a data block is touched for the first time after a checkpoint completes, the entire block (a full-page image) has to be written to the WAL file; if that block is changed again before the next checkpoint, only the changes are written rather than the whole block. The longer the time between checkpoints, the fewer full-page images are written to the transaction log, so spreading out the checkpoints decreases the amount of WAL written.

These parameters affect how much space is used; the WAL directory will usually vary in size between the values of “min_wal_size” and “max_wal_size”. The “min_wal_size” parameter sets the minimum size the WAL directory can shrink to (below this size, old WAL files are recycled rather than removed). The “max_wal_size” parameter sets the amount of WAL that can be produced before an automatic checkpoint is run. Increasing that limit can reduce the number of checkpoints, but it also increases the time needed for crash recovery.
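
A quick way to check whether checkpoints are mostly triggered by “checkpoint_timeout” or by hitting “max_wal_size” is the “pg_stat_bgwriter” view (on PostgreSQL 16 and earlier; from PostgreSQL 17 these counters live in “pg_stat_checkpointer”):

-- If checkpoints_req is high relative to checkpoints_timed, checkpoints are being
-- forced by WAL volume (max_wal_size) rather than by checkpoint_timeout.
SELECT checkpoints_timed,
       checkpoints_req
FROM   pg_stat_bgwriter;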

The “wal_keep_size” value determines the minimum amount of past WAL to keep, in case a standby database needs to retrieve it for streaming replication. If “wal_keep_size” is greater than “max_wal_size”, the “wal_keep_size” value takes precedence, and the WAL directory can grow beyond “max_wal_size” to honour it.

Finally, the “wal_level” value changes the amount of data written to the WAL. Each level writes more than the previous one: “minimal” writes the least, “replica” (the level needed for WAL archiving and physical standbys) is the default, and “logical”, used for logical replication, writes the most. If backups or standby databases are needed, “replica” should be used; “logical” is only needed when logical replication is in use.

Other factors that can affect WAL file generation and performance

Some database actions can have large impacts on database WAL generation. Here are a few cases:

  • Large number of transactions: If a system is doing lots of Data Manipulation Language (DML) calls, this generates lots of changes to database pages, and those changes have to be recorded in the WAL files before the pages are written to disk.
  • Vacuuming settings: Large amounts of vacuuming can also generate lots of WAL segments. Incorrect vacuum or autovacuum parameters can cause WAL volume to increase unnecessarily.
  • Archiving failure: If the “archive_command” parameter is set and the command fails, WAL files won't be removed from the WAL directory, because the server waits until each WAL file has been archived successfully.
  • Replication: The WAL files won't be deleted from the WAL directory on the primary until they have been applied to the subscriber (logical) or standby (physical) instances. Therefore, if there's any replication lag, the number of WAL files kept on the primary server can build up. Careful attention should be given to replication lag.
  • WAL segment size: The default WAL segment (file) size is 16MB, which may not be ideal for heavy write systems. Since PG 11, the segment size of an existing cluster can be changed with the “pg_resetwal” command (the server must be shut down cleanly first). Below is the command to change the WAL segment size from 16MB to 64MB:
pg_resetwal -D /var/lib/postgresql/14/main --wal-segsize=64

Having a larger segment size means fewer files and less per-file overhead, and it can also improve write performance.

How can we reduce the space used by transaction logs?

Here are a few ways to reduce the amount of WAL files generated:

1. Turn on “wal_compression”. This parameter compresses the full-page images written to the WAL when “full_page_writes” is set to “on” (the default). It reduces space usage, but at the expense of extra CPU during write and recovery operations.
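
Since “wal_compression” can be changed without a restart, a minimal sketch of enabling it looks like this:

-- Enable compression of full-page images written to WAL; a configuration
-- reload is enough, no restart is required.
ALTER SYSTEM SET wal_compression = 'on';
SELECT pg_reload_conf();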

2. Reduce the “wal_keep_size” parameter. Only keep what's needed in the “pg_wal” directory. Ensuring there is enough space for a day's worth of WAL files plus 20% is good practice; more is rarely needed. However, this means that any logical replication transactions should be kept to minutes rather than hours, and monitoring has to be in place in case the lag increases. There should also be a plan for a network glitch or a long-running transaction, for example moving the oldest WALs to another location and creating a soft link to them, or extending the “pg_wal” space.

On DaaS, the space will probably expand automatically, but that causes another issue: there is usually no way to reclaim that space short of creating a new database cluster and porting the data over.

3. Reduce WAL production during large data loads. When loading large amounts of data, a few measures may (where possible) help reduce WAL file generation. They include:

  • Use COPY in a single command instead of multiple INSERTs. A single COPY is quicker than many inserts and has less overhead.
  • Remove indexes: On tables with a lot of indexes, dropping those indexes and rebuilding them after the load saves a lot of time and WAL space.
  • Increase “max_wal_size”. If this parameter is raised temporarily, fewer checkpoints will be needed during the data load (a combined sketch follows the note below).

Note: This is meant to be a temporary fix, while loading large amounts of data.
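
Putting those points together, a bulk load might look roughly like the sketch below. The table, index, file path and max_wal_size values are all hypothetical placeholders, not recommendations:

-- Temporarily raise max_wal_size so fewer checkpoints run during the load.
ALTER SYSTEM SET max_wal_size = '8GB';   -- placeholder value
SELECT pg_reload_conf();

-- Drop secondary indexes before the load and rebuild them afterwards.
DROP INDEX IF EXISTS orders_customer_idx;                 -- hypothetical index
COPY orders FROM '/data/orders.csv' WITH (FORMAT csv);    -- hypothetical table and file;
                                                          -- server-side COPY needs superuser or
                                                          -- pg_read_server_files (psql's \copy is
                                                          -- the client-side alternative)
CREATE INDEX orders_customer_idx ON orders (customer_id);

-- Put max_wal_size back to its previous value.
ALTER SYSTEM RESET max_wal_size;
SELECT pg_reload_conf();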

Conclusion

Watching the “pg_wal” directory is important for managing disk space usage and for monitoring the overall well-being of a PostgreSQL instance. Unexpected spikes in WAL file generation can cause problems with replication, and if archiving is enabled and the “pg_wal” directory fills up, the cluster can stop running. Setting the parameters correctly is vital for WAL performance; finding the right settings is the key, but it's not easy to achieve.

For the best performance, placing the WAL files on a separate mount point with faster storage should be considered a best practice. Lastly, increasing the WAL segment size from the default 16MB is worth considering, as larger segments tend to have less per-file overhead and can be archived faster.

I hope you found this post helpful. Feel free to drop any questions or share your thoughts in the comments, and make sure to sign up for updates.