Pythian Blog: Technical Track

PostgreSQL Backups

There are at least four ways to back up a Postgres database: SQL dump, filesystem snapshots, continuous archiving, and third-party tools. For each, notes on the recovery point objective (or RPO, which is a measure of how up-to-date your application and business needs will require the data to be) and recovery time objective (or RTO, which is a measure of how quickly the restore needs to be completed after an outage begins) are provided. You should weigh these indicators against your business requirements for the data in question. Note that you might best achieve RTO by using a high-availability solution in addition to backups. Assuming the outage is not replicated or otherwise affecting a replica, then a failover will allow time for an extended restore.

SQL Dump

The Postgres tool pg_dump executes a SQL dump. The main benefit is that this is available for any version of Postgres (even older versions) and any user that has read access to the database. This dump can also be done remotely. Also, the logical dump can be restored to different versions of Postgres. However, for most production-size data, it'll take more time to dump and restore than is reasonable for application needs. It can also require a large amount of space to dump a production-size database (although the dump can be piped through gzip). The dump can also use a large amount of network bandwidth if done remotely. Also, pg_dump will be point-in-time, which can be a benefit or a challenge. Restores are done using the native tool pg_restore. This method has a potentially large RPO and RTO, as the dump can take a while to execute and restore.

Filesystem Snapshots

A filesystem snapshot is faster than a dump. However, there are some things to watch out for. A typical production PostgreSQL database is constantly changing. Also, not all data has been written to the filesystem at any point in time. For example, there are logs used for crash recovery. Any snapshot has to take this into consideration. Some filesystems account for consistent snapshots (such as ZFS), but usually, the database should be stopped to get a valid point-in-time backup. In any case, the WAL files should be included in any snapshot, to ensure that all changes are applied on restore. Also note that for cloud-based PostgreSQL installations, filesystem snapshots are easily achieved through console features. The standard advice above regarding recovery files applies. Assuming that either snapshot are immediate (as is true with many cloud providers) or that the initial rsync is completed, RPO can be fairly recent. The RTO can be extended, depending on how well the restore process is defined and practiced and based on network time.

Filesystem Copy

Also available at the filesystem level are methods such as rsync. The rsync technology, available on Linux systems, streams and synchronizes files based on their timestamps. This can be achieved while the database stays online. When used to back up a database in which many files remain the same but a handful of files are constantly updated, you can perform an initial rsync to capture the bulk of data (this will take more time), followed by a second, quicker rsync to capture any changed files.

Continuous Archiving

Continuous archiving is not necessarily a different method, but a combination of some of the above ideas into a cohesive process. A backup and the WAL files are used to recover a database. Because replay of the WAL files can be stopped at any timestamp, one benefit of this process is that databases can be handily recovered to any point in time available. Any number of WAL files can be replayed, assuming that they reach back as far as the start of the last backup. The key to using WAL files for recovery at any point in time is to stream them off the database server. PostgreSQL rotates through the files, overwriting them as space usage is configured. Streaming them off the server allows you to keep a copy of the files safe from being overwritten. This streaming is enabled in the PostgreSQL configuration to turn on WAL archiving and to specify a command used to copy the files to another location. Note that PostgreSQL retains the WAL files until they are successfully copied away. Therefore, make sure to enable monitoring to prevent an outage from a full disk, network failure, or other issues. In this process, the PostgreSQL native tool pg_basebackup is used to get the initial backup file. Along with this command, a tracking file is used to record timestamps of the backup start/finish, and the WAL logs. Of the native methods, continuous archiving (after initial setup), offers the best RPO, because the WAL archiving allows point-in-time recovery. You can extend the RTO, depending on how well you define and practice the restore process (based on network time).

Third-Party Tools

There are, of course, several third-party tools that encapsulate the above methods. These facilitate backups across wide topologies and often have user-friendly features for monitoring and automation. For example, EnterpriseDB offers a backup and recovery tool. One such open-source tool is Barman. Barman can leverage either the rsync method mentioned above or continuous archiving. The continuous archiving method is preferred for most modern versions of PostgreSQL. In later versions of Barman, the WAL files stay more up-to-date using Barman-specific commands, offering an advantage for RPO. Barman also has the ability to replicate itself across geographical regions. This offers higher availability in the event that a single data center or region has an outage. The benefit of using a tool such as Barman is that it's well-tested in many environments. It also has a built-in set of helper commands to control the backups both globally and per-server. For example, you can use the barman check-backup <server_name> <backup_id> command to check the consistency of the backup and WAL files for a full restore. Barman commands let you list backups, delete backups, set up Barman replication, run a recovery, and more. Features include the ability to perform parallel processes, compress files to reduce network bandwidth, and backup/restore remotely.

Recommendations

In general, either continuous archiving or Barman support backups with generally sufficient RPO and RTO. Depending on evolving business requirements and cloud strategies, we can fine-tune this recommendation. Implementing a backup solution that also has the ability to support disaster recovery by storing backups across data centers or cloud regions offers further benefits.

No Comments Yet

Let us know what you think

Subscribe by email