Pythian Blog: Technical Track

Patroni – An awesome PostgreSQL HA Solution

Patroni

Patroni is a High Availability solution for PostgreSQL using asynchronous streaming replication, Etcd, and HAProxy. For more details, you can read the Patroni documentation which is really good and covers many details. In this post, I will explain:

  • Patroni architecture
  • Ansible for patroni cluster setup
  • verification of patroni/etcd cluster
  • Switchover
  • Fixing out-of-sync issue

Architecture

The Patroni agent is responsible for running PostgreSQL on each node and provides an API used to check a particular node’s current health. HAProxy also runs on each node, consumes this API and provides a single endpoint for the end-user to access the cluster’s leader, which is a node for managing all read/write operations for the whole cluster, often called a master node.

This endpoint can then be used for both reads and writes and is available on each node in the cluster on port `5000`. Patroni cluster may also provide an additional HAproxy endpoint that will accept only reads available on port `6000`. The Patroni REST API is available directly on each node on port `8008`.

The standard installation leaves the endpoints running on each node but without a global endpoint to be used by the applications. It would be better to add another layer with a load balancer to have a single point of entry to the available nodes in the cluster without having to care about which one to use on the client side

Using Ansible to create an HA PostgreSQL cluster with Patroni

Ansible is a suite of software tools that enables infrastructure as code. It is open-source and the suite includes software provisioning, configuration management, and application deployment functionality. For more details, please have a look at the ansible documentation 

Here is an example playbook containing two plays. The first ansible task will set up three compute instances on the Google Cloud Platform (GCP) that will be used to deploy the PostgreSQL databases. The second ansible task will configure the three nodes into a Patroni cluster.

# de-test-patroni-cluster.yml 

- name:         Launch VM(s)
  hosts:        localhost
  gather_facts: false
  roles:
    - role: gcp_vm
      gcp_vm_instances:
        - name:          de-gcp-patroni-example-01
        - name:          de-gcp-patroni-example-02
        - name:          de-gcp-patroni-example-03

      gcp_vm_options:
        defaults:
          state:         present
          os_image:      ubuntu1804
          type:          n1-standard-1
          ansible_group: patroni

    - role: gcp_lb
      gcp_lb_name: de-gcp-patroni-example
      gcp_lb_draining_timeout_secs: 60
      gcp_lb_healthcheck:
        tcp_port: 5000
      gcp_lb_instances: patroni   # This is the name of the Ansible host group containing the instances we want to use with this load balancer.
      gcp_lb_dns_name: de-gcp-patroni-example

- name:         Configure VM(s)
  hosts:        patroni
  become:       true

  roles:
    - role: patroni
      patroni_etcd_cluster_name:      my_etcd_cluster
      patroni_etcd_master_group_name: patroni                       
      patroni_master_node:            de-gcp-patroni-example-01     
      patroni_replication_nodes:      ""
      patroni_dbbackup:               true
      patroni_users:

        - name: myuser
          pass: mypass                    

      patroni_databases:                  
        - name:  mydatabase
          owner: myuser

        - name:  myseconddatabase
          owner: myuser

      patroni_database_schemas:           

        - database: mydatabae
          schema:   myschema
          owner:    myuser

      patroni_user_privileges:            

        - name:            myuser
          db:              mydatabase
          priv:            ALL
          role_attr_flags: NOCREATEDB

Ansible playbook can be run like this ( the above ansible code assumes you are using ansible version > 2.1, and installation notes can be found here )

ansible-playbook de-test-patroni-cluster.yml

After running this playbook, end users could access the PostgreSQL cluster via the GCP load balancer `de-gcp-patroni-example.de.gcp.cloud` on port `5000` for reading and writes or port `6000` for just reads.

Verify Patroni cluster status

Patroni needs at least two of the three nodes to be operational for the cluster to be available. One additional benefit is that Patroni completely manages your PostgreSQL databases. In terms of bringing it up and down, you can verify the status of a Patroni cluster and database in the following ways:

HTTP API

Make an HTTP request to one of the nodes on port `8008`. You may run it on any terminal with ssh connectivity to your Patroni cluster (below example used iterm terminal on mac). You should get a JSON document in the response showing the node’s role, details about the other replication nodes, and their status. As an example, using curl against the first node in the cluster created with the code above, will show the following output:

$ curl de-gcp-patroni-example-01:8008

{
  "database_system_identifier": "6820776744425517415",
  "postmaster_start_time": "2022-04-28 14:57:20.932 UTC",
  "timeline": 1,
  "cluster_unlocked": false,
  "patroni": {
    "scope": "patroniha",
    "version": "1.6.1"
  },

  "replication": [

    {

      "sync_state": "async",
      "sync_priority": 0,
      "client_addr": "10.80.129.111",
      "state": "streaming",
      "application_name": "de_gcp_patroni_example_03",
      "usename": "replicator"
    },

    {
      "sync_state": "async",
      "sync_priority": 0,
      "client_addr": "10.80.129.113",
      "state": "streaming",
      "application_name": "de_gcp_patroni_example_02",
      "usename": "replicator"
    }
  ],

  "state": "running",
  "role": "master",
  "xlog": {
    "location": 67125840
  },
  "server_version": 110007

}

Patronictl

Patroni comes with a CLI utility called patronictlOne can perform any admin operation related to the Postgres database or cluster using this command line utility. You can use the command line interface to describe the current cluster status by SSHing to one of the nodes and running the following command:

$ patronictl -d etcd://127.0.0.1:2379 list patroniha

+-----------+---------------------------+--------------------+--------+---------+----+-----------+
|  Cluster  |           Member          |        Host        |  Role  |  State  | TL | Lag in MB |
+-----------+---------------------------+--------------------+--------+---------+----+-----------+
| patroniha | de_gcp_patroni_example_01 | 10.80.131.214:5433 | Leader | running |  1 |           |
| patroniha | de_gcp_patroni_example_02 | 10.80.129.113:5433 |        | running |  1 |       0.0 |
| patroniha | de_gcp_patroni_example_03 | 10.80.129.111:5433 |        | running |  1 |       0.0 |
+-----------+---------------------------+--------------------+--------+---------+----+-----------+

Checking Logs

Patroni is managed via SystemD, and as such, you can get error logs for a particular server by SSHing to it, and then querying with `journalctl`. For example, on a master:

$ sudo journalctl -f -u patroni

Apr 28 16:06:21 de-gcp-patroni-example-01 patroni[20782]: 2022-04-28 16:06:21,983 INFO: Lock owner: de_gcp_patroni_example_01; I am de_gcp_patroni_example_01
Apr 28 16:06:22 de-gcp-patroni-example-01 patroni[20782]: 2022-04-28 16:06:22,017 INFO: no action.  i am the leader with the lock

On a secondary, you will see something slightly different:

$ sudo journalctl -f -u patroni

Apr 28 16:08:11 de-gcp-patroni-example-02 patroni[20552]: 2022-04-28 16:08:11,990 INFO: does not have lock
Apr 28 16:08:11 de-gcp-patroni-example-02 patroni[20552]: 2022-04-28 16:08:11,997 INFO: no action.  i am a secondary and i am following a leader

 

Verifying Etcd Cluster status

Etcd is used for Patroni to distribute configuration and state between nodes, so it must also be healthy for proper cluster operation.

You can check the status of the cluster by SSHing to one of the nodes and running the following:

$ etcdctl cluster-health patroniha

member 13a17454e5cb0797 is healthy: got healthy result from http://10.80.129.113:2379
member 4ba8a9618e42f288 is healthy: got healthy result from http://10.80.129.111:2379
member 6a453ce7684a6c8d is healthy: got healthy result from http://10.80.131.214:2379

cluster is healthy

 

Perform a Manual Switchover

Using the command line interface, you can force Patroni to switch to a new leader. The switchover will occur immediately and require small downtime ( cluster unavailability ).

1. First, you should SSH to one of the nodes in the cluster and verify the current cluster status.

$ patronictl -d etcd://127.0.0.1:2379 list patroniha

   +-----------+---------------------------+--------------------+--------+---------+----+-----------+
   |  Cluster  |           Member          |        Host        |  Role  |  State  | TL | Lag in MB |
   +-----------+---------------------------+--------------------+--------+---------+----+-----------+
   | patroniha | de_gcp_patroni_example_01 | 10.80.131.214:5433 | Leader | running |  1 |           |
   | patroniha | de_gcp_patroni_example_02 | 10.80.129.113:5433 |        | running |  1 |       0.0 |
   | patroniha | de_gcp_patroni_example_03 | 10.80.129.111:5433 |        | running |  1 |       0.0 |
   +-----------+---------------------------+--------------------+--------+---------+----+-----------+

2. Next, you can trigger the switchover and specify which node you want to make the new leader. You can also choose to schedule the switchover in the future or perform it immediately:

$ patronictl -d etcd://127.0.0.1:2379 switchover patroniha

Master [de_gcp_patroni_example_01]: <return>
Candidate ['de_gcp_patroni_example_02', 'de_gcp_patroni_example_03'] []: de_gcp_patroni_example_02
When should the switchover take place (e.g. 2020-04-28T17:15 )  [now]: <return>
Current cluster topology

+-----------+---------------------------+--------------------+--------+---------+----+-----------+
|  Cluster  |           Member          |        Host        |  Role  |  State  | TL | Lag in MB |
+-----------+--------------------------+--------------------+--------+---------+----+-----------+
| patroniha | de_gcp_patroni_example_01 | 10.80.131.214:5433 | Leader | running |  1 |           |
| patroniha | de_gcp_patroni_example_02 | 10.80.129.113:5433 |        | running |  1 |       0.0 |
| patroniha | de_gcp_patroni_example_03 | 10.80.129.111:5433 |        | running |  1 |       0.0 |
+-----------+---------------------------+--------------------+--------+---------+----+-----------+

Are you sure you want to switchover cluster patroniha, demoting current master de_gcp_patroni_example_01? [y/N]:

You can verify the operation is correct and then enter `y` to proceed, after which you’ll see the following:

2020-04-28 16:17:09.50350 Successfully switched over to "de_gcp_patroni_example_02"

+-----------+---------------------------+--------------------+--------+---------+----+-----------+
|  Cluster  |           Member          |        Host        |  Role  |  State  | TL | Lag in MB |
+-----------+---------------------------+--------------------+--------+---------+----+-----------+
| patroniha | de_gcp_patroni_example_01 | 10.80.131.214:5433 |        | stopped |    |   unknown |
| patroniha | de_gcp_patroni_example_02 | 10.80.129.113:5433 | Leader | running |  1 |           |
| patroniha | de_gcp_patroni_example_03 | 10.80.129.111:5433 |        | running |  1 |       0.0 |
+-----------+---------------------------+--------------------+--------+---------+----+-----------+

3. Finally, you can verify that the switchover is now completed by checking the status again:

$ patronictl -d etcd://127.0.0.1:2379 list patroniha

+-----------+---------------------------+--------------------+--------+---------+----+-----------+
|  Cluster  |           Member          |        Host        |  Role  |  State  | TL | Lag in MB |
+-----------+---------------------------+--------------------+--------+---------+----+-----------+
| patroniha | de_gcp_patroni_example_01 | 10.80.131.214:5433 |        | running |  2 |       0.0 |
| patroniha | de_gcp_patroni_example_02 | 10.80.129.113:5433 | Leader | running |  2 |           |
| patroniha | de_gcp_patroni_example_03 | 10.80.129.111:5433 |        | running |  2 |       0.0 |
+-----------+---------------------------+--------------------+--------+---------+----+-----------+

Fixing Patroni out of sync issue

In some situations, the PostgreSQL standby nodes can get out of sync. Some examples are when the primary has a high number of transactions, or many archives are being generated consuming large disk space, or some Wal files were removed by mistake, or by problems with the network.
 
You can follow the below steps to identify if secondary nodes are in sync and, if not, how to fix it using Patroni”

1. First, you should SSH to the primary node in the cluster and verify the current cluster status; you can execute the below command on any cluster node to find out which is the primary node.

$ patronictl -d etcd://127.0.0.1:2379 list patroniha

root@de-gcp-patroni-example-01:~# patronictl -d etcd://127.0.0.1:2379 list  patroniha
+ Cluster: patroniha (6830699472358332339) ------+--------+---------+----+-----------+
|          Member           |        Host        |  Role  |  State  | TL | Lag in MB |
+---------------------------+--------------------+--------+---------+----+-----------+
| de_gcp_patroni_example_01 | 10.80.128.61:5433  | Leader | running |  2 |           |
| de_gcp_patroni_example_02 | 10.80.128.160:5433 |        | running |  2 |        75 |
| de_gcp_patroni_example_03 | 10.80.128.236:5433 |        | running |  2 |         0 |
+---------------------------+--------------------+--------+---------+----+-----------+

    If you find `Lag in MB` is greater than zero for any secondary, it means there is a problem with its sync, and the command reinit can be used in the node to fix the issue.

2. Log in to the primary node of the cluster, and run the below command. Patroni has a built-in function reinit to make that standby in sync again, saving you from the manual work. This command will force the cluster node to sync data from the primary node.

patronictl -d etcd://127.0.0.1:2379 reinit cluster_name node_name

    For example:

root@de-gcp-patroni-example-01:~# patronictl -d etcd://127.0.0.1:2379 reinit patroniha de_gcp_patroni_example_02

   + Cluster: patroniha (6830699472358332339) ------+--------+---------+----+------------+
   |          Member           |        Host        |  Role  |  State  | TL | Lag in MB  |
   +---------------------------+--------------------+--------+---------+----+------------+
   | de_gcp_patroni_example_01 | 10.80.128.61:5433  | Leader | running |  2 |            |
   | de_gcp_patroni_example_02 | 10.80.128.160:5433 |        | running |  2 |         75 |
   | de_gcp_patroni_example_03 | 10.80.128.236:5433 |        | running |  2 |          0 |
   +---------------------------+--------------------+--------+---------+----+------------+

   Are you sure you want to reinitialize members de_gcp_patroni_example_02? [y/N]: y
   Success: reinitialize for member de_gcp_patroni_example_02

   root@de-gcp-patroni-example-01:~#

    You can log in to node de_gcp_patroni_example_02 and verify in Patroni service log too.

      1. ssh to de-gcp-patroni-example-02

      2. As root, run

root@de-gcp-patroni-example-02:~# journalctl -f -u patroni

      You will see some information in the log:

2022-05-26 08:54:41.748 UTC [2846] LOG:  database system is shut down
2022-05-26 08:54:41,760 INFO: Removing data directory: /data/postgresql/db
2022-05-26 0854:46,627 INFO: replica has been created using basebackup
2022-05-26 08:54:46,628 INFO: bootstrapped from leader 'de_gcp_patroni_example_01'
2022-05-26 08:54:46,633 INFO: closed patroni connection to the postgresql cluster
2022-05-26 08:54:47,442 INFO: postmaster pid=4426
localhost:5433 - no response

2022-05-26 08:54:47.713 UTC [4426] LOG:  starting PostgreSQL 12.3 (Ubuntu 12.3-1.pgdg18.04+1) on x86_64-pc

And, normal logging agai
    
2022-05-26 08:55:00,441 INFO: no action.  i am a secondary and i am following a leader
2022-05-26 08:55:10,433 INFO: Lock owner: de_gcp_patroni_example_01; I am de_gcp_patroni_example_02
2022-05-26 08:55:10,433 INFO: does not have lock
2022-05-26 08:55:10,440 INFO: no action.  i am a secondary and i am following a leader

3. After some time, depending on database size, “`Lag in MB “ should be zero again. You can verify as

$ patronictl -d etcd://127.0.0.1:2379 list patroniha

root@de-gcp-patroni-example-01:~# patronictl -d etcd://127.0.0.1:2379 list patroniha

+ Cluster: patroniha (6830699472358332339) ----+--------+---------+----+-----------+
|          Member           |        Host        |  Role  |  State  | TL | Lag in MB |
+---------------------------+--------------------+--------+---------+----+-----------+
| de_gcp_patroni_example_01 | 10.80.128.61:5433  | Leader | running |  2 |           |
| de_gcp_patroni_example_02 | 10.80.128.160:5433 |        | running |  2 |         0 |
| de_gcp_patroni_example_03 | 10.80.128.236:5433 |        | running |  2 |         0 |
+---------------------------+--------------------+--------+---------+----+-----------+

Conclusion

In this post, I have presented how to build a Patroni cluster using Ansible, validate the cluster status and see how Patroni can help to deal with two operations: switchover and reinit.

No Comments Yet

Let us know what you think

Subscribe by email