Highly available PostgreSQL on Kubernetes


This post covers installing PostgreSQL in a highly available manner on Kubernetes. While this was quite an adventure in the past, CloudNativePG - an awesome k8s operator for Postgres - makes this rather easy and straightforward.

The TL;DR can be found in the Summary.

PostgreSQL is my favorite database system. Reasons for that are:

  • Huge and professional community
  • Stable - it's one of those install-and-forget technologies; it is almost impossible to break
  • Rich ecosystem. PostgreSQL provides extensions for just about anything: geospatial data, time series, column stores, backups, monitoring, etc.
  • Foreign Data Wrappers. They allow you to "connect" to other data stores, so PostgreSQL can act as an interface to other data sources
  • Rich SQL language support
  • Scalability. If done right, PostgreSQL scales incredibly well, even for very large databases

So if anybody asks what database to use, PostgreSQL is almost always one of the right answers.

Nevertheless, one has to admit that operating PostgreSQL has traditionally been targeted at bare-metal installations. With the rise of Kubernetes, PostgreSQL was always a bit of a misfit. A lot of folks just ran Kubernetes alongside a bare-metal PostgreSQL instance. Alternatively, some very smart people created incredible systems for highly available, auto-failover PostgreSQL on Kubernetes. One of the best of these tools was Zalando's Patroni or Spilo.

Up until recently I successfully used Spilo to run highly available PostgreSQL instances which survived several catastrophic infrastructure events. However, even I as a hardcore PostgreSQL fan have to admit that operating Spilo was kind of a pain in the ass. There are a lot of moving parts, and manual intervention (e.g. for updates) was hard.

CloudNativePG - the best PostgreSQL Kubernetes operator so far

That's where CloudNativePG comes in. CloudNativePG was developed and sponsored by the awesome people at EDB. It attempts to tackle the operational struggles of existing Postgres-on-Kubernetes solutions. It provides a Kubernetes operator that can bootstrap a highly available PostgreSQL database cluster, using only Kubernetes APIs and native streaming replication.

The clusters can run in private and public clouds and even hybrid-cloud environments.

From their GitHub repository:

The goal of CloudNativePG is to increase the adoption of PostgreSQL, one of the most loved DBMS in traditional VM and bare metal environments, inside Kubernetes, thus making the database an integral part of the development process and CI/CD automated pipelines.

Again, citing what they write on their GitHub repository:

The operator is responsible for managing the status of the Cluster resource, keeping it up to date with the information that each PostgreSQL instance manager regularly reports back through the API server. Such changes might trigger, for example, actions like:

  • a PostgreSQL failover where, after an unexpected failure of a cluster's primary instance, the operator itself elects the new primary, updates the status, and directly coordinates the operation through the reconciliation loop, by relying on the instance managers

  • scaling up or down the number of read-only replicas, based on a positive or negative variation in the number of desired instances in the cluster, so that the operator creates or removes the required resources to run PostgreSQL, such as persistent volumes, persistent volume claims, pods, secrets, config maps, and then coordinates cloning and streaming replication tasks

  • updates of the endpoints of the PostgreSQL services that applications rely on to interact with the database, as Kubernetes represents the single source of truth and authority

  • updates of container images in a rolling fashion, following a change in the image name, by first updating the pods where replicas are running, and then the primary, issuing a switchover first

Furthermore, CloudNativePG provides resources for automatic backups and connection pooling.
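
To give a flavor of the connection-pooling part, here is a minimal, hedged sketch of a Pooler resource as I understand it from the CloudNativePG docs; the cluster name, namespace, and PgBouncer parameters are placeholders matching the example cluster used later in this post.

# Sketch only: a PgBouncer-based connection pooler in front of a CloudNativePG cluster.
# Cluster name, namespace, and parameters are placeholders.
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: example-cluster-pooler-rw
  namespace: postgres-namespace
spec:
  cluster:
    name: example-cluster
  instances: 3      # number of PgBouncer pods
  type: rw          # route pooled connections to the current primary
  pgbouncer:
    poolMode: session
    parameters:
      max_client_conn: "1000"
      default_pool_size: "10"

Applications would then point at the pooler's service instead of connecting to the cluster directly.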

In summary, the CloudNativePG operator covers the full range of operator capability levels - from basic installation and upgrades all the way up to auto-pilot operations (see the operator capability levels page in the CloudNativePG documentation for details).


Install Highly Available PostgreSQL on Kubernetes

Now that we know what CloudNativePG is, let's start with a fresh install.

Prerequisites

For this guide to work, you'll need a running kubernetes cluster with at least 3 worker nodes and kubectl installed.

Finally, make sure that krew, the kubectl plugin manager, is installed, as described in the krew docs.
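
A quick way to sanity-check both prerequisites (assuming kubectl and krew are already on your PATH):

# Should list at least 3 schedulable worker nodes
kubectl get nodes -o wide

# Should print the krew version if the plugin manager is installed correctly
kubectl krew version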

1. Install the operator

  1. For convenience, the folks at EDB provided us with a kubectl plugin called cnpg. Install it with:

    kubectl krew install cnpg
  2. Generate the operator YAML manifest. The -n flag defines the namespace where the operator is deployed to, and the --replicas flag tells how many replicas of the operator should be installed (note: number of operator replicas - not PostgreSQL instances):

    kubectl cnpg install generate -n devops-system --replicas 3 > operator-manifests.yaml

    This will create a file called operator-manifests.yaml.

  3. (Optional) The operator can be deployed using these manifests. However, the operator comes without any kubernetes tolerations. If you have separate control-plane and worker kubernetes nodes, you might want to add some tolerations to allow scheduling of the operator to the control-plane nodes. This is - however - highly optional. If you want to add some tolerations, open the file and add the tolerations to the operator deployment. Look for the following deployment (faaaar down in the file):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/name: cloudnative-pg
      name: cnpg-controller-manager
      namespace: devops-system

    At the very end of the deployment specification, add the tolerations (the tolerations block in the snippet below):

    - name: cnpg-pull-secret
    securityContext:
      runAsNonRoot: true
      seccompProfile:
        type: RuntimeDefault
    serviceAccountName: cnpg-manager
    terminationGracePeriodSeconds: 10
    tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Exists
    volumes:
      - emptyDir: {}
        name: scratch-data
      - name: webhook-certificates

    These example tolerations will allow the operator to be scheduled on the control-plane nodes - if they have the well-known taints node-role.kubernetes.io/control-plane:NoSchedule or node-role.kubernetes.io/master:NoSchedule.
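
    If you are unsure which taints your control-plane nodes actually carry, you can check them like this (purely illustrative):

    # Print every node together with its taints
    kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints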

  4. Deploy the manifests:

    kubectl apply -f operator-manifests.yaml

Congratulations: You just deployed the CloudNativePG kubernetes operator.

To check if everything is alright, run kubectl get pods -n devops-system. The output should look similar to

NAME                                       READY   STATUS    RESTARTS   AGE
cnpg-controller-manager-6448848cc9-25dwj   1/1     Running   0          3m16s
cnpg-controller-manager-6448848cc9-9l4wt   1/1     Running   0          3m16s
cnpg-controller-manager-6448848cc9-pzcjd   1/1     Running   0          3m16s
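
You can also verify that the operator's custom resource definitions were registered; the grep below simply filters for the cnpg.io API group:

# Should list CRDs such as clusters.postgresql.cnpg.io
kubectl get crds | grep cnpg.io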

2. Create a PostgreSQL cluster

Now that our operator is installed, let's create a highly available PostgreSQL cluster. We will also bootstrap an application database, add some database users, and set some configuration options. All of that can be done in a single YAML manifest - making setting up a cluster a delight.

The following section shows the manifest - I'll add comments where appropriate for more details.

For defining the secrets, keep in mind that Kubernetes expects the secret values to be base64 encoded. On Linux, simply pipe your secrets through base64 (use echo -n so the trailing newline is not encoded):

echo -n myrealpassword | base64
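
To double-check an encoded value, decode it again - for example, the password used in the manifest below:

# Prints "secret_password"
echo 'c2VjcmV0X3Bhc3N3b3Jk' | base64 -d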

For the section about PostgreSQL settings, I used the tool pgtune to find settings appropriate for my nodes. Please navigate there and use the settings which best fit your nodes.

apiVersion: v1
type: kubernetes.io/basic-auth
kind: Secret
data:
  password: c2VjcmV0X3Bhc3N3b3Jk # secret_password
  username: YXBwX3VzZXI= # app_user
metadata:
  name: example-app-user
  namespace: postgres-namespace
---
apiVersion: v1
kind: Secret
type: kubernetes.io/basic-auth
data:
  password: c2VjcmV0X3Bhc3N3b3Jk # secret_password
  username: cG9zdGdyZXM= # postgres - Note: it NEEDS to be postgres
metadata:
  name: example-superuser
  namespace: postgres-namespace
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-cluster
  namespace: postgres-namespace
spec:
  description: "DevOps and more cluster"
  imageName: ghcr.io/cloudnative-pg/postgresql:15.1

  # Number of instances. The operator automatically starts one primary
  # and two replicas.
  instances: 3

  # How many seconds to wait before the liveness probe starts.
  # Should be higher than the time needed for postgres to start.
  startDelay: 30
  # When shutting down, for how long (in seconds) the operator should
  # wait before executing a fast shutdown (terminating existing connections).
  # The operator actually waits for half the time before executing the fast shutdown.
  # The other half is used to wait for WAL archiving to finish.
  stopDelay: 100

  # Rolling update strategy:
  # - unsupervised: automated update of the primary once all
  #   replicas have been upgraded (default)
  # - supervised: requires manual supervision to perform
  #   the switchover of the primary
  primaryUpdateStrategy: unsupervised

  # These are the PostgreSQL settings. Use pgtune to determine them.
  postgresql:
    parameters:
      max_connections: 100
      shared_buffers: 2560MB
      effective_cache_size: 7680MB
      maintenance_work_mem: 640MB
      checkpoint_completion_target: 0.9
      wal_buffers: 16MB
      default_statistics_target: 100
      random_page_cost: 1.1
      effective_io_concurrency: 200
      work_mem: 13107kB
      min_wal_size: 1GB
      max_wal_size: 4GB
      max_worker_processes: 4
      max_parallel_workers_per_gather: 2
      max_parallel_workers: 4
      max_parallel_maintenance_workers: 2

      # The settings below automatically enable
      # the pg_stat_statements and auto_explain extensions.
      # No need to add them to shared_preload_libraries.
      pg_stat_statements.max: '10000'
      pg_stat_statements.track: all
      auto_explain.log_min_duration: '5s'

    # pg_hba.conf configuration. This line ensures that, for
    # all databases, all users can connect from the specified IP range.
    # The scram-sha-256 method is used for password verification.
    # https://www.postgresql.org/docs/current/auth-pg-hba-conf.html
    pg_hba:
      - host all all 10.244.0.0/16 scram-sha-256

  bootstrap:
    initdb:
      database: app
      # Name of the user who owns the database.
      # There needs to be a secret for this user.
      owner: app_user
      secret:
        name: example-app-user
    # Alternative bootstrap method: start from a backup
    #recovery:
    #  backup:
    #    name: backup-example

  superuserSecret:
    name: example-superuser

  # Template for the PVC. Keep in mind that for storage
  # which is bound directly to a node, the pod can consume
  # storage until the disk is full. The setting here is not a limit.
  storage:
    pvcTemplate:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      volumeMode: Filesystem

  resources:
    requests:
      memory: "1Gi"
      cpu: "1"
    limits:
      memory: "10Gi"

  # Optional setting: only deploy on worker nodes
  # nodeSelector:
  #   kubernetes.io/role: "worker"

  # Optional setting: for nodes with locally attached storage.
  # When using local storage for PostgreSQL, you are advised to temporarily put the
  # cluster in maintenance mode through the nodeMaintenanceWindow option to avoid
  # standard self-healing procedures kicking in while, for example, enlarging the
  # partition on the physical node or updating the node itself.
  nodeMaintenanceWindow:
    inProgress: false
    # States whether an existing PVC is reused or not during the maintenance operation.
    # When enabled, Kubernetes waits for the node to come up again and then reuses the
    # existing PVC. Suggested to keep on.
    reusePVC: true

Apply the manifest by running kubectl apply -f <your-manifest-filename>.yaml

Check if everything works fine by running kubectl -n postgres-namespace get pods. The output should be similar to

NAME                READY   STATUS    RESTARTS   AGE
example-cluster-1   1/1     Running   0          25m
example-cluster-2   1/1     Running   0          24m
example-cluster-3   1/1     Running   0          19m

AWESOME: We now have a running PostgreSQL cluster with 3 nodes, high availability and auto-failover!

You can check on the cluster health by running:

kubectl cnpg status -n postgres-namespace example-cluster -v

This will output some nicely formatted information about the general PostgreSQL cluster state.

Additionally, the cnpg kubectl plugin provides some other handy commands:

  certificate   Create a client certificate to connect to PostgreSQL using TLS and Certificate authentication
  completion    Generate the autocompletion script for the specified shell
  destroy       Destroy the instance named [CLUSTER_NAME] and [INSTANCE_ID] with the associated PVC
  fencing       Fencing related commands
  hibernate     Hibernation related commands
  install       CNPG installation commands
  maintenance   Sets or removes maintenance mode from clusters
  pgbench       Creates a pgbench job
  promote       Promote the pod named [cluster]-[node] or [node] to primary
  reload        Reload the cluster
  report        Report on the operator
  restart       Restart a cluster or a single instance in a cluster
  status        Get the status of a PostgreSQL cluster
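
For example, a manual switchover can be triggered by promoting one of the replicas from the output above (the instance name is assumed for illustration):

# Promote instance example-cluster-2 to primary
kubectl cnpg promote example-cluster example-cluster-2 -n postgres-namespace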

How many databases per cluster?

In traditional PostgreSQL hosting, multiple databases are part of one database instance or cluster. However, my (and EDB's) suggestion for Kubernetes workloads in general is to have a separate cluster per database - entirely utilized by a single microservice application.

As per the definition of microservices, they should own and manage their data - in their specific database. Only the microservice can access the database, including for schema management and migrations.

(The following paragraph is directly from CloudNativePG) CloudNativePG has been designed to work this way out of the box, by default creating an application user and an application database owned by the aforementioned application user.

Reserving a PostgreSQL instance to a single microservice-owned database enhances:

  • resource management: in PostgreSQL, CPU, and memory constrained resources are generally handled at the instance level, not the database level, making it easier to integrate it with Kubernetes resource management policies at the pod level
  • physical continuous backup and Point-In-Time-Recovery (PITR): given that PostgreSQL handles continuous backup and recovery at the instance level, having one database per instance simplifies PITR operations, differentiates retention policy management, and increases data protection of backups
  • application updates: enable each application to decide their update policies without impacting other databases owned by different applications
  • database updates: each application can decide which PostgreSQL version to use, and independently, when to upgrade to a different major version of PostgreSQL and at what conditions (e.g., cutover time)

While I'd encourage you to keep to this pattern, it's very much possible to use the postgres superuser and create multiple schemas and databases per cluster.
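
If you do go down that route, here is a rough sketch of how you could connect as the superuser from your workstation; the service name example-cluster-rw and the secret example-superuser come from the manifest above, and the port-forward is only for illustration (in-cluster applications connect to the rw service directly):

# Read the superuser password from the secret created earlier
kubectl -n postgres-namespace get secret example-superuser \
  -o jsonpath='{.data.password}' | base64 -d

# Forward the read-write service to your local machine (in a separate terminal)
kubectl -n postgres-namespace port-forward svc/example-cluster-rw 5432:5432

# Connect and create an additional database (the database name is just an example)
psql -h localhost -U postgres -c 'CREATE DATABASE another_app;'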

What's next?

Now that you have a highly available PostgreSQL cluster running, the heavy lifting is already done. Nevertheless, there are some important steps to continue with:

  • Add Backups and WAL archiving (a rough sketch follows after this list): Docs
  • Learn how to use the cluster control plugin cnpg: Docs
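
As a teaser for the backup docs, here is a hedged sketch of what object-store backups plus a daily schedule could look like; the bucket, endpoint, and backup-credentials secret are placeholders, so please follow the linked docs for the authoritative configuration:

# Sketch only: a "backup" section that would be added to the Cluster spec above
backup:
  retentionPolicy: "30d"
  barmanObjectStore:
    destinationPath: s3://my-backup-bucket/example-cluster   # placeholder bucket
    endpointURL: https://s3.example.com                      # placeholder endpoint
    s3Credentials:
      accessKeyId:
        name: backup-credentials   # hypothetical secret
        key: ACCESS_KEY_ID
      secretAccessKey:
        name: backup-credentials
        key: SECRET_ACCESS_KEY
---
# Sketch only: a daily scheduled backup (the cron schedule includes a seconds field)
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: example-cluster-daily
  namespace: postgres-namespace
spec:
  schedule: "0 0 2 * * *"   # every day at 02:00
  cluster:
    name: example-cluster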

Summary

This guide introduced CloudNativePG - a state-of-the-art Kubernetes operator to manage a highly available PostgreSQL cluster, with auto-failover, backups, etc. - everything one needs for modern PostgreSQL operations.

Setting up a PostgreSQL cluster is done as follows:

  1. Install the cnpg plugin: kubectl krew install cnpg

  2. Generate the operator manifests: kubectl cnpg install generate -n devops-system --replicas 3 > operator-manifests.yaml

  3. Optional: If you want, adjust the operator manifests for node tolerations or node affinity

  4. Apply the operator: kubectl apply -f operator-manifests.yaml

  5. Create your cluster manifest file:

    apiVersion: v1
    type: kubernetes.io/basic-auth
    kind: Secret
    data:
      password: c2VjcmV0X3Bhc3N3b3Jk # secret_password
      username: YXBwX3VzZXI= # app_user
    metadata:
      name: example-app-user
      namespace: postgres-namespace
    ---
    apiVersion: v1
    kind: Secret
    type: kubernetes.io/basic-auth
    data:
      password: c2VjcmV0X3Bhc3N3b3Jk # secret_password
      username: cG9zdGdyZXM= # postgres - Note: it NEEDS to be postgres
    metadata:
      name: example-superuser
      namespace: postgres-namespace
    ---
    apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: example-cluster
      namespace: postgres-namespace
    spec:
      description: "DevOps and more cluster"
      imageName: ghcr.io/cloudnative-pg/postgresql:15.1
      instances: 3
      startDelay: 30
      stopDelay: 100
      primaryUpdateStrategy: unsupervised

      postgresql:
        parameters:
          max_connections: 100
          shared_buffers: 2560MB
          effective_cache_size: 7680MB
          maintenance_work_mem: 640MB
          checkpoint_completion_target: 0.9
          wal_buffers: 16MB
          default_statistics_target: 100
          random_page_cost: 1.1
          effective_io_concurrency: 200
          work_mem: 13107kB
          min_wal_size: 1GB
          max_wal_size: 4GB
          max_worker_processes: 4
          max_parallel_workers_per_gather: 2
          max_parallel_workers: 4
          max_parallel_maintenance_workers: 2
          pg_stat_statements.max: '10000'
          pg_stat_statements.track: all
          auto_explain.log_min_duration: '5s'
        pg_hba:
          - host all all 10.244.0.0/16 scram-sha-256

      bootstrap:
        initdb:
          database: app
          owner: app_user
          secret:
            name: example-app-user
        # Alternative bootstrap method: start from a backup
        #recovery:
        #  backup:
        #    name: backup-example

      superuserSecret:
        name: example-superuser
      storage:
        pvcTemplate:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          volumeMode: Filesystem

      resources:
        requests:
          memory: "1Gi"
          cpu: "1"
        limits:
          memory: "10Gi"

      # Optional setting: only deploy on worker nodes
      # nodeSelector:
      #   kubernetes.io/role: "worker"

      nodeMaintenanceWindow:
        inProgress: false
        reusePVC: true
  6. Apply the manifest with kubectl apply -f <manifest-filename>.yaml

  7. Check your cluster status with kubectl cnpg status -n postgres-namespace example-cluster -v

------------------

Need assistance?

Do you have any questions about the topic presented here? Or do you need someone to assist in implementing these areas? Do not hesitate to contact me.