Highly available PostgreSQL on Kubernetes


This post covers installing PostgreSQL in a highly available manner on Kubernetes. While this was quite an adventure in the past, CloudNativePG - an awesome k8s operator for Postgres - makes this rather easy and straightforward.

The TL;DR can be found in the Summary.

PostgreSQL is my favorite database system. Reasons for that are:

  • Huge and professional community
  • Stable - it's one of those install-and-forget technologies; it is almost impossible to break
  • Rich ecosystem. PostgreSQL provides extensions for just about anything: geospatial data, time series, column stores, backups, monitoring, etc.
  • Foreign Data Wrappers. They allow you to "connect" to other data stores, so PostgreSQL can act as an interface to other data sources
  • Rich SQL language support
  • Scalability. If done right, PostgreSQL scales incredibly well, even for very large databases

So if anybody asks what database to use, PostgreSQL is almost always one of the right answers.

Nevertheless, one has to admit that operating PostgreSQL has traditionally been targeted at bare-metal installations. With the rise of Kubernetes, PostgreSQL was always a bit of a misfit. A lot of folks just ran Kubernetes alongside a bare-metal PostgreSQL instance. Alternatively, some very smart people created incredible systems for highly available, auto-failover PostgreSQL on Kubernetes. One of the best of these tools was Zalando's Patroni or Spilo.

Up until recently I successfully used Spilo to run highly available PostgreSQL instances which survived several catastrophic infrastructure events. However, even I as a hardcore PostgreSQL fan have to admit that operating Spilo was kind of a pain in the ass. There are a lot of moving parts, and manual intervention (e.g. for updates) was hard.

CloudNativePG - the best PostgreSQL Kubernetes operator so far

That's where CloudNativePG comes in. CloudNativePG was developed and sponsored by the awesome people at EDB. It attempts to tackle the operational struggles of existing Postgres-on-Kubernetes solutions. It provides a Kubernetes operator that can bootstrap a highly available PostgreSQL database cluster, using only Kubernetes APIs and native streaming replication.

The clusters can run in private and public clouds and even hybrid-cloud environments.

From their GitHub repository:

The goal of CloudNativePG is to increase the adoption of PostgreSQL, one of the most loved DBMS in traditional VM and bare metal environments, inside Kubernetes, thus making the database an integral part of the development process and CI/CD automated pipelines.

Again, citing what they write on their GitHub repository:

The operator is responsible for managing the status of the Cluster resource, keeping it up to date with the information that each PostgreSQL instance manager regularly reports back through the API server. Such changes might trigger, for example, actions like:

  • a PostgreSQL failover where, after an unexpected failure of a cluster's primary instance, the operator itself elects the new primary, updates the status, and directly coordinates the operation through the reconciliation loop, by relying on the instance managers

  • scaling up or down the number of read-only replicas, based on a positive or negative variation in the number of desired instances in the cluster, so that the operator creates or removes the required resources to run PostgreSQL, such as persistent volumes, persistent volume claims, pods, secrets, config maps, and then coordinates cloning and streaming replication tasks

  • updates of the endpoints of the PostgreSQL services that applications rely on to interact with the database, as Kubernetes represents the single source of truth and authority

  • updates of container images in a rolling fashion, following a change in the image name, by first updating the pods where replicas are running, and then the primary, issuing a switchover first

Furthermore, CloudNativePG provides resources for automatic backups and connection pooling.
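
To give a flavor of the connection-pooling part, here is a minimal, hedged sketch of a Pooler resource as I understand it from the CloudNativePG docs; the cluster name, namespace, and PgBouncer parameters are placeholders matching the example cluster used later in this post.

# Sketch only: a PgBouncer-based connection pooler in front of a CloudNativePG cluster.
# Cluster name, namespace, and parameters are placeholders.
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: example-cluster-pooler-rw
  namespace: postgres-namespace
spec:
  cluster:
    name: example-cluster
  instances: 3      # number of PgBouncer pods
  type: rw          # route pooled connections to the current primary
  pgbouncer:
    poolMode: session
    parameters:
      max_client_conn: "1000"
      default_pool_size: "10"

Applications would then point at the pooler's service instead of connecting to the cluster directly.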

In summary, the CloudNativePG operator covers the full range of operator capability levels - from basic installation and upgrades all the way up to auto-pilot operations (see the operator capability levels page in the CloudNativePG documentation for details).


Install Highly Available PostgreSQL on Kubernetes

Now that we know what CloudNativePG is, let's start with a fresh install.

Prerequisites

For this guide to work, you'll need a running kubernetes cluster with at least 3 worker nodes and kubectl installed.

Finally, make sure that krew, the kubectl plugin manager, is installed, as described in the krew docs.
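
A quick way to sanity-check both prerequisites (assuming kubectl and krew are already on your PATH):

# Should list at least 3 schedulable worker nodes
kubectl get nodes -o wide

# Should print the krew version if the plugin manager is installed correctly
kubectl krew version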

1. Install the operator

  1. For convenience, the folks at EDB provided us with a kubectl plugin called cnpg. Install it with:

    kubectl krew install cnpg
  2. Generate the operator YAML manifest. The -n flag defines the namespace where the operator is deployed to, and the --replicas flag tells how many replicas of the operator should be installed (note: number of operator replicas - not PostgreSQL instances):

    kubectl cnpg install generate -n devops-system --replicas 3 > operator-manifests.yaml

    This will create a file called operator-manifests.yaml.

  3. (Optional) The operator can be deployed using these manifests. However, the operator comes without any kubernetes tolerations. If you have separate control-plane and worker kubernetes nodes, you might want to add some tolerations to allow scheduling of the operator to the control-plane nodes. This is - however - highly optional. If you want to add some tolerations, open the file and add the tolerations to the operator deployment. Look for the following deployment (faaaar down in the file):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/name: cloudnative-pg
      name: cnpg-controller-manager
      namespace: devops-system

    At the very end of the deployment specification, add the tolerations (the tolerations block in the snippet below):

    - name: cnpg-pull-secret
    securityContext:
      runAsNonRoot: true
      seccompProfile:
        type: RuntimeDefault
    serviceAccountName: cnpg-manager
    terminationGracePeriodSeconds: 10
    tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Exists
    volumes:
      - emptyDir: {}
        name: scratch-data
      - name: webhook-certificates

    These example tolerations will allow the operator to be scheduled on the control-plane nodes - if they have the well-known taints node-role.kubernetes.io/control-plane:NoSchedule or node-role.kubernetes.io/master:NoSchedule.
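
    If you are unsure which taints your control-plane nodes actually carry, you can check them like this (purely illustrative):

    # Print every node together with its taints
    kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints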

  4. Deploy the manifests:

    kubectl apply -f operator-manifests.yaml

Congratulations: You just deployed the CloudNativePG kubernetes operator.

To check if everything is alright, run kubectl get pods -n devops-system. The output should look similar to

NAME                                       READY   STATUS    RESTARTS   AGE
cnpg-controller-manager-6448848cc9-25dwj   1/1     Running   0          3m16s
cnpg-controller-manager-6448848cc9-9l4wt   1/1     Running   0          3m16s
cnpg-controller-manager-6448848cc9-pzcjd   1/1     Running   0          3m16s
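
You can also verify that the operator's custom resource definitions were registered; the grep below simply filters for the cnpg.io API group:

# Should list CRDs such as clusters.postgresql.cnpg.io
kubectl get crds | grep cnpg.io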

2. Create a PostgreSQL cluster

Now that our operator is installed, let's create a highly available PostgreSQL cluster. We will also bootstrap an application database, add some database users, and set some configuration options. All of that can be done in a single YAML manifest - making setting up a cluster a delight.

The following section shows the manifest - I'll add comments where appropriate for more details.

For defining the secrets, keep in mind that Kubernetes expects the secret values to be base64 encoded. On Linux, simply pipe your secrets through base64 (use echo -n so the trailing newline is not encoded):

echo -n myrealpassword | base64
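
To double-check an encoded value, decode it again - for example, the password used in the manifest below:

# Prints "secret_password"
echo 'c2VjcmV0X3Bhc3N3b3Jk' | base64 -d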

For the section about PostgreSQL settings, I used the tool pgtune to find settings appropriate for my nodes. Please navigate there and use the settings which best fit your nodes.

apiVersion: v1
type: kubernetes.io/basic-auth
kind: Secret
data:
  password: c2VjcmV0X3Bhc3N3b3Jk # secret_password
  username: YXBwX3VzZXI= # app_user
metadata:
  name: example-app-user
  namespace: postgres-namespace
---
apiVersion: v1
kind: Secret
type: kubernetes.io/basic-auth
data:
  password: c2VjcmV0X3Bhc3N3b3Jk # secret_password
  username: cG9zdGdyZXM= # postgres - Note: it NEEDS to be postgres
metadata:
  name: example-superuser
  namespace: postgres-namespace
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-cluster
  namespace: postgres-namespace
spec:
  description: "DevOps and more cluster"
  imageName: ghcr.io/cloudnative-pg/postgresql:15.1

  # Number of instances. The operator automatically starts one primary
  # and two replicas.
  instances: 3

  # How many seconds to wait before the liveness probe starts.
  # Should be higher than the time needed for postgres to start.
  startDelay: 30
  # When shutting down, for how long (in seconds) the operator should
  # wait before executing a fast shutdown (terminating existing connections).
  # The operator actually waits for half the time before executing the fast shutdown.
  # The other half is used to wait for WAL archiving to finish.
  stopDelay: 100

  # Rolling update strategy:
  # - unsupervised: automated update of the primary once all
  #   replicas have been upgraded (default)
  # - supervised: requires manual supervision to perform
  #   the switchover of the primary
  primaryUpdateStrategy: unsupervised

  # These are the PostgreSQL settings. Use pgtune to determine them.
  postgresql:
    parameters:
      max_connections: 100
      shared_buffers: 2560MB
      effective_cache_size: 7680MB
      maintenance_work_mem: 640MB
      checkpoint_completion_target: 0.9
      wal_buffers: 16MB
      default_statistics_target: 100
      random_page_cost: 1.1
      effective_io_concurrency: 200
      work_mem: 13107kB
      min_wal_size: 1GB
      max_wal_size: 4GB
      max_worker_processes: 4
      max_parallel_workers_per_gather: 2
      max_parallel_workers: 4
      max_parallel_maintenance_workers: 2

      # The settings below automatically enable
      # the pg_stat_statements and auto_explain extensions.
      # No need to add them to shared_preload_libraries.
      pg_stat_statements.max: '10000'
      pg_stat_statements.track: all
      auto_explain.log_min_duration: '5s'

    # pg_hba.conf configuration. This line ensures that, for
    # all databases, all users can connect from the specified IP range.
    # The scram-sha-256 method is used for password verification.
    # https://www.postgresql.org/docs/current/auth-pg-hba-conf.html
    pg_hba:
      - host all all 10.244.0.0/16 scram-sha-256

  bootstrap:
    initdb:
      database: app
      # Name of the user who owns the database.
      # There needs to be a secret for this user.
      owner: app_user
      secret:
        name: example-app-user
    # Alternative bootstrap method: start from a backup
    #recovery:
    #  backup:
    #    name: backup-example

  superuserSecret:
    name: example-superuser

  # Template for the PVC. Keep in mind that for storage
  # which is bound directly to a node, the pod can consume
  # storage until the disk is full. The setting here is not a limit.
  storage:
    pvcTemplate:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      volumeMode: Filesystem

  resources:
    requests:
      memory: "1Gi"
      cpu: "1"
    limits:
      memory: "10Gi"

  # Optional setting: only deploy on worker nodes
  # nodeSelector:
  #   kubernetes.io/role: "worker"

  # Optional setting: for nodes with locally attached storage.
  # When using local storage for PostgreSQL, you are advised to temporarily put the
  # cluster in maintenance mode through the nodeMaintenanceWindow option to avoid
  # standard self-healing procedures kicking in while, for example, enlarging the
  # partition on the physical node or updating the node itself.
  nodeMaintenanceWindow:
    inProgress: false
    # States whether an existing PVC is reused or not during the maintenance operation.
    # When enabled, Kubernetes waits for the node to come up again and then reuses the
    # existing PVC. Suggested to keep on.
    reusePVC: true

Apply the manifest by running kubectl apply -f <your-manifest-filename>.yaml

Check if everything works fine by running kubectl -n postgres-namespace get pods. The output should be similar to

NAME                READY   STATUS    RESTARTS   AGE
example-cluster-1   1/1     Running   0          25m
example-cluster-2   1/1     Running   0          24m
example-cluster-3   1/1     Running   0          19m

AWESOME: We now have a running PostgreSQL cluster with 3 nodes, high availability and auto-failover!

You can check on the cluster health by running:

kubectl cnpg status -n postgres-namespace example-cluster -v

This will output some nicely formatted information about the general PostgreSQL cluster state.

Additionally, the cnpg kubectl plugin provides some other handy commands:

  certificate   Create a client certificate to connect to PostgreSQL using TLS and Certificate authentication
  completion    Generate the autocompletion script for the specified shell
  destroy       Destroy the instance named [CLUSTER_NAME] and [INSTANCE_ID] with the associated PVC
  fencing       Fencing related commands
  hibernate     Hibernation related commands
  install       CNPG installation commands
  maintenance   Sets or removes maintenance mode from clusters
  pgbench       Creates a pgbench job
  promote       Promote the pod named [cluster]-[node] or [node] to primary
  reload        Reload the cluster
  report        Report on the operator
  restart       Restart a cluster or a single instance in a cluster
  status        Get the status of a PostgreSQL cluster
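
For example, a manual switchover can be triggered by promoting one of the replicas from the output above (the instance name is assumed for illustration):

# Promote instance example-cluster-2 to primary
kubectl cnpg promote example-cluster example-cluster-2 -n postgres-namespace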

How many databases per cluster?

In traditional PostgreSQL hosting, multiple databases are part of one database instance or cluster. However, my (and EDB's) suggestion for Kubernetes workloads in general is to have a separate cluster per database - entirely utilized by a single microservice application.

As per the definition of microservices, they should own and manage their data - in their specific database. Only the microservice can access the database, including for schema management and migrations.

(The following paragraph is directly from CloudNativePG) CloudNativePG has been designed to work this way out of the box, by default creating an application user and an application database owned by the aforementioned application user.

Reserving a PostgreSQL instance to a single microservice-owned database enhances:

  • resource management: in PostgreSQL, CPU, and memory constrained resources are generally handled at the instance level, not the database level, making it easier to integrate it with Kubernetes resource management policies at the pod level
  • physical continuous backup and Point-In-Time-Recovery (PITR): given that PostgreSQL handles continuous backup and recovery at the instance level, having one database per instance simplifies PITR operations, differentiates retention policy management, and increases data protection of backups
  • application updates: enable each application to decide their update policies without impacting other databases owned by different applications
  • database updates: each application can decide which PostgreSQL version to use, and independently, when to upgrade to a different major version of PostgreSQL and at what conditions (e.g., cutover time)

While I'd encourage you to keep to this pattern, it's very much possible to use the postgres superuser and create multiple schemas and databases per cluster.
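
If you do go down that route, here is a rough sketch of how you could connect as the superuser from your workstation; the service name example-cluster-rw and the secret example-superuser come from the manifest above, and the port-forward is only for illustration (in-cluster applications connect to the rw service directly):

# Read the superuser password from the secret created earlier
kubectl -n postgres-namespace get secret example-superuser \
  -o jsonpath='{.data.password}' | base64 -d

# Forward the read-write service to your local machine (in a separate terminal)
kubectl -n postgres-namespace port-forward svc/example-cluster-rw 5432:5432

# Connect and create an additional database (the database name is just an example)
psql -h localhost -U postgres -c 'CREATE DATABASE another_app;'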

What's next?

Now that you have a highly available PostgreSQL cluster running, the heavy lifting is already done. Nevertheless, there are some important steps to continue with:

  • Add Backups and WAL archiving (a rough sketch follows after this list): Docs
  • Learn how to use the cluster control plugin cnpg: Docs
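
As a teaser for the backup docs, here is a hedged sketch of what object-store backups plus a daily schedule could look like; the bucket, endpoint, and backup-credentials secret are placeholders, so please follow the linked docs for the authoritative configuration:

# Sketch only: a "backup" section that would be added to the Cluster spec above
backup:
  retentionPolicy: "30d"
  barmanObjectStore:
    destinationPath: s3://my-backup-bucket/example-cluster   # placeholder bucket
    endpointURL: https://s3.example.com                      # placeholder endpoint
    s3Credentials:
      accessKeyId:
        name: backup-credentials   # hypothetical secret
        key: ACCESS_KEY_ID
      secretAccessKey:
        name: backup-credentials
        key: SECRET_ACCESS_KEY
---
# Sketch only: a daily scheduled backup (the cron schedule includes a seconds field)
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: example-cluster-daily
  namespace: postgres-namespace
spec:
  schedule: "0 0 2 * * *"   # every day at 02:00
  cluster:
    name: example-cluster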

Summary

This guide introduced CloudNativePG - a state-of-the-art Kubernetes operator to manage a highly available PostgreSQL cluster, with auto-failover, backups, etc. - everything one needs for modern PostgreSQL operations.

Setting up a PostgreSQL cluster is done as follows:

  1. Install the cnpg plugin: kubectl krew install cnpg

  2. Generate the operator manifests: kubectl cnpg install generate -n devops-system --replicas 3 > operator-manifests.yaml

  3. Optional: If you want, adjust the operator manifests for node tolerations or node affinity

  4. Apply the operator: kubectl apply -f operator-manifests.yaml

  5. Create your cluster manifest file:

    apiVersion: v1
    type: kubernetes.io/basic-auth
    kind: Secret
    data:
      password: c2VjcmV0X3Bhc3N3b3Jk # secret_password
      username: YXBwX3VzZXI= # app_user
    metadata:
      name: example-app-user
      namespace: postgres-namespace
    ---
    apiVersion: v1
    kind: Secret
    type: kubernetes.io/basic-auth
    data:
      password: c2VjcmV0X3Bhc3N3b3Jk # secret_password
      username: cG9zdGdyZXM= # postgres - Note: it NEEDS to be postgres
    metadata:
      name: example-superuser
      namespace: postgres-namespace
    ---
    apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: example-cluster
      namespace: postgres-namespace
    spec:
      description: "DevOps and more cluster"
      imageName: ghcr.io/cloudnative-pg/postgresql:15.1
      instances: 3
      startDelay: 30
      stopDelay: 100
      primaryUpdateStrategy: unsupervised

      postgresql:
        parameters:
          max_connections: 100
          shared_buffers: 2560MB
          effective_cache_size: 7680MB
          maintenance_work_mem: 640MB
          checkpoint_completion_target: 0.9
          wal_buffers: 16MB
          default_statistics_target: 100
          random_page_cost: 1.1
          effective_io_concurrency: 200
          work_mem: 13107kB
          min_wal_size: 1GB
          max_wal_size: 4GB
          max_worker_processes: 4
          max_parallel_workers_per_gather: 2
          max_parallel_workers: 4
          max_parallel_maintenance_workers: 2
          pg_stat_statements.max: '10000'
          pg_stat_statements.track: all
          auto_explain.log_min_duration: '5s'
        pg_hba:
          - host all all 10.244.0.0/16 scram-sha-256

      bootstrap:
        initdb:
          database: app
          owner: app_user
          secret:
            name: example-app-user
        # Alternative bootstrap method: start from a backup
        #recovery:
        #  backup:
        #    name: backup-example

      superuserSecret:
        name: example-superuser
      storage:
        pvcTemplate:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          volumeMode: Filesystem

      resources:
        requests:
          memory: "1Gi"
          cpu: "1"
        limits:
          memory: "10Gi"

      # Optional setting: only deploy on worker nodes
      # nodeSelector:
      #   kubernetes.io/role: "worker"

      nodeMaintenanceWindow:
        inProgress: false
        reusePVC: true
  6. Apply the manifest with kubectl apply -f <manifest-filename>.yaml

  7. Check your cluster status with kubectl cnpg status -n postgres-namespace example-cluster -v

------------------

Need assistance?

Do you have any questions about the topic presented here? Or do you need someone to assist in implementing these areas? Do not hesitate to contact me.