
Running PostgreSQL on Kubernetes

A Complete Architecture Guide

PostgreSQL is the world's most advanced open-source relational database. Running it on Kubernetes is no longer a question of if — it's a question of how.

1. Introduction: PostgreSQL Meets Kubernetes

PostgreSQL has been in continuous development since 1986 and is widely regarded as the most advanced open-source relational database. Running it on Kubernetes is no longer experimental — it is a production-proven approach used by organizations that demand portability, automation, and freedom from vendor lock-in.

STATUS: PRODUCTION_PROVEN

Is PostgreSQL on Kubernetes Production-Ready?

Yes — emphatically. CloudNativePG, the leading PostgreSQL operator, is a CNCF Sandbox project with thousands of production deployments. Patroni, the battle-tested high-availability layer used by operators from Zalando and Percona, has been running PostgreSQL clusters in production since 2015 at companies including Zalando itself, GitLab, and Netflix.

Running PostgreSQL on Kubernetes delivers three core advantages that traditional VM-based deployments cannot match:

🔓

No Vendor Lock-In

Deploy on any cloud or on-prem infrastructure. The same PostgreSQL cluster runs on AWS, GCP, Azure, or bare metal — your data and configuration stay portable.

⚙️

Automated Operations

Operators handle failover, scaling, backups, and upgrades automatically. Patroni's DCS-based leader election makes failover deterministic and fast — typically under 30 seconds.

💰

Cost Efficiency

Consolidate workloads on shared clusters, right-size resources dynamically, and avoid paying the managed RDS/Cloud SQL premium — often saving 40–60% on TCO.

The debate is over. A vibrant Data on Kubernetes community, mature operators, and years of production evidence have proven that PostgreSQL on Kubernetes works at scale. The real question is which topology, storage strategy, and operator to choose.

WHY_KUBERNETES
Run anywhere — any cloud, on-prem, edge
Automate Day-2 ops (backups, failover, scaling)
Declarative, version-controlled config
Consistent tooling across all databases
Save 40–60% vs managed cloud databases
CloudNativePG is a CNCF Sandbox project

2. PostgreSQL on Kubernetes Architectures

A PostgreSQL deployment on Kubernetes has two critical layers: a connection pooling layer that manages the process-per-connection model, and a database layer with streaming replication and Patroni-driven high availability.

LAYER: CONNECTION_POOLING

The Connection Pooling Layer (PgBouncer, pgpool-II)

Unlike MySQL, which serves each client from a thread, PostgreSQL forks a new OS process for every client connection. At scale, hundreds or thousands of direct connections consume gigabytes of RAM and create scheduler pressure on the database server. Connection pooling is not optional in production — it is essential.

PgBouncer is the standard solution: a lightweight, single-binary connection pooler that supports session, transaction, and statement pooling modes. Transaction-mode pooling (the most common) allows hundreds of application connections to multiplex over a small pool of real server connections, dramatically reducing PostgreSQL's resource consumption.
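A minimal transaction-pooling setup can be sketched as follows (the hostname, database name, and pool sizes are illustrative assumptions, not recommendations):

```ini
; pgbouncer.ini — illustrative transaction-pooling sketch
[databases]
; route "appdb" to the primary's Service (hostname is an example)
appdb = host=postgres-primary.db.svc port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction       ; multiplex many clients over few server conns
default_pool_size = 20        ; real server connections per user/db pair
max_client_conn = 1000        ; application-side connections allowed
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
```

Note that transaction pooling breaks session-scoped features such as advisory locks and session-level SET; protocol-level prepared statements only became compatible in recent PgBouncer releases.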

pgpool-II is a heavier alternative that additionally handles read/write splitting — routing SELECT queries to replicas and writes to the primary. It is more complex to configure but useful when read-scaling is needed without application-level changes.

PgBouncer Pool, lightweight
pgpool-II Pool + read routing
Pooling Required
Mode Transaction (typical)
TOPOLOGY: ASYNC_STREAMING

Asynchronous Streaming Replication

The standard PostgreSQL replication model. The primary streams WAL (Write-Ahead Log) records to standby nodes in real time. Standbys apply WAL continuously, staying close behind the primary. On failure, Patroni promotes the most up-to-date standby automatically.

The trade-off is a small replication lag — if the primary fails before a standby catches up, the most recently committed transactions may be lost. Semi-synchronous behavior is available via synchronous_commit = remote_write (with synchronous_standby_names set), which waits for standbys to receive — but not flush — each WAL record, a middle ground between async and fully synchronous.

Logical replication, a secondary topology, replicates at the row level and is valuable for zero-downtime major version upgrades and selective table replication — not a primary HA mechanism.

WAL Streaming Patroni Failover Low Latency Logical Repl
TOPOLOGY: SYNC_STREAMING

Synchronous Streaming Replication

For zero data loss, PostgreSQL supports fully synchronous replication via synchronous_standby_names. The primary waits for the configured standby(s) to acknowledge the WAL record before reporting the transaction as committed to the client.

This guarantees that at least one standby always has all committed data. The trade-off is higher write latency: every write must wait for a round-trip to a remote node. Patroni handles failover seamlessly in sync mode, promoting only standbys that are fully caught up.
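In raw PostgreSQL terms, the sync settings above look like this (standby names are examples; under Patroni, setting synchronous_mode in the DCS config manages these parameters for you):

```ini
# postgresql.conf — synchronous replication sketch
synchronous_standby_names = 'ANY 1 (postgres-1, postgres-2)'  # quorum: any one standby must ack
synchronous_commit = on             # wait for standby flush: zero data loss
# synchronous_commit = remote_write # lighter: wait for standby write, not flush
```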

Zero Data Loss synchronous_standby_names Higher Write Latency
DIAGRAM: PG_K8S_TOPOLOGY

PostgreSQL Cluster Topology on Kubernetes

[Diagram: Kubernetes cluster, 3 nodes across 3 availability zones with Patroni HA. node-1 (AZ-a): pgbouncer-0, postgres-0 (primary, read/write), etcd-0 (leader election), PVC pg-data-0. node-2 (AZ-b): pgbouncer-1, postgres-1 (standby, read-only), etcd-1 (DCS quorum), PVC pg-data-1. node-3 (AZ-c): pgbouncer-2, postgres-2 (standby, read-only), etcd-2 (DCS quorum), PVC pg-data-2. All PVCs use StorageClass fast-ssd; strict podAntiAffinity keeps one PostgreSQL pod per node; the primary streams WAL to both standbys.]
HA: PATRONI_NODE_AFFINITY

High Availability with Patroni

Patroni uses a distributed configuration store (DCS — etcd, Consul, or ZooKeeper) for leader election and failover coordination. When the primary goes down, Patroni's DCS-based consensus promotes the most advanced standby in typically under 30 seconds.

CloudNativePG takes a different approach: it implements HA natively without Patroni, using Kubernetes-native leader election and a built-in failover controller. Both approaches produce the same result — automatic, deterministic promotion with zero manual intervention.
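A minimal Patroni configuration wires the cluster to its DCS and sets the timing knobs that govern failover speed (the cluster scope and etcd endpoints below are illustrative assumptions):

```yaml
# patroni.yml — illustrative fragment
scope: pg-cluster                               # cluster name in the DCS
etcd3:
  hosts: etcd-0:2379,etcd-1:2379,etcd-2:2379    # 3-member quorum
bootstrap:
  dcs:
    ttl: 30                  # leader-key TTL; bounds failover detection time
    loop_wait: 10            # seconds between Patroni HA loop iterations
    retry_timeout: 10
    synchronous_mode: false  # async streaming replication by default
```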

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: postgres
      topologyKey: kubernetes.io/hostname
Patroni + etcd Auto Failover <30s Promotion
RESILIENCE: MULTI_AZ

Multi-AZ Deployments

For a 3-node PostgreSQL cluster (1 primary + 2 standbys) with matching PgBouncer pods and a 3-node etcd ring, you need at least 3 Kubernetes nodes spread across 3 availability zones. Use topologySpreadConstraints to distribute database pods evenly and ensure the cluster — including the DCS quorum — survives a full zone failure.

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: postgres
Zone Redundancy etcd Quorum topologySpreadConstraints

3. Compute & Storage Strategy

Getting resources right is the difference between a rock-solid database and one that gets killed by the Linux OOM killer at 3 AM. For PostgreSQL, storage choices also directly affect WAL throughput and recovery time.

CRITICAL: GUARANTEED_QOS

Guaranteed QoS (Requests = Limits)

For stateless apps, setting low requests and high limits is standard practice for bin-packing. For databases it is a recipe for disaster: a pod whose memory limit exceeds its request falls into the Burstable QoS class, and the kernel's OOM killer targets Burstable pods before Guaranteed ones when the node comes under memory pressure.

The recommended approach for PostgreSQL on Kubernetes: set requests equal to limits for both CPU and RAM. This gives the pod the Guaranteed QoS class — the highest in Kubernetes — making it the last candidate for eviction and giving it the lowest OOM-kill priority under node pressure.

BURSTABLE_QOS
resources:
  requests:
    memory: 2Gi
    cpu: 500m
  limits:
    memory: 8Gi
    cpu: 4000m

⚠️ OOM Kill Target Under Pressure

GUARANTEED_QOS
resources:
  requests:
    memory: 8Gi
    cpu: 4000m
  limits:
    memory: 8Gi
    cpu: 4000m

✓ Protected by Kernel Priority

CONFIG: SHARED_BUFFERS

shared_buffers and WAL Tuning

PostgreSQL's shared_buffers parameter controls how much memory it uses for caching data pages. The standard recommendation is 25% of available RAM. In a Kubernetes Guaranteed QoS pod, set the container memory limit high enough to accommodate shared_buffers plus working memory, autovacuum workers, and connection overhead.
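As a back-of-the-envelope check, the 25% guideline is simple arithmetic; the helper below is purely illustrative (the function name and headroom figures are ours, not from any tool):

```python
# Hypothetical helper: derive shared_buffers from a Guaranteed-QoS memory limit.

def shared_buffers_mb(limit_gib: float, fraction: float = 0.25) -> int:
    """Return shared_buffers in MiB, following the ~25%-of-RAM guideline."""
    return int(limit_gib * 1024 * fraction)

# For the 8Gi limit from the Guaranteed QoS example above:
print(shared_buffers_mb(8))   # 2048 MiB, leaving ~6Gi of the limit for
                              # work_mem, autovacuum workers, and connections
```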

WAL writes are sequential by nature — they land on the WAL segment in order. This makes WAL write performance heavily dependent on write latency rather than IOPS. For high-write workloads, placing the WAL directory (pg_wal) on a separate volume with low latency — ideally local NVMe — yields significant throughput improvements.
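A dedicated WAL volume can be expressed as a pod-spec sketch (volume names and the mount path below are assumptions; operators such as CloudNativePG expose the same idea as a first-class walStorage option):

```yaml
# Pod spec fragment: separate low-latency volume for pg_wal
volumeMounts:
  - name: pg-data
    mountPath: /var/lib/postgresql/data
  - name: pg-wal
    mountPath: /var/lib/postgresql/wal   # initdb --waldir points here
volumes:
  - name: pg-wal
    persistentVolumeClaim:
      claimName: pg-wal-0                # backed by an NVMe StorageClass
```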

shared_buffers = 25% RAM WAL Sequential Writes Separate WAL Volume
STORAGE: NETWORK_ATTACHED

Network-Attached Storage (EBS, Ceph)

The standard approach is to decouple storage from compute using network-attached volumes like AWS EBS, GCP Persistent Disk, or Ceph. When a node fails, the pod is rescheduled on another node and the same PersistentVolume re-attaches with all data intact — no resynchronization required.

This makes pod migration simple and fast. The trade-off is I/O latency, which is higher than local disk because every read/write traverses the network. For most PostgreSQL workloads, provisioned IOPS network storage is the right default.
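As a sketch, a provisioned-IOPS StorageClass for the AWS EBS CSI driver might look like this (the name fast-ssd and the IOPS/throughput figures are illustrative):

```yaml
# StorageClass sketch: provisioned-IOPS network storage (EBS gp3)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "6000"          # gp3 provisions IOPS independently of volume size
  throughput: "250"     # MiB/s
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```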

Easy Migration AWS EBS Ceph GCP PD
STORAGE: LOCAL_NVME

Local NVMe SSDs

For latency-sensitive workloads, local NVMe SSDs deliver the best raw performance — dramatically lower I/O latency and higher throughput than any network-attached option. This is particularly impactful for PostgreSQL WAL writes and random page reads.

The trade-off is durability: if the node fails, the local volume is lost. The pod is rescheduled on a different node and the standby must resynchronize the full dataset via streaming replication. Always ensure at least one standby uses network-attached storage and that pgBackRest backups are current.
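Local disks are typically exposed through Kubernetes' static local-volume mechanism; the StorageClass itself is deliberately minimal (the name local-nvme is our example):

```yaml
# Local-volume StorageClass: no dynamic provisioner; the PV is pre-created
# on the node that owns the NVMe disk
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-nvme
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer   # bind only once the pod is scheduled
```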

STORAGE COMPARISON: NETWORK vs LOCAL NVMe

Network-Attached (EBS)
Latency: ~1 ms, Throughput: Medium
✓ Pod migration: instant reattach
✓ Data survives node failure
⚠ Higher WAL write latency
⚠ IOPS provisioning cost
Best for: general workloads, simplicity

Local NVMe SSD
Latency: ~0.1 ms, Throughput: High
✓ Lowest WAL write latency
✓ Better cost per IOPS
✕ Data lost on node failure
✕ Full resync on recovery
Best for: write-heavy, latency-sensitive
NVMe SSD WAL Performance Cost Savings Trade-offs
BEST_PRACTICE
Always set requests = limits for databases
Use Guaranteed QoS class
Set shared_buffers to ~25% of RAM
Consider separate WAL volume for write-heavy loads
Network storage for easy recovery
NVMe for latency-critical workloads
QOS_CLASSES
Guaranteed req == limit
Burstable req < limit
BestEffort none set

4. Monitoring PostgreSQL on Kubernetes

Kubernetes changes how monitoring works. Pods are ephemeral and IPs change — your monitoring stack must dynamically discover targets. Prometheus Operator with postgres_exporter is the production standard for PostgreSQL on Kubernetes.

STACK: PROMETHEUS_GRAFANA

Prometheus & Grafana

The standard monitoring stack for PostgreSQL on Kubernetes is Prometheus Operator + postgres_exporter + Grafana. The postgres_exporter connects to the PostgreSQL instance and exposes hundreds of metrics from pg_stat_* views as Prometheus metrics — replication lag, autovacuum activity, connection counts, cache hit ratios, and more.

Prometheus Operator introduces ServiceMonitor and PodMonitor custom resources that tell Prometheus which pods to scrape. This is essential in Kubernetes where pod IPs are dynamic — you never hardcode scrape targets.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
spec:
  selector:
    matchLabels:
      app: postgres
  endpoints:
  - port: metrics
    path: /metrics
Prometheus Operator postgres_exporter ServiceMonitor
METRICS: DATABASE_SPECIFIC

PostgreSQL-Specific Metrics

Generic Kubernetes metrics (CPU, memory, network) tell you the pod is alive. PostgreSQL-specific metrics tell you whether the database is healthy. These are the five most critical to instrument first:

Replication Lag (bytes)

pg_replication_slots, pg_stat_replication — bytes behind primary. Alert if lag grows continuously.

Autovacuum Activity

pg_stat_user_tables — dead tuples accumulating without autovacuum can lead to table bloat and transaction ID wraparound. A silent killer.

Cache Hit Ratio

pg_statio_user_tables — heap_blks_hit / (heap_blks_hit + heap_blks_read). Should be above 99%; below 95% signals insufficient shared_buffers.

Connection Count vs max_connections

pg_stat_activity — how many connections are active, idle, idle-in-transaction. Alert before hitting max_connections.

Lock Waits & Deadlocks

pg_locks, pg_stat_database — deadlocks and long lock waits indicate application-level contention that should be investigated immediately.
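The metrics above map to straightforward queries against the views they name; three sketches (run on the primary):

```sql
-- Replication lag in bytes, per standby
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
FROM pg_stat_replication;

-- Database-wide cache hit ratio
SELECT sum(blks_hit)::float / nullif(sum(blks_hit) + sum(blks_read), 0)
       AS cache_hit_ratio
FROM pg_stat_database;

-- Connections by state, to compare against max_connections
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
SHOW max_connections;
```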

Replication Lag Autovacuum Cache Hit Ratio Lock Waits
GUIDE: KUBERNETES_AWARE_MONITORING

Adapting Monitoring for Kubernetes-Native PostgreSQL

Use ServiceMonitor / PodMonitor — never hardcode Prometheus scrape targets
Monitor Patroni health endpoint (/health) separately from database metrics
Alert on Patroni leader changes — unexpected promotions indicate instability
Track PVC capacity — PostgreSQL WAL can fill a volume faster than expected under write load
Monitor PgBouncer pool saturation — cl_waiting counter indicates client connections queued for a server connection
Use Grafana dashboards: PostgreSQL Database dashboard (9628) is the community standard starting point

5. Backup Strategies

Backups are the ultimate safety net. In cloud-native environments, the standard approach is S3-compatible object storage combined with continuous WAL archiving — enabling point-in-time recovery to any second in your retention window.

STORAGE: S3_COMPATIBLE

S3-Compatible Backup Storage

The cloud-native standard for backup storage is any S3-compatible endpoint — AWS S3, Google Cloud Storage, Azure Blob (via S3 API), or self-hosted alternatives like MinIO. This gives you durability, versioning, and lifecycle management out of the box.

The strongly recommended practice is to store backups outside of the Kubernetes cluster — in a separate account, region, or at minimum a separate namespace. If the cluster is compromised or destroyed, your backups survive.

pgBackRest is the backup tool of choice for Patroni-based PostgreSQL on Kubernetes. It supports S3 natively, handles compression and encryption, and is integrated out of the box by the Percona Operator and Crunchy Data PGO. CloudNativePG fills the same role with its own tool, Barman Cloud.
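An S3 repository for pgBackRest is configured in a few lines (bucket, endpoint, and stanza names below are placeholders):

```ini
# /etc/pgbackrest/pgbackrest.conf — illustrative S3 repository sketch
[global]
repo1-type=s3
repo1-s3-bucket=pg-backups
repo1-s3-endpoint=s3.eu-central-1.amazonaws.com
repo1-s3-region=eu-central-1
repo1-path=/prod-cluster
repo1-retention-full=2          # keep two full backups
repo1-cipher-type=aes-256-cbc   # client-side encryption
process-max=4                   # parallel compression workers

[prod-cluster]
pg1-path=/var/lib/postgresql/data
```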

pgBackRest S3 Off-Cluster Versioning
CRITICAL: POINT_IN_TIME_RECOVERY

Point-in-Time Recovery (WAL Archiving)

A full base backup gives you a snapshot at a point in time. But real disasters require restoring to any arbitrary second — right before the bad DROP TABLE or data corruption event. This is point-in-time recovery (PITR), and PostgreSQL enables it natively via continuous WAL archiving.

PostgreSQL's WAL (Write-Ahead Log) records every database change sequentially. With pgBackRest configured to archive WAL segments continuously to S3, you can replay WAL on top of any base backup and stop at any moment — to the second. This is the most powerful disaster recovery primitive PostgreSQL offers.
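Wiring this up takes one PostgreSQL setting plus a restore invocation at recovery time (the stanza name and timestamp are placeholders):

```ini
# postgresql.conf — ship every completed WAL segment to the repository
archive_mode = on
archive_command = 'pgbackrest --stanza=prod-cluster archive-push %p'

# At recovery time, restore to just before the incident with:
#   pgbackrest --stanza=prod-cluster --type=time \
#     --target='2024-03-13 14:31:00+00' restore
```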

[Timeline: Backup and PITR with pgBackRest. Base backup Mon 00:00; WAL segments archived continuously to S3; accidental DROP TABLE Wed 14:32; restore target Wed 14:31, one minute before the incident; next base backup Thu 00:00.]
PITR WAL Archiving Continuous Granular Recovery
ADVICE: BACKUP_ARCHITECTURE

Backup Architecture Decisions

Recommended

Use pgBackRest with S3 in a separate cloud account or region. Enable WAL archiving for full PITR. Test restores regularly — an untested backup is not a backup.

Acceptable (Dev/Staging)

Host S3-compatible storage (MinIO, RustFS, Ceph) in the same Kubernetes cluster. Convenient but carries shared-fate risk — cluster loss means backup loss.

Not Recommended

Writing base backups to local PVCs. No durability guarantees, no versioning, no WAL archiving possible, and the same blast radius as the database itself.

6. PostgreSQL Operators Compared

The Kubernetes ecosystem offers multiple mature operators for PostgreSQL — each with different HA strategies, backup integrations, and licensing models. Here's how they compare.

CloudNativePG

EDB / CNCF
License Apache 2.0
HA Method Built-in (no Patroni required)

CNCF Sandbox project and the fastest-growing PostgreSQL operator. Implements HA natively using Kubernetes-native leader election — no external DCS (etcd/Consul) needed. Tight integration with Barman Cloud for backups to object storage.

CNCF Sandbox K8s-Native HA Barman Cloud
↗ GitHub

Zalando Postgres Operator

Zalando
License MIT
HA Method Patroni + etcd

Battle-tested in production at Zalando for years. Uses Patroni with etcd for leader election and automatic failover. Well-documented and widely deployed, with a large community. Spilo is the underlying PostgreSQL image.

MIT Patroni etcd
↗ GitHub

Percona Operator for PostgreSQL

Percona
License Apache 2.0
HA Method Patroni + pgBackRest

Fully open-source operator by Percona. Built on top of Patroni for HA and pgBackRest for backups with S3. Includes PgBouncer integration and Percona Monitoring and Management (PMM) support.

Apache 2.0 Patroni pgBackRest
↗ percona.com

StackGres

OnGres
License Apache 2.0 (enterprise add-ons)
HA Method Patroni + bundled PgBouncer

Full-stack PostgreSQL distribution bundling Patroni for HA, PgBouncer for connection pooling, and integrated backup tooling into one operator. Strong focus on extensibility, with a wide set of PostgreSQL extensions pre-packaged.

Bundled Stack PgBouncer Extensions
↗ stackgres.io

Crunchy Data PGO

CrunchyData
License Apache 2.0
HA Method Patroni + pgBackRest

Enterprise-grade open-source operator with a strong emphasis on security (TLS everywhere, Pod Security, NetworkPolicy). Built around Patroni and pgBackRest. Used in OpenShift environments by government and financial sector deployments. ⚠ No longer actively developed.

Apache 2.0 Patroni Security-first Inactive
↗ GitHub

KubeDB

AppsCode
License Proprietary (not open source)
HA Method Configurable

Part of a broader multi-database operator suite (MySQL, MongoDB, Redis, etc.) with a unified management plane. Enterprise features require a license. Convenient if you need to manage many database types with a single operator.

Multi-DB Enterprise
↗ kubedb.com

OpenEverest: The Unified Approach

OpenEverest is a CNCF Sandbox project that simplifies multi-database orchestration on Kubernetes. For PostgreSQL workloads, it currently uses Percona Operator for PostgreSQL as its engine — delivering Patroni-based high availability, pgBackRest PITR backups, and PgBouncer connection pooling through a single unified control plane.

OpenEverest is built on a modular operator architecture: the underlying engine is pluggable, and support for additional PostgreSQL operators — including CloudNativePG (CNPG) — is on the roadmap. The same modular approach applies across database technologies; MySQL and MongoDB are already supported, with more engines planned. This means you invest in one API and one operational model, regardless of which operator or database runs underneath.

CNCF Sandbox Open Source Modular Architecture Percona Operator Now CloudNativePG Planned Multi-Database

Run PostgreSQL on Kubernetes — The Right Way

Patroni HA, pgBackRest PITR, PgBouncer connection pooling — all managed through a single open-source control plane. No vendor lock-in. No surprises.

100% Open Source
3+ DB Engines
40–60% Cost Savings