Running PostgreSQL on Kubernetes
A Complete Architecture Guide
PostgreSQL is the world's most advanced open-source relational database. Running it on Kubernetes is no longer a question of if — it's a question of how.
●STATUS: PRODUCTION_READY
●AUDIENCE: DBA | SRE | PLATFORM_ENG
●DATABASE: PostgreSQL 14+
$ cat TABLE_OF_CONTENTS
PostgreSQL on Kubernetes: Architecture & Operations
Six pillars of running production PostgreSQL on Kubernetes — from Patroni HA to operator selection
PostgreSQL has been in continuous development since 1986 and is widely regarded as the most advanced open-source relational database. Running it on Kubernetes is no longer experimental — it is a production-proven approach used by organizations that demand portability, automation, and freedom from vendor lock-in.
●STATUS: PRODUCTION_PROVEN
1. Is PostgreSQL on Kubernetes Production-Ready?
Yes — emphatically. CloudNativePG, the leading PostgreSQL operator, is a CNCF Sandbox project with thousands of production deployments. Patroni, the battle-tested high-availability layer used by the Zalando and Percona operators, has been running PostgreSQL clusters in production since 2015 at companies including Zalando itself, GitLab, and Netflix.
Running PostgreSQL on Kubernetes delivers three core advantages that traditional VM-based deployments cannot match:
🔓
No Vendor Lock-In
Deploy on any cloud or on-prem infrastructure. The same PostgreSQL cluster runs on AWS, GCP, Azure, or bare metal — your data and configuration stay portable.
⚙️
Automated Operations
Operators handle failover, scaling, backups, and upgrades automatically. Patroni's DCS-based leader election makes failover deterministic and fast — typically under 30 seconds.
💰
Cost Efficiency
Consolidate workloads on shared clusters, right-size resources dynamically, and avoid paying the managed RDS/Cloud SQL premium — often saving 40–60% on TCO.
The debate is over. A vibrant Data on Kubernetes community, mature operators, and years of production evidence have proven that PostgreSQL on Kubernetes works at scale. The real question is which topology, storage strategy, and operator to choose.
●WHY_KUBERNETES
▸Run anywhere — any cloud, on-prem, edge
▸Automate Day-2 ops (backups, failover, scaling)
▸Declarative, version-controlled config
▸Consistent tooling across all databases
▸Save 40–60% vs managed cloud databases
▸CloudNativePG is a CNCF Sandbox project
$ kubectl describe topology pg-cluster
2. PostgreSQL on Kubernetes Architectures
A PostgreSQL deployment on Kubernetes has two critical layers: a connection pooling layer that manages the process-per-connection model, and a database layer with streaming replication and Patroni-driven high availability.
●LAYER: CONNECTION_POOLING
The Connection Pooling Layer (PgBouncer, pgpool-II)
Unlike MySQL, which uses a thread per connection, PostgreSQL spawns a new OS process for every client connection. At scale, hundreds or thousands of direct connections consume gigabytes of RAM and create scheduler pressure on the database server. Connection pooling is not optional in production — it is essential.
PgBouncer is the standard solution: a lightweight, single-binary connection pooler that supports session, transaction, and statement pooling modes. Transaction-mode pooling (the most common) allows hundreds of application connections to multiplex over a small pool of real server connections, dramatically reducing PostgreSQL's resource consumption.
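A minimal transaction-pooling setup can be sketched in a few lines of pgbouncer.ini; the database name, host, credentials file, and pool sizes below are illustrative, not prescriptive:

```ini
; pgbouncer.ini: minimal transaction-pooling sketch (values are illustrative)
[databases]
appdb = host=pg-primary port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000   ; client connections PgBouncer will accept
default_pool_size = 20   ; real server connections per database/user pair
```

With this configuration, up to 1000 application connections share just 20 PostgreSQL backends per database/user pair, which is the multiplexing effect described above.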
pgpool-II is a heavier alternative that additionally handles read/write splitting — routing SELECT queries to replicas and writes to the primary. It is more complex to configure but useful when read-scaling is needed without application-level changes.
PgBouncer: pool, lightweight
pgpool-II: pool + read routing
Pooling: required
Mode: transaction (typical)
●TOPOLOGY: ASYNC_STREAMING
Asynchronous Streaming Replication
The standard PostgreSQL replication model. The primary streams WAL (Write-Ahead Log) records to standby nodes in real time. Standbys apply WAL continuously, staying close behind the primary. On failure, Patroni promotes the most up-to-date standby automatically.
The trade-off is a small replication lag: if the primary fails before a standby catches up, a small number of recently committed transactions may be lost. Semi-synchronous behavior can be approximated with synchronous_commit = remote_write for a middle ground.
Logical replication, a secondary topology, replicates at the row level and is valuable for zero-downtime major version upgrades and selective table replication — not a primary HA mechanism.
For zero data loss, PostgreSQL supports fully synchronous replication via synchronous_standby_names. The primary waits for the configured standby(s) to acknowledge the WAL record before reporting the transaction as committed to the client.
This guarantees that at least one standby always has all committed data. The trade-off is higher write latency: every write must wait for a round-trip to a remote node. Patroni handles failover seamlessly in sync mode, promoting only standbys that are fully caught up.
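In configuration terms this is a pair of settings; the standby names and the ANY quorum syntax below are illustrative (the quorum form requires PostgreSQL 10+):

```sql
-- Require acknowledgement from any one of two named standbys before commit.
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (pg-1, pg-2)';
-- 'on' waits for full durability on the standby; 'remote_write' is the
-- lighter middle ground mentioned above.
ALTER SYSTEM SET synchronous_commit = 'on';
SELECT pg_reload_conf();
```

Note that Patroni manages synchronous_standby_names itself when its synchronous mode is enabled, so under an operator these values are typically set through the operator's configuration rather than directly.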
Zero Data Loss | synchronous_standby_names | Higher Write Latency
●DIAGRAM: PG_K8S_TOPOLOGY
PostgreSQL Cluster Topology on Kubernetes
●HA: PATRONI_NODE_AFFINITY
High Availability with Patroni
Patroni uses a distributed configuration store (DCS — etcd, Consul, or ZooKeeper) for leader election and failover coordination. When the primary goes down, Patroni's DCS-based consensus promotes the most advanced standby, typically in under 30 seconds.
CloudNativePG takes a different approach: it implements HA natively without Patroni, using Kubernetes-native leader election and a built-in failover controller. Both approaches produce the same result — automatic, deterministic promotion with zero manual intervention.
For a 3-node PostgreSQL cluster (1 primary + 2 standbys) with matching PgBouncer pods and a 3-node etcd ring, you need at least 3 Kubernetes nodes spread across 3 availability zones. Use topologySpreadConstraints to distribute database pods evenly and ensure the cluster — including the DCS quorum — survives a full zone failure.
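A pod template fragment for the zone spreading described above might look like this; the label value is illustrative:

```yaml
# Spread database pods evenly across availability zones so a full
# zone failure leaves a quorum of database (and DCS) members alive.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: postgres
```

DoNotSchedule makes the constraint hard: a pod that cannot be placed without violating the spread stays Pending rather than collapsing two replicas into one zone.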
Zone Redundancy | etcd Quorum | topologySpreadConstraints
$ kubectl get qos --selector=app=postgres
3. Compute & Storage Strategy
Getting resources right is the difference between a rock-solid database and one that gets killed by the Linux OOM killer at 3 AM. For PostgreSQL, storage choices also directly affect WAL throughput and recovery time.
●CRITICAL: GUARANTEED_QOS
Guaranteed QoS (Requests = Limits)
For stateless apps, setting low requests and high limits is standard for bin-packing. For databases, this is a recipe for disaster. If your memory limit is higher than the request, the Linux OOM killer will target the database pod first when the node comes under memory pressure.
The recommended approach for PostgreSQL on Kubernetes: set requests equal to limits for both CPU and RAM. This gives the pod the Guaranteed QoS class — the kubelet evicts Guaranteed pods last under node pressure, and the OOM killer is far less likely to target them.
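In the pod spec this is simply matching values; the sizes are illustrative:

```yaml
# Guaranteed QoS: requests and limits match exactly for every resource.
resources:
  requests:
    cpu: "4"
    memory: 16Gi
  limits:
    cpu: "4"
    memory: 16Gi
```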
PostgreSQL's shared_buffers parameter controls how much memory it uses for caching data pages. The standard recommendation is 25% of available RAM. In a Kubernetes Guaranteed QoS pod, set the container memory limit high enough to accommodate shared_buffers plus working memory, autovacuum workers, and connection overhead.
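The sizing arithmetic can be made concrete with a small sketch. The 25% shared_buffers rule and the roughly 10 MB per-connection overhead used below are common rules of thumb, not hard limits:

```python
# Back-of-envelope memory budget for a Guaranteed QoS PostgreSQL pod.
# Assumptions (rules of thumb, not hard limits):
#   - shared_buffers sized at 25% of the container memory limit
#   - ~10 MiB of backend overhead per connection
def size_memory(limit_gib: float, max_connections: int,
                conn_overhead_mib: float = 10.0) -> dict:
    shared_buffers_gib = limit_gib * 0.25
    conn_gib = max_connections * conn_overhead_mib / 1024
    # Remainder is left for work_mem, autovacuum workers, and the OS page cache.
    headroom_gib = limit_gib - shared_buffers_gib - conn_gib
    return {
        "shared_buffers_gib": round(shared_buffers_gib, 2),
        "connections_gib": round(conn_gib, 2),
        "headroom_gib": round(headroom_gib, 2),
    }

# A 16 GiB pod with 200 connections: 4 GiB shared_buffers,
# ~2 GiB for connections, ~10 GiB of working headroom.
print(size_memory(16, 200))
```

If the headroom comes out small or negative, either raise the container limit or lower max_connections (PgBouncer makes the latter painless).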
WAL writes are sequential by nature — they land on the WAL segment in order. This makes WAL write performance heavily dependent on write latency rather than IOPS. For high-write workloads, placing the WAL directory (pg_wal) on a separate volume with low latency — ideally local NVMe — yields significant throughput improvements.
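CloudNativePG, for example, exposes a dedicated WAL volume directly in its Cluster resource; the storage class names below are illustrative placeholders:

```yaml
# CloudNativePG Cluster with pg_wal on a separate low-latency volume.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-cluster
spec:
  instances: 3
  storage:
    storageClass: standard-rwo   # data files on network-attached storage
    size: 200Gi
  walStorage:
    storageClass: fast-nvme      # WAL on a low-latency class
    size: 50Gi
```

Patroni-based operators achieve the same separation by mounting a second volume at the pg_wal path; the principle is identical even if the spelling differs per operator.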
The standard approach is to decouple storage from compute using network-attached volumes like AWS EBS, GCP Persistent Disk, or Ceph. When a node fails, the pod is rescheduled on another node and the same PersistentVolume re-attaches with all data intact — no resynchronization required.
This makes pod migration simple and fast. The trade-off is I/O latency, which is higher than local disk because every read/write traverses the network. For most PostgreSQL workloads, provisioned IOPS network storage is the right default.
Easy Migration | AWS EBS | Ceph | GCP PD
●STORAGE: LOCAL_NVME
Local NVMe SSDs
For latency-sensitive workloads, local NVMe SSDs deliver the best raw performance — dramatically lower I/O latency and higher throughput than any network-attached option. This is particularly impactful for PostgreSQL WAL writes and random page reads.
The trade-off is durability: if the node fails, the local volume is lost. The pod is rescheduled on a different node and the standby must resynchronize the full dataset via streaming replication. Always ensure at least one standby uses network-attached storage and that pgBackRest backups are current.
NVMe SSD | WAL Performance | Cost Savings | Trade-offs
●BEST_PRACTICE
▸Always set requests = limits for databases
▸Use Guaranteed QoS class
▸Set shared_buffers to ~25% of RAM
▸Consider separate WAL volume for write-heavy loads
▸Network storage for easy recovery
▸NVMe for latency-critical workloads
●QOS_CLASSES
Guaranteed: requests == limits
Burstable: requests < limits
BestEffort: none set
$ kubectl get servicemonitor --selector=app=postgres
4. Monitoring PostgreSQL on Kubernetes
Kubernetes changes how monitoring works. Pods are ephemeral and IPs change — your monitoring stack must dynamically discover targets. Prometheus Operator with postgres_exporter is the production standard for PostgreSQL on Kubernetes.
●STACK: PROMETHEUS_GRAFANA
Prometheus & Grafana
The standard monitoring stack for PostgreSQL on Kubernetes is Prometheus Operator + postgres_exporter + Grafana. The postgres_exporter connects to the PostgreSQL instance and exposes hundreds of metrics from pg_stat_* views as Prometheus metrics — replication lag, autovacuum activity, connection counts, cache hit ratios, and more.
Prometheus Operator introduces ServiceMonitor and PodMonitor custom resources that tell Prometheus which pods to scrape. This is essential in Kubernetes where pod IPs are dynamic — you never hardcode scrape targets.
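A minimal ServiceMonitor for a postgres_exporter Service might look like this; the label and port names are illustrative and must match your exporter's Service:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: postgres
  labels:
    release: prometheus   # must match the Prometheus instance's selector
spec:
  selector:
    matchLabels:
      app: postgres       # matches the postgres_exporter Service labels
  endpoints:
    - port: metrics       # named port exposing exporter metrics
      interval: 30s
```

Prometheus then discovers every matching Service endpoint automatically, so pods can come and go without any scrape-config changes.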
Generic Kubernetes metrics (CPU, memory, network) tell you the pod is alive. PostgreSQL-specific metrics tell you whether the database is healthy. These are the five most critical to instrument first:
●
Replication Lag (bytes)
pg_replication_slots, pg_stat_replication — bytes behind primary. Alert if lag grows continuously.
●
Autovacuum Activity
pg_stat_user_tables — dead tuples accumulating without autovacuum can lead to table bloat and transaction ID wraparound. A silent killer.
●
Cache Hit Ratio
pg_statio_user_tables — heap_blks_hit / (heap_blks_hit + heap_blks_read). Should be above 99%. Below 95% signals insufficient shared_buffers.
●
Connection Count vs max_connections
pg_stat_activity — how many connections are active, idle, idle-in-transaction. Alert before hitting max_connections.
●
Lock Waits & Deadlocks
pg_locks, pg_stat_database — deadlocks and long lock waits indicate application-level contention that should be investigated immediately.
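The first three metrics above can be sampled directly with illustrative queries like these (targets and thresholds are the rules of thumb stated above):

```sql
-- Replication lag in bytes, per standby.
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
FROM pg_stat_replication;

-- Cache hit ratio across user tables (target > 0.99).
SELECT sum(heap_blks_hit)::float
       / NULLIF(sum(heap_blks_hit + heap_blks_read), 0) AS cache_hit_ratio
FROM pg_statio_user_tables;

-- Connection counts by state, to compare against max_connections.
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
SHOW max_connections;
```

postgres_exporter ships most of these out of the box; custom queries can be added for anything it does not already expose.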
Replication Lag | Autovacuum | Cache Hit Ratio | Lock Waits
●GUIDE: KUBERNETES_AWARE_MONITORING
Adapting Monitoring for Kubernetes-Native PostgreSQL
●Use ServiceMonitor / PodMonitor — never hardcode Prometheus scrape targets
●Monitor Patroni health endpoint (/health) separately from database metrics
●Alert on Patroni leader changes — unexpected promotions indicate instability
●Track PVC capacity — PostgreSQL WAL can fill a volume faster than expected under write load
●Monitor PgBouncer pool saturation — cl_waiting counter indicates client connections queued for a server connection
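PgBouncer pool saturation is visible through its admin console; the connection details below are illustrative:

```sql
-- Connect to PgBouncer's virtual admin database, e.g.:
--   psql -h pgbouncer -p 6432 -U admin pgbouncer
SHOW POOLS;   -- cl_active, cl_waiting, sv_active, sv_idle per database/user
-- Sustained cl_waiting > 0 means clients are queued for a server connection:
-- the pool is too small or queries are holding connections too long.
SHOW STATS;   -- request/transaction throughput and average query time
```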
5. Backups & Disaster Recovery
Backups are the ultimate safety net. In cloud-native environments, the standard approach is S3-compatible object storage combined with continuous WAL archiving — enabling point-in-time recovery to any second in your retention window.
●STORAGE: S3_COMPATIBLE
S3-Compatible Backup Storage
The cloud-native standard for backup storage is any S3-compatible endpoint — AWS S3, Google Cloud Storage, Azure Blob (via S3 API), or self-hosted alternatives like MinIO. This gives you durability, versioning, and lifecycle management out of the box.
The strongly recommended practice is to store backups outside of the Kubernetes cluster — in a separate account, region, or at minimum a separate namespace. If the cluster is compromised or destroyed, your backups survive.
pgBackRest is the backup tool of choice for PostgreSQL on Kubernetes. It supports S3 natively, handles compression and encryption, and is the native backup engine for the Percona Operator and Crunchy Data PGO. CloudNativePG takes a different route, shipping its own object-storage backup integration built on Barman Cloud.
pgBackRest | S3 | Off-Cluster | Versioning
●CRITICAL: POINT_IN_TIME_RECOVERY
Point-in-Time Recovery (WAL Archiving)
A full base backup gives you a snapshot at a point in time. But real disasters require restoring to any arbitrary second — right before the bad DROP TABLE or data corruption event. This is point-in-time recovery (PITR), and PostgreSQL enables it natively via continuous WAL archiving.
PostgreSQL's WAL (Write-Ahead Log) records every database change sequentially. With pgBackRest configured to archive WAL segments continuously to S3, you can replay WAL on top of any base backup and stop at any moment — to the second. This is the most powerful disaster recovery primitive PostgreSQL offers.
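A standalone pgBackRest PITR flow looks roughly like this; the stanza name and target timestamp are placeholders, and under an operator these steps are driven by custom resources rather than run by hand:

```shell
# postgresql.conf must enable continuous archiving, e.g.:
#   archive_mode = on
#   archive_command = 'pgbackrest --stanza=main archive-push %p'

# Take a full base backup to the configured S3 repository.
pgbackrest --stanza=main --type=full backup

# Restore to a point in time just before the incident, then promote.
pgbackrest --stanza=main --type=time \
  --target="2024-05-01 12:29:00+00" --target-action=promote restore
```

The restore replays archived WAL on top of the most recent base backup before the target and stops at the requested second, which is exactly the "right before the bad DROP TABLE" scenario described above.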
PITR | WAL Archiving | Continuous | Granular Recovery
●ADVICE: BACKUP_ARCHITECTURE
Backup Architecture Decisions
●Recommended
Use pgBackRest with S3 in a separate cloud account or region. Enable WAL archiving for full PITR. Test restores regularly — an untested backup is not a backup.
●Acceptable (Dev/Staging)
Host S3-compatible storage (MinIO, RustFS, Ceph) in the same Kubernetes cluster. Convenient but carries shared-fate risk — cluster loss means backup loss.
●Not Recommended
Writing base backups to local PVCs. No durability guarantees, no versioning, no WAL archiving possible, and the same blast radius as the database itself.
$ kubectl get operators --selector=database=postgres
6. PostgreSQL Operators Compared
The Kubernetes ecosystem offers multiple mature operators for PostgreSQL — each with different HA strategies, backup integrations, and licensing models. Here's how they compare.
●
CloudNativePG
EDB / CNCF
License: Apache 2.0
HA Method: Built-in (no Patroni required)
CNCF Sandbox project and the fastest-growing PostgreSQL operator. Implements HA natively using Kubernetes-native leader election; no external DCS (etcd/Consul) is needed. Backups go to object storage through its built-in Barman Cloud integration.
Zalando's Postgres Operator has been battle-tested in production at Zalando for years. It uses Patroni for leader election and automatic failover, is well-documented and widely deployed with a large community, and builds on Spilo as the underlying PostgreSQL image.
Fully open-source operator by Percona. Built on top of Patroni for HA and pgBackRest for backups with S3. Includes PgBouncer integration and Percona Monitoring and Management (PMM) support.
Full-stack PostgreSQL distribution bundling Patroni, PgBouncer, pgBackRest, and connection pooling into one operator. Strong focus on extensibility with a wide set of PostgreSQL extensions pre-packaged.
Enterprise-grade open-source operator with a strong emphasis on security (TLS everywhere, Pod Security, NetworkPolicy). Built around Patroni and pgBackRest. Used in OpenShift environments by government and financial sector deployments. ⚠ No longer actively developed.
Part of a broader multi-database operator suite (MySQL, MongoDB, Redis, etc.) with a unified management plane. Enterprise features require a license. Convenient if you need to manage many database types with a single operator.
OpenEverest is a CNCF Sandbox project that simplifies multi-database orchestration on Kubernetes. For PostgreSQL workloads, it currently uses Percona Operator for PostgreSQL as its engine — delivering Patroni-based high availability, pgBackRest PITR backups, and PgBouncer connection pooling through a single unified control plane.
OpenEverest is built on a modular operator architecture: the underlying engine is pluggable, and support for additional PostgreSQL operators — including CloudNativePG (CNPG) — is on the roadmap. The same modular approach applies across database technologies; MySQL and MongoDB are already supported, with more engines planned. This means you invest in one API and one operational model, regardless of which operator or database runs underneath.