
Running MySQL on Kubernetes

A Complete Architecture Guide

MySQL is one of the most mature relational databases in the world. Running it on Kubernetes is no longer a question of if — it's a question of how.

1. Introduction: MySQL Meets Kubernetes

MySQL was born in 1995 and has matured into one of the most widely deployed relational databases on the planet. Running it on Kubernetes is no longer experimental — it is a production-proven approach used by organizations that want portability, automation, and cost efficiency.

STATUS: PRODUCTION_PROVEN

Is MySQL on Kubernetes Production-Ready?

The short answer: yes. The longer answer: multiple battle-tested Kubernetes operators now exist specifically for MySQL. Organizations like GitHub, Shopify, and Slack have demonstrated that stateful workloads — including relational databases — run reliably on Kubernetes when managed with the right tooling.

Running MySQL on Kubernetes delivers three core advantages that traditional VM-based deployments cannot match:

🔓

No Vendor Lock-In

Deploy on any cloud or on-prem infrastructure. The same MySQL deployment runs on AWS, GCP, Azure, or bare metal — your data stays portable.

⚙️

Automated Operations

Kubernetes operators handle failover, scaling, backups, and upgrades automatically. What used to be runbooks and pager alerts becomes declarative YAML.

💰

Cost Efficiency

Consolidate workloads on shared clusters, right-size resources dynamically, and avoid paying the cloud database premium — often saving 40–60% on TCO.

The debate is over. Dozens of mature operators, a vibrant Data on Kubernetes community, and years of production usage across thousands of clusters have proven that MySQL on Kubernetes works. The real question is which topology, storage strategy, and operator to choose.

WHY_KUBERNETES
Run anywhere — any cloud, on-prem, edge
Automate Day-2 operations (backups, failover, scaling)
Declarative, version-controlled config
Consistent tooling across all databases
Save 40–60% vs managed cloud databases

2. MySQL on Kubernetes Architectures

A MySQL deployment on Kubernetes is best understood as two layers working together: a proxy layer that routes traffic and a database layer that handles replication and storage. Getting the topology right is the foundation of everything else.

LAYER: PROXY

The Proxy Layer (HAProxy, ProxySQL)

Every production MySQL deployment on Kubernetes needs a proxy in front of the database nodes. The proxy handles connection routing, read/write splitting, and transparent failover so that application code never needs to know which pod is the current primary.

The two most common choices are HAProxy and ProxySQL. HAProxy is a fast, lightweight TCP/HTTP load balancer — simple to configure and excellent for basic read/write splitting. ProxySQL is MySQL-aware, offering query-level routing, connection multiplexing, and query caching. Both integrate well with Kubernetes operators.

The proxy pods must be deployed separately from the MySQL pods, ideally on their own nodes, to prevent a single node failure from taking down both the proxy and a database instance simultaneously.
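To illustrate the routing model, read/write splitting with HAProxy can be expressed as plain TCP configuration shipped in a ConfigMap. This is a minimal sketch — the service names (`mysql-0.mysql`, etc.), ports, and the health-check user are assumptions, not operator defaults:

```yaml
# Hypothetical ConfigMap: HAProxy sends writes (port 3306) to the current
# primary and balances reads (port 3307) round-robin across replicas.
apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy-mysql
data:
  haproxy.cfg: |
    defaults
        mode tcp
        timeout connect 5s
        timeout client  30m
        timeout server  30m

    listen mysql-primary
        bind *:3306
        option mysql-check user haproxy_check
        server mysql-0 mysql-0.mysql:3306 check

    listen mysql-replicas
        bind *:3307
        balance roundrobin
        option mysql-check user haproxy_check
        server mysql-1 mysql-1.mysql:3306 check
        server mysql-2 mysql-2.mysql:3306 check
```

In practice the operator manages this file and rewrites the backend list on failover; a static configuration like this only illustrates how the two traffic classes are separated.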

HAProxy TCP LB, simple
ProxySQL SQL-aware routing
Isolation Required
TOPOLOGY: ASYNC_REPLICATION

Asynchronous Replication

The standard MySQL replication model. The primary node writes to its binary log and replicas pull changes asynchronously. This provides good performance and is the most widely used topology. The trade-off is a small replication lag — if the primary fails before a replica catches up, some transactions may be lost.

Semi-synchronous replication improves on this by requiring at least one replica to acknowledge the transaction before the primary commits, reducing the data loss window significantly.
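To make the semi-sync trade-off concrete, here is a sketch of the relevant my.cnf settings delivered as a ConfigMap. The variable and plugin names below follow MySQL 8.0.26+ (`semisync_source`); older versions use the `semisync_master`/`semisync_slave` names, and operators normally manage these flags for you — treat this as illustration only:

```yaml
# Hypothetical ConfigMap name; an operator would usually generate this.
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-semisync-config
data:
  my.cnf: |
    [mysqld]
    # Primary side: wait for at least one replica ACK before commit returns
    plugin-load-add = semisync_source.so
    rpl_semi_sync_source_enabled = ON
    rpl_semi_sync_source_timeout = 1000   # ms; fall back to async after this
    # Replica side (set on replica pods instead):
    # plugin-load-add = semisync_replica.so
    # rpl_semi_sync_replica_enabled = ON
```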

Primary-Replica Low Latency Semi-Sync Option
TOPOLOGY: SYNC_REPLICATION

Synchronous Replication (Galera, Group Replication)

For workloads that require zero data loss, MySQL offers two synchronous replication options: Percona XtraDB Cluster (PXC) built on Galera, and MySQL Group Replication (InnoDB Cluster). Both ensure every committed transaction is written to all nodes before acknowledgment.

Galera uses certification-based replication and supports true multi-primary writes. Group Replication offers a similar model with tighter integration into the MySQL ecosystem. The trade-off is higher write latency due to cross-node coordination.

Zero Data Loss Galera / PXC Group Replication
DIAGRAM: MYSQL_K8S_TOPOLOGY

MySQL Cluster Topology on Kubernetes

[Diagram: a Kubernetes cluster of 3 nodes spread across 3 availability zones (AZ-a/b/c). Each node hosts one HAProxy pod (proxy-0…2) and one MySQL pod: mysql-0 on node-1 is the primary (read/write); mysql-1 and mysql-2 are read-only replicas receiving replication from the primary. Each MySQL pod binds its own PVC (mysql-data-0…2, StorageClass: fast-ssd) and is scheduled with strict podAntiAffinity.]
HA: NODE_AFFINITY

High Availability with Node Affinity

For a 3-node MySQL cluster (1 primary + 2 replicas) with 3 proxy pods, you need at least 3 Kubernetes nodes. Each MySQL pod and each proxy pod must sit on a different physical node, enforced via podAntiAffinity. This ensures no single node failure takes down more than one database instance and one proxy simultaneously.

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: mysql
      topologyKey: kubernetes.io/hostname
Node Isolation Required Policy
RESILIENCE: MULTI_AZ

Multi-AZ Deployments

Cloud providers enable scheduling pods across different availability zones (AZs). This is critical — if an entire AZ goes down, your cluster maintains quorum in the remaining zones. Use topologySpreadConstraints to distribute MySQL pods evenly across AZs and ensure the cluster survives zone-level failures.

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: mysql
Zone Redundancy Quorum Protection topologySpreadConstraints

3. Compute & Storage Strategy

Getting resources right is the difference between a rock-solid database and one that gets killed by the Linux OOM killer at 3 AM. Storage choices directly impact your recovery time and cost.

CRITICAL: GUARANTEED_QOS

Guaranteed QoS (Requests = Limits)

For stateless apps, setting low requests and high limits is standard for bin-packing. For databases, this is a recipe for disaster. If your memory limit is higher than the request, the Linux OOM killer will target the database pod first when the node comes under pressure.

The recommended approach for MySQL on Kubernetes: set requests equal to limits for both CPU and RAM. This gives the pod Guaranteed QoS class — the highest priority in the Kubernetes scheduler, immune to eviction during node pressure.

BURSTABLE_QOS
resources:
  requests:
    memory: 2Gi
    cpu: 500m
  limits:
    memory: 8Gi
    cpu: 4000m

⚠️ OOM Kill Target Under Pressure

GUARANTEED_QOS
resources:
  requests:
    memory: 8Gi
    cpu: 4000m
  limits:
    memory: 8Gi
    cpu: 4000m

✓ Protected by Kernel Priority

STORAGE: NETWORK_ATTACHED

Network-Attached Storage (EBS, Ceph)

The standard and most common approach is to decouple storage from compute using network-attached volumes like AWS EBS, GCP Persistent Disk, or Ceph. This provides a major operational advantage: when a node fails, the pod is rescheduled on another node and the same PersistentVolume is re-attached with all data intact.

This makes pod migration seamless and fast — there is no data resynchronization needed. The trade-off is I/O latency, which is higher than local disk because every read/write traverses the network.
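As a concrete example, a StorageClass for AWS EBS gp3 volumes via the EBS CSI driver might look like this — the name `fast-ssd` and the IOPS/throughput figures are illustrative, not recommendations:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "6000"        # provisioned IOPS above the gp3 baseline
  throughput: "250"   # MiB/s
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer   # bind the volume in the pod's AZ
```

`WaitForFirstConsumer` matters in multi-AZ clusters: EBS volumes are zonal, so binding must be deferred until the scheduler has picked a node.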

Easy Migration AWS EBS Ceph GCP PD
STORAGE: LOCAL_NVME

Local NVMe SSDs

For latency-sensitive workloads, local NVMe SSDs offer the best raw performance — dramatically lower I/O latency and higher throughput than any network-attached option. This can also save significant cost since local storage is typically cheaper than provisioned IOPS network volumes.

The trade-off is durability: if the node fails, the local volume is lost. The database pod is rescheduled on a different node and must resynchronize the entire dataset from a replica. This takes time and may briefly impact performance on the source replica. Choose local NVMe when the performance gain outweighs the recovery cost, and always ensure you have at least one replica on network-attached storage or recent backups.
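Local disks are typically exposed through the `local` volume type with a no-provisioner StorageClass (or automated by a tool such as the local static provisioner or OpenEBS). A minimal hand-written sketch, with an assumed mount path and node name:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-nvme
provisioner: kubernetes.io/no-provisioner   # PVs are pre-created, not dynamic
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-nvme-node-1
spec:
  capacity:
    storage: 500Gi
  accessModes: [ReadWriteOnce]
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-nvme
  local:
    path: /mnt/nvme0   # assumed mount point of the NVMe device
  nodeAffinity:        # pins the volume (and thus the pod) to its node
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values: [node-1]
```

The `nodeAffinity` block is what encodes the durability trade-off: the volume exists only on node-1, so losing that node means resynchronizing from a replica.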

STORAGE COMPARISON: NETWORK vs LOCAL NVMe

Network-Attached (EBS)
Latency: ~1 ms | Throughput: medium
✓ Pod migration: instant reattach
✓ Data survives node failure
⚠ Higher I/O latency
⚠ IOPS provisioning cost
Best for: general workloads, simplicity

Local NVMe SSD
Latency: ~0.1 ms | Throughput: high
✓ Lowest latency possible
✓ Better cost per IOPS
✕ Data lost on node failure
✕ Full resync on recovery
Best for: low-latency, cost-sensitive workloads
NVMe SSD Performance Cost Savings Trade-offs
BEST_PRACTICE
Always set requests = limits for databases
Use Guaranteed QoS class
Network storage for easy recovery
NVMe for performance-critical workloads
Always maintain replicas for NVMe setups
QOS_CLASSES
Guaranteed req == limit
Burstable req < limit
BestEffort none set

4. Monitoring MySQL on Kubernetes

Monitoring in Kubernetes is largely a solved problem at the infrastructure level — Prometheus Operator, Grafana dashboards, and alerting pipelines are mature and battle-tested. But database monitoring is more than CPU and memory graphs. You need to see inside the engine.

INFRA: PROMETHEUS_STACK

Prometheus and Grafana

The standard Kubernetes monitoring stack — Prometheus Operator for metric collection and Grafana for visualization — integrates well with MySQL operators. Most operators deploy a mysqld_exporter sidecar alongside each MySQL pod that exposes internal engine metrics via a /metrics endpoint.

ServiceMonitor resources enable Prometheus to automatically discover and scrape new MySQL pods as they scale up or get rescheduled — adapting to Kubernetes' dynamic nature without manual configuration changes.
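A typical ServiceMonitor for this setup, assuming the operator labels its pods `app: mysql` and the exporter is exposed on a named `metrics` port (commonly 9104) — adjust the labels to match your actual deployment:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mysql
  labels:
    release: prometheus   # must match your Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: mysql          # selects the MySQL Services, however pods churn
  endpoints:
  - port: metrics
    interval: 15s
```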

Prometheus Grafana ServiceMonitor Auto-Discovery
CRITICAL: DATABASE_METRICS

Database Metrics (Not Just Infrastructure)

Watching node CPU and pod memory is necessary but insufficient. You need to monitor MySQL-specific metrics that reveal the real health of your database:

Replication Lag — Seconds_Behind_Master / group replication lag. Non-zero means your replicas are falling behind.
InnoDB Buffer Pool Hit Rate — If below 99%, your working set doesn't fit in memory. You're hitting disk.
Slow Queries — Count and rate of queries exceeding long_query_time. Trend matters more than absolute number.
Connection Count — Active connections vs max_connections. Running out causes hard application failures.
Threads Running — Concurrent query execution. Spikes indicate lock contention or resource bottlenecks.
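These metrics only matter if they page someone. A sketch of alerting on two of them as a PrometheusRule, using mysqld_exporter metric names — the thresholds are illustrative and workload-dependent:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mysql-alerts
spec:
  groups:
  - name: mysql
    rules:
    - alert: MySQLReplicationLagHigh
      expr: mysql_slave_status_seconds_behind_master > 30
      for: 5m
      labels:
        severity: warning
    - alert: MySQLConnectionsNearMax
      # fire when >80% of max_connections is in use
      expr: >
        mysql_global_status_threads_connected
        / mysql_global_variables_max_connections > 0.8
      for: 10m
      labels:
        severity: warning
```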
mysqld_exporter Replication Lag InnoDB Slow Queries
WARNING: DYNAMIC_TARGETS

Adapting Monitoring for Kubernetes

Traditional monitoring assumes static hosts with fixed IPs. Kubernetes breaks that assumption — pods get rescheduled, IPs change, replicas scale up and down. Make sure your monitoring setup:

Uses label-based service discovery (not static IPs)
Scrapes via ServiceMonitor or PodMonitor CRDs
Alerts on replication lag, not just pod restarts
Handles pod churn without losing metric continuity
Includes both infrastructure and database-level dashboards

5. Backup Strategies

Backups are the ultimate safety net. In cloud-native environments, the standard approach is S3-compatible object storage — but where you store backups and whether you support point-in-time recovery matters enormously.

STORAGE: S3_COMPATIBLE

S3-Compatible Backup Storage

The cloud-native standard for backup storage is any S3-compatible endpoint — AWS S3, Google Cloud Storage, Azure Blob (via S3 API), or self-hosted alternatives like MinIO. This gives you durability, versioning, and lifecycle management out of the box.

The recommended practice is to store backups outside of the Kubernetes cluster — in a separate account, region, or at minimum a separate namespace. If the cluster is compromised or destroyed, your backups survive.

That said, for development or staging environments it is sometimes acceptable to host backup storage within the same cluster using S3-compatible tools such as MinIO, Ceph, or RustFS.

S3 Off-Cluster Durability Versioning
CRITICAL: POINT_IN_TIME_RECOVERY

Point-in-Time Recovery (PITR)

A full backup gives you a snapshot at a point in time. But real disasters require restoring to any arbitrary second — right before the bad DELETE statement or data corruption event. This is point-in-time recovery (PITR), and it requires continuous binary log streaming to your backup storage.

PITR on Kubernetes with S3 is not trivial. Binary logs must be uploaded continuously, and the restore process must be able to replay them on top of the last full backup. Not all MySQL operators support PITR well — evaluate this capability carefully when choosing your solution.
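The exact resource shape depends on the operator. As one example, the Percona XtraDB Cluster operator expresses scheduled backups and PITR roughly like this inside its cluster custom resource — bucket and secret names below are placeholders:

```yaml
backup:
  storages:
    s3-backups:
      type: s3
      s3:
        bucket: my-mysql-backups            # placeholder
        region: us-east-1
        credentialsSecret: s3-backup-credentials
  schedule:
  - name: nightly-full
    schedule: "0 0 * * *"   # cron: full backup at midnight
    keep: 7
    storageName: s3-backups
  pitr:
    enabled: true
    storageName: s3-backups
    timeBetweenUploads: 60   # seconds between binlog uploads to S3
```

The `timeBetweenUploads` setting is the PITR window's granularity knob: the interval between binlog uploads bounds how much recent data a worst-case restore can lose.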

[Diagram: backup & PITR timeline — full backup Mon 00:00 → continuous binlog stream to S3 → bad DELETE at Wed 14:32 → restore target Wed 14:31 → next full backup Thu 00:00.]
PITR Binary Logs Continuous Granular Recovery
ADVICE: BACKUP_ARCHITECTURE

Backup Architecture Decisions

Recommended

Store backups in a separate cloud account or region. Use S3 lifecycle rules for retention. Test restores regularly — an untested backup is not a backup.

Acceptable (Dev/Staging)

Host S3-compatible storage (MinIO, RustFS, Ceph) in the same Kubernetes cluster. Convenient but carries shared-fate risk — cluster loss means backup loss.

Not Recommended

Writing backups to local PVCs or shared filesystem mounts. No durability guarantees, no versioning, and the same blast radius as the database itself.

6. MySQL Operators Compared

The Kubernetes ecosystem offers multiple operators for MySQL — each with different HA strategies, licensing models, and trade-offs. Here's how they compare.

MySQL Operator

Oracle
License Community (limited) / Enterprise
HA Method InnoDB Cluster (Group Replication)

Official Oracle operator. Full features like advanced backup and data-at-rest encryption require an Enterprise license.

Group Replication Enterprise
↗ GitHub

Percona Operators

Percona
License Fully Open Source (Apache 2.0)
HA Method PXC (Galera sync) & PS (Async + Orchestrator)

Two fully open-source operators for different consistency needs. PXC provides synchronous Galera-based replication; Percona Server operator covers async/semi-sync with Orchestrator for failover.

Apache 2.0 Galera Async
↗ percona.com

Moco

Cybozu
License Open Source
HA Method GTID-based Semi-Sync Replication

Designed for high compatibility with standard MySQL 8, avoiding Group Replication's limitations with large transactions. Great for vanilla MySQL use cases.

Semi-Sync MySQL 8
↗ GitHub

Bitpoke MySQL Operator

Bitpoke
License Open Source (commercial support)
HA Method Orchestrator + ProxySQL

Built for WordPress at scale, then open-sourced. Focuses on robust backups, PITR, and operational flexibility. ⚠ No longer actively developed.

PITR ProxySQL Inactive
↗ GitHub

Vitess Operator

PlanetScale
License Open Source
HA Method Vitess Clustering (Sharding)

For horizontal scaling and sharding of MySQL at massive scale. Originally developed at YouTube. Adds complexity but solves the "MySQL doesn't shard" problem.

Sharding Horizontal Scale
↗ GitHub

KubeDB

AppsCode
License Proprietary (not open source)
HA Method Varies by configuration

Part of a broader multi-database operator suite (PostgreSQL, MongoDB, etc.) with a unified management plane. Enterprise features behind a license.

Multi-DB Enterprise
↗ kubedb.com

Tungsten Operator

Continuent
License Commercial Product
HA Method Tungsten Cluster (Advanced Replication)

Commercial enterprise solution for advanced cross-datacenter and cross-cloud replication scenarios. Proprietary.

Commercial Multi-DC
↗ Documentation

OpenEverest: The Unified Approach

OpenEverest is a CNCF Sandbox project that simplifies multi-database orchestration on Kubernetes. Under the hood, it uses Percona Operator for MySQL based on PXC for MySQL workloads — giving you Galera-based synchronous replication with simplified management.

OpenEverest provides a single control plane for deploying and managing MySQL, PostgreSQL, and MongoDB clusters with consistent APIs, automated backups, monitoring integration, and zero vendor lock-in. Instead of managing each operator independently, OpenEverest orchestrates them through one unified interface.

CNCF Sandbox Open Source Multi-Database Percona PXC Unified API
$ kubectl apply -f mysql-cluster.yaml

Run MySQL on Kubernetes — The Right Way

Deploy production-grade MySQL on any Kubernetes cluster with automated failover, backups, and monitoring. Solanica Platform powered by OpenEverest gives you the tooling without the lock-in.

100% Open Source
3+ DB Engines
40-60% Cost Savings