Age Gap Is Not a Problem

MySQL × Cloud-Native

Sergey Pronin · Founder, Solanica Inc. · Maintainer, OpenEverest

The Age Gap

MySQL turned 19 when Kubernetes was born

18 years
1995 · MySQL 1.0 · 🎬 Toy Story premieres
2001 · 📖 Wikipedia launches
2007 · 📱 First iPhone announced
2013 · Docker
2014 · Kubernetes
2018 · Operator SDK
2019 · Percona Operator for MySQL

The Problem

Two problems. Different solutions.

01

Philosophical

MySQL was built for permanent, named servers. Kubernetes assumes everything is temporary. The mental models don't overlap.

02

Technical

Replication, storage, failover, and connection routing all behave differently inside a Kubernetes cluster. The sharp edges are real.

Philosophical

MySQL grew up as a pet.
Kubernetes raises cattle.

🐕 Pet
  • db-primary.prod.local
  • Alert → SSH → investigate
  • The disk is the server
🐄🐄🐄 Cattle
  • mysql-abc-7f9d2
  • Crash → reschedule
  • State must be explicit
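The two name styles above match real Kubernetes conventions. A Deployment's pods get a random suffix (like `mysql-abc-7f9d2`), while a StatefulSet gives each pod a stable ordinal name (`mysql-0`) and names each PersistentVolumeClaim from its volumeClaimTemplate as `<template>-<pod>`. Below is a minimal Python sketch of that naming. It assumes a hypothetical template called `data` and a simplified suffix alphabet, and it only illustrates why a rescheduled StatefulSet pod finds its old disk again. It is not the controllers' actual code.

```python
import random
import string

# Hypothetical helpers illustrating Kubernetes pod/PVC naming conventions.
SUFFIX_CHARS = string.ascii_lowercase + string.digits  # simplified alphabet (assumption)


def cattle_pod_name(replicaset: str, rng: random.Random) -> str:
    """Deployment-style: a fresh random suffix every time the pod is recreated."""
    return f"{replicaset}-{''.join(rng.choice(SUFFIX_CHARS) for _ in range(5))}"


def pet_pod_name(statefulset: str, ordinal: int) -> str:
    """StatefulSet-style: the name depends only on the ordinal, so it survives rescheduling."""
    return f"{statefulset}-{ordinal}"


def pvc_name(template: str, pod: str) -> str:
    """PVCs created from a volumeClaimTemplate are named <template>-<pod>."""
    return f"{template}-{pod}"


rng = random.Random(42)
before = cattle_pod_name("mysql-abc", rng)  # pod before a crash
after = cattle_pod_name("mysql-abc", rng)   # replacement pod after rescheduling
print(before, after)  # two unrelated names: nothing to hang state on

pet = pet_pod_name("mysql", 0)
print(pet, pvc_name("data", pet))  # same pod name and same claim, before and after a crash
```

Because the claim name is derived from the pod name, state stays explicit: the identity lives in the name, not in whichever machine the pod happens to land on.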

Part 2

Technical

Replication  ·  Storage  ·  Failover  ·  Networking

Problem 01 of many

High Availability

When everything is allowed to fail — what happens to your database?

Uptime, then and now

Then

423 days

Maintenance windows. Graceful failover. Expected behavior.

Now

14:02:11 · evict · pod terminated
14:02:34 · drain · node cordoned
14:02:48 · oom · container killed
14:03:07 · pvc · volume remount
14:03:22 · net · partition detected
14:03:55 · node · kernel reboot
14:04:18 · probe · liveness failed
14:04:46 · scale · HPA scaled down

Anything. Anytime. By design.
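Not every line in that log is the same kind of event. A PodDisruptionBudget can only hold back disruptions that go through the Eviction API, such as evictions and `kubectl drain`. OOM kills, kernel reboots, partitions, failed probes, and controller scale-downs (which delete pods directly) bypass it. The Python sketch below sorts the slide's log that way and reproduces the budget arithmetic: evictions allowed = healthy pods minus `minAvailable`. The event tuples are copied from the slide. The `PDB_GATED` set is a simplifying assumption, and the helper names are hypothetical.

```python
# Events copied from the slide: (time, tag, description).
LOG = [
    ("14:02:11", "evict", "pod terminated"),
    ("14:02:34", "drain", "node cordoned"),
    ("14:02:48", "oom", "container killed"),
    ("14:03:07", "pvc", "volume remount"),
    ("14:03:22", "net", "partition detected"),
    ("14:03:55", "node", "kernel reboot"),
    ("14:04:18", "probe", "liveness failed"),
    ("14:04:46", "scale", "HPA scaled down"),
]

# Assumption: only these tags go through the Eviction API,
# the only path a PodDisruptionBudget can block.
PDB_GATED = {"evict", "drain"}


def split_by_pdb(log):
    """Separate events a budget could delay from events it cannot."""
    gated = [e for e in log if e[1] in PDB_GATED]
    ungated = [e for e in log if e[1] not in PDB_GATED]
    return gated, ungated


def disruptions_allowed(current_healthy: int, min_available: int) -> int:
    """Budget arithmetic: how many more pods may be evicted right now."""
    return max(0, current_healthy - min_available)


gated, ungated = split_by_pdb(LOG)
print(len(gated), "events a PDB can delay;", len(ungated), "it cannot")
# A 3-pod cluster with minAvailable=2 tolerates one eviction at a time.
print(disruptions_allowed(3, 2), disruptions_allowed(2, 2))
```

A budget only tames the polite part of the log, so the database layer still has to survive everything else.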

Classic primary → replica

Async replication.
Failover can mean data loss.

Primary · mysql-0 · accepts writes
T₁ T₂ T₃ T₄ T₅

binlog · async, with lag

Replica · mysql-1 · lags behind

If the primary dies, transactions still in flight never arrive.
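The async failure mode fits in a few lines. This Python sketch is a toy model, not MySQL's replication code. The primary acknowledges each commit as soon as the transaction is in its own log. The replica copies events later, with a lag set by a hypothetical parameter. Promoting the replica after a crash loses whatever it had not copied yet.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    log: list = field(default_factory=list)  # committed transactions, in order


def async_commit(primary: Node, txn: str) -> None:
    # The client gets "OK" as soon as the primary has the transaction.
    primary.log.append(txn)


def replicate(primary: Node, replica: Node, max_events: int) -> None:
    # The replica catches up by at most max_events: this is the lag.
    start = len(replica.log)
    replica.log.extend(primary.log[start:start + max_events])


def lost_on_failover(primary: Node, replica: Node) -> list:
    # Everything the primary acknowledged that the replica never received.
    return primary.log[len(replica.log):]


primary, replica = Node("mysql-0"), Node("mysql-1")
for txn in ["T1", "T2", "T3", "T4", "T5"]:
    async_commit(primary, txn)
replicate(primary, replica, max_events=3)  # hypothetical lag of two transactions
print(lost_on_failover(primary, replica))  # ['T4', 'T5']
```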

Percona XtraDB Cluster · Galera

Every commit, on every node.

INSERT …

PXC mysql-0 · PXC mysql-1 · PXC mysql-2

Lose any node. Zero data loss.
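The synchronous version changes one thing: the client is acknowledged only after the write reaches every node. The sketch below is a deliberately simplified Python model of that guarantee. Real Galera replicates write-sets in a total order and certifies them, rather than copying rows one at a time. The model also checks quorum. Galera keeps accepting writes only in a component holding more than half of the cluster (equal node weights assumed here), which is why three nodes can lose any one of them.

```python
def sync_commit(nodes: list, txn: str) -> bool:
    # Toy model: every node stores the transaction before the client is acknowledged.
    for log in nodes:
        log.append(txn)
    return True


def has_quorum(alive: int, cluster_size: int) -> bool:
    # Strict majority, assuming every node has the same weight.
    return 2 * alive > cluster_size


nodes = [[], [], []]  # logs of mysql-0, mysql-1, mysql-2
committed = []
for txn in ["T1", "T2", "T3"]:
    if sync_commit(nodes, txn):
        committed.append(txn)

for lost in range(len(nodes)):
    survivors = [log for i, log in enumerate(nodes) if i != lost]
    # Nothing that was acknowledged is missing from the surviving nodes.
    assert all(log == committed for log in survivors)
    print(f"lose mysql-{lost}: quorum={has_quorum(len(survivors), len(nodes))}")
```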

The catch

Writes wait for everyone.

Async ~1 ms · Sync ~5 ms

A few extra milliseconds per commit for zero data loss.
For most workloads, worth it.
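Here is why "writes wait for everyone" costs milliseconds. An async commit returns after local work, while a synchronous commit also waits for the slowest peer. The back-of-the-envelope Python model below uses hypothetical inputs, not measurements, chosen to land near the slide's ~1 ms and ~5 ms figures.

```python
def async_commit_ms(local_ms: float) -> float:
    # Client is acknowledged after the local commit only.
    return local_ms


def sync_commit_ms(local_ms: float, peer_round_trips_ms: list) -> float:
    # Client waits for every peer, so the slowest round trip dominates.
    return local_ms + max(peer_round_trips_ms)


local = 1.0         # hypothetical local commit time, ms
peers = [2.5, 4.0]  # hypothetical round trips to the other two nodes, ms
print(async_commit_ms(local), sync_commit_ms(local, peers))  # 1.0 5.0
```

The `max` is the design point. A synchronous cluster commits only as fast as its slowest node, so spreading nodes across distant zones raises the latency of every write.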