It was the beginning of June 2025. I switched from IT administration to IT development with a clear task: solve an existing problem. I never saw coming just how much of a struggle it would turn into...
The Project begins
The starting point was simple: individual Docker hosts. The problem with that setup is obvious—if a host goes down, every container on it goes with it. As a result, these VMs tend to never get their OS upgraded. Nobody wants to touch what's running.
That pain point kicked off the search for something better. The answer seemed obvious: Kubernetes. If you look at LinkedIn and the usual tech circles, everyone's saying the same thing: Kubernetes solves all of these problems. But does it really? Is everything they're telling you actually true? Is it really just "up and running in a couple of hours"?
So I gave it a shot. The first question: which Kubernetes? Vanilla, K3s, RKE2? After a brief analysis, I went with vanilla Kubernetes. The reasoning was straightforward—it's plain Kubernetes. Unlike something like K3s, you're not dependent on the community's decisions about which components ship by default. You can swap out any piece you want. Sure, you can disable components in K3s too—but is that really the right approach? In my view, you should pick the distribution that ships with the feature set closest to what you actually want to use.
Our test environment ran on an existing virtualization platform. Since we didn't yet know how large the cluster would get, we had enough spare resources to spin up a handful of VMs. The first question: how many master and worker nodes do we actually need? After some research, the answer was clear—on a production cluster, you shouldn't run workloads on master nodes. So that means separate nodes. If you want the control plane to be highly available, you need at least two. And for leader election to work smoothly, it should be an odd number—so three. For worker nodes, I started with three. (Fun fact: by the end, it was five.) Then the next question followed naturally: how do I make the worker nodes highly available? Right—a load balancer. But the load balancer itself also needs to be HA, which means at least two. So the total came out to 8, eventually 10 servers. Not a problem in itself, but definitely more than you'd expect going in.
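For the three-master setup, the piece that ties it together is a stable API endpoint in front of the control plane. A minimal sketch of what that looks like with kubeadm, assuming a DNS name or VIP for the API server (the address, port, and version below are placeholders, not the actual environment):

```yaml
# Hypothetical kubeadm ClusterConfiguration for an HA control plane.
# controlPlaneEndpoint must point at something that survives a single
# master failing, e.g. a VIP or DNS name in front of all three masters.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.30.0            # placeholder version
controlPlaneEndpoint: "k8s-api.example.internal:6443"  # placeholder VIP/DNS
networking:
  podSubnet: "10.244.0.0/16"          # placeholder pod CIDR
```

The first master is initialized with `kubeadm init --config <file> --upload-certs`; the other two join with `kubeadm join ... --control-plane`, which is what makes the odd-numbered etcd quorum work.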
Setting up a Cluster
From the start, I wanted fully automated provisioning. Adding nodes, updating nodes, scaling out—all of it should happen without manual work. For rolling out VMs on the hypervisor, I used Terraform with Sysprep to handle the bare minimum, so that Ansible could connect in the next step and do the actual configuration.
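To make the handoff concrete: once Terraform and Sysprep have produced a reachable VM, a play along these lines takes over. This is a hedged sketch, not the actual playbook; the host group and package list are illustrative (Debian/Ubuntu-style packaging assumed):

```yaml
# Illustrative Ansible play: the configuration step after Terraform/Sysprep.
- hosts: k8s_nodes          # placeholder inventory group
  become: true
  tasks:
    - name: Disable swap (kubelet refuses to run with swap enabled)
      ansible.builtin.command: swapoff -a

    - name: Install container runtime and Kubernetes packages
      ansible.builtin.apt:
        name:
          - containerd
          - kubelet
          - kubeadm
          - kubectl
        state: present
        update_cache: true
```

Keeping node configuration in Ansible rather than baked into the VM template is what makes "adding nodes, updating nodes, scaling out" repeatable instead of manual.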
Once the cluster was up and running, the real journey began: discovering components and understanding how they fit together. The feature set of Kubernetes is essentially limitless. To name a few things I explored: host HA, auto-scaling, taints and tolerations, multi-cluster architecture, load balancers, deployments, replica sets, RBAC, cluster roles, secrets. After about two months of reading Kubernetes docs and watching tutorials, something became very clear: you can lose yourself in the depth incredibly fast.
That raised the real question: how deep do I actually need to go, and which components actually matter? So we tried a different approach—rapid integration. The goal was simple: get a working application deployed as quickly as possible.
But here's the thing. With the sheer number of options Kubernetes gives you, the complexity of the environment grows fast. Every choice opens three more choices. And that's the core tension—Kubernetes gives you everything, but "everything" isn't always what you need.
So I tried to keep it as lean as possible:
- Traefik as the ingress controller
- Secrets stored natively in Kubernetes
- CI by GitLab
- CD handled through ArgoCD
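Wired together, the lean stack boils down to one ArgoCD Application per app: GitLab holds the manifests, ArgoCD keeps the cluster in sync with them. A minimal sketch, assuming placeholder repo URL, path, and namespaces:

```yaml
# Hypothetical ArgoCD Application tying GitLab (CI) to ArgoCD (CD).
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: demo-app                # placeholder name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitlab.example.internal/platform/demo-app.git  # placeholder
    targetRevision: main
    path: deploy/               # placeholder path to the manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: demo-app
  syncPolicy:
    automated:
      prune: true               # delete resources removed from Git
      selfHeal: true            # revert manual drift in the cluster
```

`prune` and `selfHeal` are exactly the knobs that make the first sync conflicts interesting: ArgoCD starts fighting anything changed outside of Git.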
Sounds simple... until the first sync conflicts hit. At that point, I probably ran `argocd app delete` more often than `argocd app sync`.
But is that really what you want long-term? Over time, I got ArgoCD under control. The application ran stably, new releases rolled out cleanly. But did I trust this enough to take it to production? No.
One step back
So I asked myself: what value does Kubernetes actually give me, and what are the real reasons against Docker? Back to the starting point: individual Docker hosts. The problem with that setup is obvious: if a host goes down, every container on it goes with it. As a result, these VMs rarely get their OS upgraded. Nobody wants to touch what's running.
But can't you...