Kubernetes became the default answer to container orchestration. Job postings require it, reference architectures assume it, conference talks revolve around it. If you're not running K8s, it feels like you're doing something wrong. But running it in production looks nothing like the minikube tutorials suggest.
I've operated K8s clusters in enterprise environments where operational complexity far exceeded the value the platform delivered. I've also seen deployments where Kubernetes was exactly the right call. The difference between those scenarios isn't technical. It's contextual.
When NOT to use Kubernetes
This is the question nobody asks during the design phase and everyone asks six months later, when the cluster needs a dedicated team just to stay alive.
- Your team has fewer than five people. K8s demands specialized knowledge in networking, storage, RBAC, upgrades, and debugging. If your team already struggles to ship features, adding cluster operations is a mistake.
- Your app is a simple monolith. A well-deployed monolith on a couple of VMs behind a load balancer is easier to operate, cheaper and more predictable than wrapping it in K8s because that's what everyone does.
- Operational overhead exceeds the value. If you spend more time debugging the cluster than building product, the tool stopped being a tool and became the project.
ECS, Cloud Run, Nomad or even docker-compose on dedicated machines can solve the problem with a fraction of the complexity. The right question isn't "how do I use Kubernetes?" but "what problem am I solving, and what's the simplest way to solve it?"
Decisions the docs don't cover
Resource requests vs. limits
The docs explain the difference. What they don't tell you is how to set the right values. I've seen two extremes, both equally harmful:
- No requests or limits defined: pods compete for resources without guardrails. One hungry pod can take down the entire node.
- Limits set equal to requests: sounds conservative, but it kills bursting. For workloads with temporary spikes, this creates unnecessary throttling and degrades user experience.
What actually works: requests based on real p95 consumption (not averages), limits at 1.5x-2x the request, and continuous monitoring to adjust. VPA (Vertical Pod Autoscaler) in recommendation mode gives you a solid starting point.
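As a concrete sketch of that sizing rule, here is a container resources fragment. All values, names, and the image are hypothetical; the point is the ratio between requests and limits, not the absolute numbers:

```yaml
# Container resources fragment (hypothetical values): requests sized from
# observed p95 usage, CPU limit at ~2x for burst headroom, memory limit
# at ~1.5x because exceeding a memory limit OOM-kills the container.
containers:
  - name: api                             # placeholder container name
    image: registry.example.com/api:1.0   # placeholder image
    resources:
      requests:
        cpu: 250m      # ~p95 CPU under normal load
        memory: 512Mi  # ~p95 memory
      limits:
        cpu: 500m      # 2x request: bursts allowed, node still protected
        memory: 768Mi  # 1.5x request: margin before the OOM killer fires
```

Note the asymmetry: exceeding a CPU limit only throttles the container, while exceeding a memory limit kills it, which is why memory deserves the wider margin.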
Networking: service mesh, ingress and internal DNS
Installing Istio because "it's the right way to do service mesh" without a problem that justifies it is a recipe for pain. A service mesh adds a sidecar proxy to every pod, consumes resources, increases latency and makes debugging harder.
Before Istio, ask yourself: Do I need mTLS between services? Do I need advanced traffic splitting? Do I have more than 20 services communicating with each other? If the answer is no to all three, a standard ingress controller (nginx or Traefik) with Kubernetes network policies is enough.
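For teams that land in the "no to all three" bucket, a plain NetworkPolicy covers most of the isolation people reach for a mesh to get. A minimal sketch, with hypothetical names, labels, namespace, and port:

```yaml
# Hypothetical policy: only pods labeled app=frontend in the same
# namespace may reach the backend pods, and only on port 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend   # placeholder name
  namespace: shop                # placeholder namespace
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```

Keep in mind this only takes effect if your CNI plugin enforces network policies; on a CNI that doesn't, the object is accepted and silently ignored.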
A problem that surfaces late and hurts a lot: internal DNS. CoreDNS works fine until it doesn't. When hundreds of pods are making constant resolution requests, CoreDNS throughput becomes an invisible bottleneck. Configure ndots correctly in your pods and use FQDNs where possible. That one-line change in dnsConfig can cut DNS queries by 80%.
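The ndots fix mentioned above looks like this in the pod spec. With the default ndots:5, a lookup for a short name like "api.example.com" walks every search suffix before trying the absolute query; lowering ndots skips that cascade:

```yaml
# Pod spec fragment: lower ndots so names with at least one dot are
# resolved as absolute first, instead of iterating the search domains.
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "1"
```

Combine this with fully qualified names (a trailing dot, e.g. "db.prod.svc.cluster.local.") for in-cluster services and CoreDNS sees a fraction of the traffic.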
Storage: the pain of stateful workloads
Kubernetes was designed for stateless workloads. That doesn't mean it can't handle state, but every time you deploy a StatefulSet with PersistentVolumes you're swimming upstream.
Databases on K8s are possible. Recommended? It depends. If you have a team experienced with operators like CloudNativePG or Vitess and a solid storage backend (not the cloud provider's default), go ahead. If not, use a managed service. RDS, Cloud SQL or any DBaaS will be more stable than your Postgres on a StatefulSet managed by a team that doesn't know what to do when a PV gets stuck in Released state.
Observability in an ephemeral world
Pods die and respawn constantly. Logs from a pod that was evicted 5 minutes ago no longer exist if you're not shipping them somewhere. This is a basic problem that catches many teams off guard during their first serious incident.
Minimum viable setup: Fluentd or Fluent Bit as a DaemonSet shipping logs to an aggregator (Loki, Elasticsearch). Prometheus with adequate retention for metrics. Distributed traces if you have more than 3 services. Without this, you're operating blind.
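A skeleton of the log-shipping half of that setup, as a Fluent Bit DaemonSet. The namespace and image tag are assumptions, and the actual pipeline (a tail input plus a Loki or Elasticsearch output) would live in a ConfigMap that is omitted here for brevity:

```yaml
# Minimal Fluent Bit DaemonSet sketch: run one collector per node with
# read-only access to the node's log directory.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging            # assumed namespace
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:2.2   # pin an exact tag in practice
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log      # containerd writes pod logs under /var/log/pods
```

The DaemonSet shape is the important part: because every node runs a collector reading from the host, logs survive even when the pod that produced them is evicted.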
Production debugging
kubectl exec is the first tool you reach for and the last one you should rely on. It's fine for spot checks: verifying environment variables, testing network connectivity, inspecting the filesystem. It's not a debugging strategy.
What works better: ephemeral containers (GA since K8s 1.25) let you attach a debug container to a running pod without modifying its spec. Combine them with crictl for runtime inspection and kubectl debug node for node-level issues.
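For reference, this is roughly what a command like `kubectl debug -it mypod --image=busybox --target=app` adds to the pod (pod and container names are placeholders). You can't set this field in a manifest you apply; it's added at runtime through the pod's ephemeralcontainers subresource, which is exactly what kubectl debug does for you:

```yaml
# Resulting pod state after attaching a debug container: an entry under
# ephemeralContainers that shares the target container's process namespace.
apiVersion: v1
kind: Pod
metadata:
  name: mypod                  # placeholder pod name
spec:
  ephemeralContainers:
    - name: debugger
      image: busybox:1.36      # any image with the tools you need
      command: ["sh"]
      stdin: true
      tty: true
      targetContainerName: app # see the target container's processes
```

The win over kubectl exec is that the debug container brings its own tooling, so it works against distroless or scratch images that ship no shell at all.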
But the most effective debugging is the kind you never need to do. If your observability is solid, most problems get diagnosed from metrics and logs without touching the cluster.
K8s in the enterprise: RBAC and multi-tenancy
In enterprise environments, the cluster is shared across teams. This is where design decisions become political in addition to technical.
Namespaces as boundaries: they work as logical separation, not real isolation. A namespace won't prevent a pod from consuming all resources on a node. For that you need ResourceQuotas and LimitRanges per namespace, and real enforcement, not just YAML definitions that nobody reviews.
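A sketch of those per-namespace guardrails, with an illustrative namespace and numbers: the ResourceQuota caps the team's aggregate consumption, and the LimitRange supplies defaults for containers that declare nothing, so the quota can't be dodged by omission:

```yaml
# Hypothetical guardrails for a team namespace. Numbers are illustrative.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"       # aggregate CPU the namespace may request
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-defaults
  namespace: team-a
spec:
  limits:
    - type: Container
      defaultRequest:        # applied when a container sets no request
        cpu: 100m
        memory: 128Mi
      default:               # applied when a container sets no limit
        cpu: 500m
        memory: 512Mi
```

A side effect worth knowing: once a quota covers a resource, pods that don't specify it get rejected, which is why the LimitRange defaults belong alongside the quota.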
Real vs. fictional multi-tenancy: if you need real tenant isolation (for compliance, security or simply trust), namespaces aren't enough. You need separate clusters or solutions like vCluster that create virtual clusters inside a physical one. Soft multi-tenancy via namespaces works for teams within the same organization that trust each other. For everything else, it's an illusion.
RBAC looks simple until you have 15 teams with different needs. My advice: define roles at the namespace level (not cluster roles), use AD/LDAP groups instead of individual users, and audit permissions quarterly. Permissions accumulate. Nobody asks to have access removed.
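Both halves of that advice in one sketch: a namespace-scoped Role bound to a directory group instead of individual users. The namespace, group name, and verb list are placeholders; the group string must match whatever your AD/LDAP or OIDC integration puts in the user's group claims:

```yaml
# Namespace-scoped deploy permissions, granted to a group, not to people.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-deployer
  namespace: team-a
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-deployer-binding
  namespace: team-a
subjects:
  - kind: Group
    name: team-a-developers        # as asserted by your auth integration
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: app-deployer
  apiGroup: rbac.authorization.k8s.io
```

Binding to groups means onboarding and offboarding happen in the directory, not in cluster YAML, which is what makes the quarterly audit tractable.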
Kubernetes is a tool. Not an architecture. Don't adopt K8s because it's what everyone uses. Adopt it when the problem it solves is bigger than the complexity it introduces. And when you do, invest in your team before you invest in the cluster.