▶ Docker vs containerd vs Podman: which container runtime should I use?
Docker is the industry standard, teaches the concepts well, and works everywhere (desktop, server, cloud); under the hood it wraps containerd. containerd is the lower-level runtime Docker builds on and is what Kubernetes uses natively (via the CRI): less tooling, smaller footprint. Podman is a largely drop-in Docker replacement that runs daemonless with better rootless support, which makes it the best fit for security-conscious teams. For learning: Docker. For production: it depends on ops preference, and since Kubernetes abstracts the runtime away, the choice mostly comes down to tooling and ease of use.
▶ When is Kubernetes the right choice versus Docker Compose or serverless?
Kubernetes is overkill for single-machine workloads: use Docker Compose locally, or Swarm for basic multi-host setups. K8s wins when you need self-healing, automated scaling, rolling updates, multi-cloud portability, a service mesh, or management of 100+ containers. If you have sustained high load, high-availability requirements, or microservices: K8s. If you have simple batch jobs or bursty traffic: Lambda/serverless. Docker Compose is for local dev and single-host staging only; a minimal Compose sketch follows below.
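For the single-host case, a small Compose file usually covers it. A minimal sketch, assuming a hypothetical web app built from a local Dockerfile plus a Postgres container (service names, images, ports, and credentials are illustrative):

```yaml
# docker-compose.yml - minimal local dev stack (illustrative names and images)
services:
  web:
    build: .                  # build the app image from the local Dockerfile
    ports:
      - "8080:8080"           # expose the app on localhost:8080
    environment:
      DATABASE_URL: postgres://app:secret@db:5432/app
    depends_on:
      - db                    # start the database before the app
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: app
    volumes:
      - db-data:/var/lib/postgresql/data   # persist data across restarts
volumes:
  db-data:
```

Run with `docker compose up`; the same file works unchanged under Podman via `podman compose`.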
▶ How do Kubernetes networking and CNI plugins work?
Each pod gets its own IP on a flat cluster network (often implemented as an overlay). The CNI (Container Network Interface) plugin provides that networking: Calico enforces network policies (firewall-style rules), Flannel is lightweight (VXLAN/host-gw), and Cilium uses an eBPF datapath for deeper observability and policy enforcement. By default, every pod can talk to every other pod; you restrict this with NetworkPolicy (layer 3/4 rules). For service-to-service and multi-cluster traffic, service meshes (Istio, Linkerd) add routing, retries, and circuit breaking. Most teams use Calico or Cilium; the choice affects observability and policy-enforcement depth.
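Tightening the default allow-all behavior looks like this. A minimal NetworkPolicy sketch, assuming hypothetical `app: api` and `app: frontend` labels in a `prod` namespace: only frontend pods may reach api pods on TCP 8080, and all other ingress to the api pods is denied.

```yaml
# Allow only frontend -> api traffic on port 8080; deny all other ingress to api pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-frontend
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: api              # the policy applies to these pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend # only frontend pods may connect
      ports:
        - protocol: TCP
          port: 8080
```

Note the policy is only enforced if the CNI plugin supports NetworkPolicy (Calico and Cilium do; plain Flannel does not).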
▶ Pod scheduling, node affinity, taints, and tolerations: when do I use each?
NodeSelector: simple key=value labels, fast to set up. Affinity: more expressive logic (AND/OR, preferred vs. required), better for spreading or co-locating pods. Taints: nodes reject pods unless the pod carries a matching toleration (useful for dedicating nodes to workloads like GPUs or spot instances). The most common patterns: node affinity for "run on nodes with SSDs" and taints + tolerations for "this workload only runs on expensive GPU nodes" (see the sketch after this answer). The defaults work for 80% of cases; customize when you hit resource constraints.
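Putting the two together, a pod spec can require SSD-labeled nodes via nodeAffinity and tolerate a GPU taint. A sketch with hypothetical label and taint keys (`disktype`, `gpu`) and an illustrative image name:

```yaml
# Pod that requires SSD-labeled nodes and tolerates a dedicated-GPU taint.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   # hard requirement
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values: ["ssd"]
  tolerations:
    - key: "gpu"            # matches a taint like: kubectl taint nodes node1 gpu=true:NoSchedule
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  containers:
    - name: trainer
      image: my-registry/trainer:latest   # illustrative image
      resources:
        limits:
          nvidia.com/gpu: 1               # one GPU (assumes the NVIDIA device plugin is installed)
```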
▶ How do I manage secrets securely in Kubernetes?
Never hardcode secrets in configs or images. Kubernetes Secrets store sensitive data base64-encoded, NOT encrypted by default. Always encrypt secrets at rest in etcd (the kube-apiserver --encryption-provider-config flag). For production, use an external secret manager (HashiCorp Vault, AWS Secrets Manager, Google Secret Manager) with a controller syncing into K8s Secrets. Workloads read from Secret objects mounted as volumes or injected as env vars. Ensure RBAC restricts who can read Secrets, and prefer admission webhooks to prevent plaintext secrets leaking into logs.
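Consuming a Secret from a workload looks like this. A minimal sketch with a hypothetical `db-credentials` Secret read both as an env var and as a mounted file; the base64 value is for illustration only, and in production the Secret would be populated by a controller syncing from an external manager.

```yaml
# Secret consumed by a pod as an env var and as a read-only mounted file.
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  password: cGFzc3dvcmQ=          # base64("password") - illustrative only, never commit real values
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: my-registry/app:1.0  # illustrative image
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: password
      volumeMounts:
        - name: creds
          mountPath: /etc/creds   # password also available as /etc/creds/password
          readOnly: true
  volumes:
    - name: creds
      secret:
        secretName: db-credentials
```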
▶ Persistent volumes, storage classes, and claims: how does K8s storage work?
PersistentVolume (PV): the actual storage (provisioned by ops or a provisioner). PersistentVolumeClaim (PVC): an app's request for storage ("I need 10GB"). StorageClass: a provisioner that creates PVs automatically (e.g., EBS volumes on AWS, persistent disks on GCP). Workflow: the app claims storage via a PVC → K8s matches it to an existing PV or creates one via the StorageClass → the volume is mounted into the pod. Lifecycle: volumes survive pod deletion (they live as long as the PVC), but deleting the PVC can delete the underlying storage (depending on the reclaim policy). For databases: use StatefulSets + persistent volumes. For stateless apps: don't use PVs.
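The claim side of that workflow looks like this. A sketch assuming a hypothetical StorageClass named `gp3` (e.g. EBS-backed on AWS); the dynamic provisioner creates and binds the PV when the claim is made, and the pod then mounts the claim by name.

```yaml
# PVC requesting 10Gi from a (hypothetical) gp3 StorageClass, mounted by a pod.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: gp3            # dynamic provisioner creates the backing PV
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: db
spec:
  containers:
    - name: postgres
      image: postgres:16
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data            # bind the claim into the pod
```

For a real database you would wrap this in a StatefulSet with volumeClaimTemplates so each replica gets its own PVC.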
▶ How do I debug failing pods and containers in Kubernetes?
kubectl logs <pod>: see application output. kubectl describe pod <pod>: see events (why didn't it start?). kubectl exec -it <pod> -- /bin/bash: get a shell inside. kubectl port-forward <pod> 8080:8080: forward traffic to debug locally. Check resource limits (OOMKilled?), readiness probes (is the app actually ready?), and node status (kubectl get nodes). For network issues, run a throwaway debug pod (kubectl run debug -it --image=busybox -- sh) to test DNS and connectivity. Always check events first; they explain why pods are pending or crashing.
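The throwaway debug pod can also be written as a manifest, which is handy when you need it to land in a specific namespace or tolerate taints. A minimal sketch (pod name and image tag are illustrative):

```yaml
# Throwaway pod for testing DNS and connectivity from inside the cluster.
apiVersion: v1
kind: Pod
metadata:
  name: debug
spec:
  restartPolicy: Never
  containers:
    - name: debug
      image: busybox:1.36
      command: ["sh", "-c", "sleep 3600"]   # keep the pod alive so you can exec into it
```

Then kubectl exec -it debug -- sh and use nslookup <service> or wget -qO- http://<service>:<port> to check DNS and reachability; delete the pod when done.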