Containers are not VMs. They share the host kernel and rely on Linux primitives — namespaces, cgroups, seccomp, capabilities — for isolation. Misconfigure any of these, and the container boundary dissolves.
Linux Container Isolation Primitives
- Namespaces: Isolate what a container can see (PID, network, mount, user, IPC, UTS)
- cgroups: Limit what a container can use (CPU, memory, I/O)
- Seccomp: Filter which syscalls a container can make (block dangerous ones like ptrace, mount)
- Capabilities: Split root's power into fine-grained privileges (drop everything except what is needed)
- AppArmor/SELinux: Mandatory access control for file and network operations
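As a rough illustration of how these primitives surface in day-to-day configuration, the sketch below maps them onto a Docker Compose service. The service name, image, UID, and limit values are assumptions chosen for the example and are not taken from any real deployment.

```yaml
# Hypothetical Compose service; name, image, and limits are placeholders.
services:
  app:
    image: registry.example.com/app:1.0.0
    user: "65532:65532"          # run as a non-root UID:GID
    read_only: true              # mount the root filesystem read-only
    cap_drop:
      - ALL                      # capabilities: drop everything, add back only what is needed
    security_opt:
      - no-new-privileges:true   # block privilege gain through setuid binaries
    # Seccomp: with no seccomp entry in security_opt, Docker applies its default profile.
    # Namespaces: omitting pid, ipc, and network_mode "host" keeps the container in its own namespaces.
    pids_limit: 100              # cgroups: cap the number of processes
    mem_limit: 256m              # cgroups: cap memory
    cpus: 0.5                    # cgroups: cap CPU time
```

An AppArmor or SELinux profile can also be selected through `security_opt`, but which profiles exist depends on the host, so none is pinned here.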
Secure Container Image Patterns
# Insecure: full OS, root user, unnecessary tools
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y curl wget vim
COPY app /app
CMD ["/app"]
# Secure: distroless, non-root, minimal attack surface
FROM gcr.io/distroless/static-debian12:nonroot
# The :nonroot tag runs as UID/GID 65532; keep file ownership and USER consistent with it
COPY --chown=65532:65532 app /app
USER 65532:65532
ENTRYPOINT ["/app"]
# No shell, no package manager, no curl, no wget
# An attacker with RCE finds no shell binary in the image to spawn
Pod Security Standards
Kubernetes defines three Pod Security Standards. The built-in Pod Security Admission controller applies them at the namespace level:
- Privileged: No restrictions (for system workloads only)
- Baseline: Prevents known privilege escalations (blocks hostPID, hostNetwork, privileged)
- Restricted: Maximum hardening (requires non-root, no privilege escalation, a seccomp profile, and dropping ALL capabilities)
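To make the Restricted level concrete, here is a minimal sketch of a Pod that should pass its checks. The pod name, container name, and image are hypothetical, and the image is assumed to declare a numeric non-root user so that `runAsNonRoot` can be verified.

```yaml
# Sketch of a Pod intended to satisfy the Restricted standard.
# Pod name, container name, and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: production
spec:
  securityContext:
    runAsNonRoot: true                    # containers must not run as UID 0
    seccompProfile:
      type: RuntimeDefault                # seccomp must be RuntimeDefault or Localhost
  containers:
    - name: app
      image: registry.example.com/app:1.0.0
      securityContext:
        allowPrivilegeEscalation: false   # must be explicitly false
        capabilities:
          drop: ["ALL"]                   # must drop ALL
        readOnlyRootFilesystem: true      # extra hardening; not checked by Restricted
```

`readOnlyRootFilesystem` is included as additional defense in depth. The Restricted standard itself does not check it.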
# Enforce restricted Pod Security Standard on a namespace
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
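The three labels are independent modes, so a namespace can enforce one level while only warning and auditing against a stricter one. The sketch below shows one possible staged rollout; the namespace name `staging` and the choice of levels are assumptions for illustration.

```yaml
# Hypothetical staged rollout: reject Baseline violations now,
# surface Restricted violations without blocking workloads.
apiVersion: v1
kind: Namespace
metadata:
  name: staging
  labels:
    pod-security.kubernetes.io/enforce: baseline          # violating pods are rejected
    pod-security.kubernetes.io/enforce-version: latest    # policy version to evaluate against
    pod-security.kubernetes.io/warn: restricted           # violations return a warning to the client
    pod-security.kubernetes.io/audit: restricted          # violations are recorded in the audit log
```

Once the warnings and audit entries stop appearing, the `enforce` label can be raised to `restricted`.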