If you've ever waited 5-10 minutes for the Kubernetes Cluster Autoscaler to spin up new nodes while your pods sat Pending, you know the pain. Karpenter is an open-source node provisioner, built by AWS, that replaces Cluster Autoscaler with something dramatically faster and smarter. It provisions right-sized nodes in under a minute, handles spot interruptions automatically, and in many migrations cuts compute costs by 40-60%.
What is Karpenter?
Karpenter is an open-source, high-performance Kubernetes node lifecycle manager. Unlike Cluster Autoscaler (which works with pre-defined node groups), Karpenter directly provisions compute capacity from the cloud provider based on the actual requirements of your pending pods.
How Karpenter Works
Karpenter runs as a controller inside the cluster. It watches for pods the scheduler marks unschedulable, batches them, computes the cheapest set of instance types that satisfies their combined resource requests, scheduling constraints, and your NodePool rules, and launches that capacity directly via the EC2 API, with no node groups or ASGs in between. When a node is no longer needed, Karpenter drains and terminates it.
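You can watch this loop with any workload that outgrows the cluster. A minimal sketch, assuming an existing Deployment named my-app (the name and label are illustrative):
# Scale past current capacity; the extra replicas go Pending
kubectl scale deployment my-app --replicas=50
# Watch Karpenter react: NodeClaims appear, nodes register, pods schedule
kubectl get nodeclaims -w
kubectl get pods -l app=my-app -w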
Installation
Karpenter runs as a Helm chart inside your EKS cluster. Here's the production-ready setup:
# Prerequisites:
# - EKS cluster (1.25+)
# - IAM roles for service accounts (IRSA) configured
# - aws CLI, kubectl, helm installed
# Set your cluster variables
export CLUSTER_NAME="my-production-cluster"
export AWS_REGION="us-east-1"
export KARPENTER_VERSION="1.1.0"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
# Create the IAM roles for Karpenter
# (Karpenter needs permission to create/terminate EC2 instances; the
# official getting-started guide at karpenter.sh provides a ready-made
# CloudFormation template)
aws cloudformation deploy \
--stack-name "Karpenter-${CLUSTER_NAME}" \
--template-file karpenter-cloudformation.yaml \
--capabilities CAPABILITY_NAMED_IAM \
--parameter-overrides "ClusterName=${CLUSTER_NAME}"
# Install Karpenter via Helm
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
--version "${KARPENTER_VERSION}" \
--namespace kube-system \
--set "settings.clusterName=${CLUSTER_NAME}" \
--set "settings.interruptionQueue=${CLUSTER_NAME}" \
--set controller.resources.requests.cpu=1 \
--set controller.resources.requests.memory=1Gi \
--set controller.resources.limits.cpu=1 \
--set controller.resources.limits.memory=1Gi \
--wait
# Verify Karpenter is running
kubectl get pods -n kube-system -l app.kubernetes.io/name=karpenter
# NAME READY STATUS RESTARTS AGE
# karpenter-5f4b8c8d9f-xxxxx 1/1 Running 0 2m
Core Concepts
NodePool: Define What to Provision
A NodePool tells Karpenter what kind of nodes it can create. Think of it as a set of constraints and preferences:
# nodepool.yaml — Production-ready NodePool
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
name: default
spec:
# Template for nodes created by this pool
template:
metadata:
labels:
environment: production
team: platform
spec:
# Which EC2NodeClass to use for AWS-specific config
nodeClassRef:
group: karpenter.k8s.aws
kind: EC2NodeClass
name: default
# Instance type requirements
requirements:
# Architecture: amd64 or arm64 (Graviton)
- key: kubernetes.io/arch
operator: In
values: ["amd64", "arm64"]
# Capacity type: spot first, on-demand as fallback
- key: karpenter.sh/capacity-type
operator: In
values: ["spot", "on-demand"]
# Instance categories: general purpose + compute optimized
- key: karpenter.k8s.aws/instance-category
operator: In
values: ["c", "m", "r"]
# Instance sizes: medium to 8xlarge
- key: karpenter.k8s.aws/instance-size
operator: In
values: ["medium", "large", "xlarge", "2xlarge", "4xlarge", "8xlarge"]
# Availability zones
- key: topology.kubernetes.io/zone
operator: In
values: ["us-east-1a", "us-east-1b", "us-east-1c"]
# Taints (optional — restrict what can run on these nodes)
# taints:
# - key: workload-type
# value: compute-heavy
# effect: NoSchedule
# Resource limits — cap total provisioned capacity
limits:
cpu: "1000" # Max 1000 vCPUs across all nodes
memory: "2000Gi" # Max 2TB RAM
# Disruption policy — how Karpenter consolidates/replaces nodes
disruption:
# Consolidation: merge underutilized nodes to save money
consolidationPolicy: WhenEmptyOrUnderutilized
consolidateAfter: 30s
# Budget: max nodes that can be disrupted simultaneously
budgets:
- nodes: "10%" # Disrupt at most 10% of nodes at once
  # Priority when multiple NodePools can satisfy a pod (higher wins)
  weight: 10
EC2NodeClass: Define How to Provision
# ec2nodeclass.yaml — AWS-specific configuration
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
name: default
spec:
# IAM role for the nodes
role: "KarpenterNodeRole-my-production-cluster"
# AMI selection — use the latest EKS-optimized AMI
amiSelectorTerms:
- alias: al2023@latest # Amazon Linux 2023 (recommended)
# Subnet discovery — find subnets by tag
subnetSelectorTerms:
- tags:
karpenter.sh/discovery: "my-production-cluster"
# Security group discovery
securityGroupSelectorTerms:
- tags:
karpenter.sh/discovery: "my-production-cluster"
# Block device mappings (root volume)
blockDeviceMappings:
- deviceName: /dev/xvda
ebs:
volumeSize: 100Gi
volumeType: gp3
iops: 3000
throughput: 125
encrypted: true
deleteOnTermination: true
# Tags applied to all EC2 instances
tags:
Environment: production
ManagedBy: karpenter
Team: platform
# User data (optional — bootstrap scripts)
# userData: |
# #!/bin/bash
# echo "Custom bootstrap logic here"
# Metadata options (IMDSv2 required for security)
metadataOptions:
httpEndpoint: enabled
httpProtocolIPv6: disabled
httpPutResponseHopLimit: 2
httpTokens: required # Enforce IMDSv2
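Subnet and security group discovery only works if those resources actually carry the karpenter.sh/discovery tag. A quick sketch of tagging them and applying both manifests (the resource IDs below are placeholders; eksctl-created clusters may have tagged them already):
# Tag the subnets and security groups Karpenter should discover
aws ec2 create-tags \
  --resources subnet-0abc1234 sg-0def5678 \
  --tags "Key=karpenter.sh/discovery,Value=${CLUSTER_NAME}"
# Apply both resources and confirm they're ready
kubectl apply -f ec2nodeclass.yaml -f nodepool.yaml
kubectl get ec2nodeclass,nodepool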
Spot Instance Optimization
Karpenter's spot handling is one of its killer features. It automatically diversifies across instance types and handles interruptions gracefully:
# Spot-optimized NodePool — maximize savings
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
name: spot-compute
spec:
template:
spec:
nodeClassRef:
group: karpenter.k8s.aws
kind: EC2NodeClass
name: default
requirements:
# SPOT ONLY for this pool
- key: karpenter.sh/capacity-type
operator: In
values: ["spot"]
        # Wide instance diversity = fewer interruptions. Note that
        # instance-category takes single-letter categories; Karpenter
        # diversifies across families and generations (c5, c6i, c7g, ...)
        # within them automatically.
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
- key: karpenter.k8s.aws/instance-size
operator: In
values: ["large", "xlarge", "2xlarge", "4xlarge"]
        # Graviton (arm64): roughly 20% cheaper than comparable x86 instances
- key: kubernetes.io/arch
operator: In
values: ["arm64"]
  # Disruption: enable consolidation for further savings
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
  weight: 10 # Preferred over the on-demand fallback pool below
---
# On-demand fallback pool (lower priority)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
name: on-demand-fallback
spec:
template:
spec:
nodeClassRef:
group: karpenter.k8s.aws
kind: EC2NodeClass
name: default
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["on-demand"]
- key: karpenter.k8s.aws/instance-category
operator: In
values: ["m", "c"]
weight: 1 # Lower priority than spot pool (weight: 10)
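Spot churn means nodes will come and go, so protect your replicas: Karpenter respects PodDisruptionBudgets when draining, which keeps consolidation and interruption handling from taking down too many pods at once. A minimal sketch, with the my-app label as a placeholder:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2 # Never drain below 2 ready replicas
  selector:
    matchLabels:
      app: my-app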
Consolidation: Automatic Cost Optimization
Karpenter continuously watches for underutilized nodes and consolidates workloads to fewer, better-fitting instances. This happens automatically — no cron jobs, no manual intervention.
# Example: Consolidation in action
# Before consolidation:
#   Node 1 (m5.2xlarge — 8 vCPU, 32GB): using 1 vCPU, 4GB (~13% CPU utilized)
#   Node 2 (m5.2xlarge — 8 vCPU, 32GB): using 2 vCPU, 8GB (25% CPU utilized)
#   Total cost: 2x m5.2xlarge = ~$0.384/hr * 2 = $0.768/hr
# After consolidation (Karpenter, automatically):
#   1. Launches an m5.xlarge (4 vCPU, 16GB) — the combined 3 vCPU / 12GB fits
#   2. Cordons Node 1 and Node 2
#   3. Drains the pods (respecting PDBs)
#   4. Terminates the old nodes
# Result: 1x m5.xlarge = $0.192/hr (75% savings!)
# Monitor consolidation activity via events
kubectl get events -A | grep -i disruption
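If a workload must not be consolidated away mid-run (a long batch job, say), Karpenter honors a per-pod opt-out annotation. A sketch with an illustrative pod name and image:
apiVersion: v1
kind: Pod
metadata:
  name: nightly-batch
  annotations:
    # Karpenter will not voluntarily disrupt the node while this pod runs
    karpenter.sh/do-not-disrupt: "true"
spec:
  containers:
    - name: job
      image: my-registry/batch:latest # placeholder image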
GPU Workloads
Karpenter can provision GPU instances for ML/AI workloads just as easily:
# GPU NodePool for ML training
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
name: gpu-training
spec:
template:
metadata:
labels:
workload-type: gpu
spec:
nodeClassRef:
group: karpenter.k8s.aws
kind: EC2NodeClass
name: gpu-class
requirements:
- key: karpenter.k8s.aws/instance-category
operator: In
values: ["g", "p"] # GPU instance families
- key: karpenter.k8s.aws/instance-gpu-count
operator: Gt
values: ["0"]
- key: karpenter.sh/capacity-type
operator: In
values: ["spot", "on-demand"]
taints:
- key: nvidia.com/gpu
value: "true"
effect: NoSchedule
limits:
cpu: "200"
memory: "800Gi"
nvidia.com/gpu: "16" # Max 16 GPUs across all nodes
---
# Pod requesting a GPU
apiVersion: v1
kind: Pod
metadata:
name: ml-training-job
spec:
tolerations:
- key: nvidia.com/gpu
operator: Exists
containers:
- name: trainer
image: my-registry/ml-trainer:latest
resources:
requests:
nvidia.com/gpu: 1
memory: 16Gi
cpu: 4
limits:
nvidia.com/gpu: 1
memory: 32Gi
# Karpenter sees this pod Pending and provisions a GPU instance
# (e.g., g5.xlarge with 1 GPU); the pod typically starts within a couple
# of minutes. Note: nvidia.com/gpu only becomes schedulable once the
# NVIDIA device plugin is running on the node.
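To advertise GPUs to the scheduler, the NVIDIA device plugin needs to be installed (unless your AMI or bootstrap already includes it). A sketch using the plugin's documented Helm repo:
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm upgrade --install nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin --create-namespace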
Multi-Architecture (ARM64 / Graviton)
AWS Graviton instances are typically about 20% cheaper on-demand than comparable x86 instances (AWS advertises up to 40% better price-performance). Karpenter makes it easy to use both:
# NodePool allowing both architectures
requirements:
- key: kubernetes.io/arch
operator: In
values: ["amd64", "arm64"] # Karpenter picks the cheapest
# Your pods need multi-arch images:
# docker buildx build --platform linux/amd64,linux/arm64 -t my-app:latest .
# Karpenter's decision process:
# 1. Pod requests 2 vCPU, 4GB memory
# 2. Karpenter evaluates: m6i.large (amd64) = $0.096/hr
#                         m6g.large (arm64) = $0.077/hr
# 3. Picks m6g.large (arm64) — ~20% cheaper
# 4. Caveat: Karpenter does NOT inspect your images. If a workload is
#    amd64-only, pin it to x86 nodes yourself (see the sketch below) or
#    it will crash with an exec format error on arm64 nodes.
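Pinning an amd64-only workload is a one-line nodeSelector. A minimal sketch (the Deployment name and image are placeholders):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: legacy-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: legacy-app
  template:
    metadata:
      labels:
        app: legacy-app
    spec:
      nodeSelector:
        kubernetes.io/arch: amd64 # keep this off Graviton nodes
      containers:
        - name: app
          image: my-registry/legacy-app:latest # amd64-only image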
Monitoring & Observability
# Karpenter exposes Prometheus metrics out of the box.
# Key metric families to watch (exact names vary by version —
# check the metrics reference in the Karpenter docs for your release):
#   karpenter_nodes_* — node counts and churn per pool
#   karpenter_nodeclaims_* — NodeClaim lifecycle (created, terminated, disrupted)
#   karpenter_pods_startup_duration_seconds — time from Pending to Running
#   karpenter_interruption_* — spot interruption queue activity
# Grafana dashboard (community):
# https://github.com/aws/karpenter/tree/main/charts/karpenter/dashboards
# Useful kubectl commands:
# See all NodeClaims (Karpenter-managed nodes)
kubectl get nodeclaims
# NAME TYPE ZONE NODE READY AGE
# default-abc123 m6g.xlarge us-east-1a ip-10-0-1-45 True 2h
# spot-xyz789 c6g.2xlarge us-east-1b ip-10-0-2-78 True 45m
# See NodePool status (capacity used vs limits)
kubectl get nodepool
# NAME NODECLASS NODES READY AGE
# default default 12 12 30d
# spot default 8 8 30d
# Describe a NodeClaim for details
kubectl describe nodeclaim default-abc123
# Shows: instance type, zone, capacity type, allocatable resources, pods running
# Check for disruption events
kubectl get events -A | grep -i disruption
# Logs
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter -f
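For a quick look at the raw metrics without Prometheus, port-forward the controller's metrics port (8080 in the standard chart; adjust if you've overridden it):
# Port-forward the Karpenter service, then sample its metrics
kubectl port-forward -n kube-system svc/karpenter 8080:8080 &
curl -s localhost:8080/metrics | grep '^karpenter_' | head -20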
Production Best Practices
A few habits keep Karpenter healthy in production: set resource limits on every NodePool, use disruption budgets, give critical workloads PodDisruptionBudgets, enforce IMDSv2 (as in the EC2NodeClass above), and run the Karpenter controller itself on capacity it does not manage, such as a small managed node group or Fargate. For most teams, though, the first step is the migration itself.
Migrating from Cluster Autoscaler
# Migration strategy — run both side-by-side, then decommission CA
# Step 1: Install Karpenter alongside Cluster Autoscaler
# (They can coexist — Karpenter handles new provisioning,
# CA manages existing node groups)
# Step 2: Create NodePool + EC2NodeClass
kubectl apply -f nodepool.yaml
kubectl apply -f ec2nodeclass.yaml
# Step 3: Taint existing CA-managed node groups
# This prevents new pods from scheduling on CA nodes
kubectl taint nodes -l eks.amazonaws.com/nodegroup=old-ng \
legacy=cluster-autoscaler:PreferNoSchedule
# Step 4: Gradually drain the CA node groups
# Karpenter will provision replacement capacity automatically
for node in $(kubectl get nodes -l eks.amazonaws.com/nodegroup=old-ng -o name); do
  kubectl cordon "$node"
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
done
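Before tearing anything down, confirm the replacement capacity actually came from Karpenter; its nodes carry the karpenter.sh/nodepool label:
# Karpenter-managed nodes and their NodeClaims
kubectl get nodes -l karpenter.sh/nodepool
kubectl get nodeclaims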
# Step 5: Once all workloads are on Karpenter nodes:
# - Scale CA node groups to 0
# - Uninstall Cluster Autoscaler
# - Delete old ASGs/node groups
# Step 6: Celebrate your 40-60% cost reduction 🎉
The Bottom Line
Karpenter is one of the most impactful cost optimization tools in the Kubernetes ecosystem. It provisions nodes in seconds rather than minutes, is far smarter than Cluster Autoscaler about instance selection (spot, Graviton, right-sizing), and aggressively consolidates underutilized capacity. If you're running EKS in production, migrating to Karpenter is one of the highest-ROI infrastructure changes you can make.