Most developers know a dozen Linux commands and Google the rest. That works until production is on fire at 2 AM and you need to find which process is holding a file lock, which socket is stuck in CLOSE_WAIT, or which log entry appeared right before the crash. These commands are your firefighting toolkit.

Text Processing: awk, sed, cut

awk: Column-Based Processing

# Print the 5th column of a space-delimited file
awk '{print $5}' access.log

# Sum all values in column 3
awk '{sum += $3} END {print sum}' data.txt

# Filter rows where column 9 (HTTP status) is 500
awk '$9 == 500' access.log

# Print lines where response time (col 11) > 1000ms
awk '$11 > 1000 {print $7, $11"ms"}' access.log

# Count requests per HTTP method
awk '{count[$6]++} END {for (m in count) print m, count[m]}' access.log
# "GET 45123     (the leading quote is part of field 6 in combined log format)
# "POST 12456
# "PUT 3421

# Count unique IPs
awk '{print $1}' access.log | sort -u | wc -l
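
The one-liners above compose well. As a minimal sketch, assuming a combined-format access log where column 1 is the client IP and column 9 is the HTTP status, the following counts 5xx responses per client IP. The helper name `errors_by_ip` and the sample log lines are made up for illustration.

```shell
# Hypothetical helper: count 5xx responses per client IP.
# Assumes combined log format: $1 = client IP, $9 = HTTP status.
errors_by_ip() {
  awk '$9 >= 500 && $9 < 600 {count[$1]++}
       END {for (ip in count) print count[ip], ip}' "$1" | sort -rn
}

# Made-up sample data for the demonstration
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
10.0.0.1 - - [28/Apr/2026:10:00:01 +0000] "GET /api/users HTTP/1.1" 200 512
10.0.0.2 - - [28/Apr/2026:10:00:02 +0000] "POST /api/orders HTTP/1.1" 500 128
10.0.0.2 - - [28/Apr/2026:10:00:03 +0000] "GET /api/orders HTTP/1.1" 503 64
10.0.0.3 - - [28/Apr/2026:10:00:04 +0000] "GET /health HTTP/1.1" 500 32
EOF

errors_by_ip "$sample_log"
# 2 10.0.0.2
# 1 10.0.0.3
```

Sorting happens outside awk because the iteration order of `for (ip in count)` is unspecified.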

sed: Stream Editing

# Replace text in-place
sed -i 's/old_api_url/new_api_url/g' config.yaml

# Delete lines matching a pattern
sed '/DEBUG/d' app.log > clean.log

# Print only lines 100-200
sed -n '100,200p' large_file.log

# Insert text after a matching line
sed '/\[database\]/a connection_timeout = 30' config.ini

# Remove blank lines
sed '/^$/d' file.txt
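
In-place edits are destructive, so it can help to preview the substitution first and keep a backup when applying it. The sketch below assumes a sed that accepts a suffix attached to `-i` (GNU and BSD sed both do); the file contents are made up for illustration.

```shell
# Made-up config file for the demonstration
workdir=$(mktemp -d)
cat > "$workdir/config.yaml" <<'EOF'
api:
  url: old_api_url/v1
  fallback: old_api_url/v2
EOF

# Preview the change without touching the file
sed 's/old_api_url/new_api_url/g' "$workdir/config.yaml"

# Apply it in place, keeping the original as config.yaml.bak
sed -i.bak 's/old_api_url/new_api_url/g' "$workdir/config.yaml"

# Review what changed (diff exits non-zero when the files differ)
diff "$workdir/config.yaml.bak" "$workdir/config.yaml" || true
```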

cut: Extract Columns

# Extract fields from CSV
cut -d',' -f1,3 users.csv      # Fields 1 and 3, comma-delimited

# Extract from colon-delimited (like /etc/passwd)
cut -d':' -f1,7 /etc/passwd    # Username and shell

# Extract characters 1-10
cut -c1-10 file.txt
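
One caveat worth illustrating: cut treats every single delimiter character as a field boundary, so columns padded with runs of spaces yield empty fields. A small sketch with a made-up input line shows the difference, along with two common workarounds:

```shell
# Made-up line with columns padded by runs of spaces
line='alice   1042    /bin/bash'

# cut counts every space as a delimiter, so field 2 is empty here
with_cut=$(printf '%s\n' "$line" | cut -d' ' -f2)

# Squeeze repeated spaces first, then cut behaves as expected
with_tr=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f2)

# awk splits on runs of whitespace by default
with_awk=$(printf '%s\n' "$line" | awk '{print $2}')

printf 'cut alone: "%s"\ntr + cut:  "%s"\nawk:       "%s"\n' \
  "$with_cut" "$with_tr" "$with_awk"
```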

JSON Processing: jq

# Pretty print JSON
curl -s https://api.example.com/data | jq .

# Extract a field
echo '{"name":"Alice","age":30}' | jq '.name'
# "Alice"

# Extract from arrays
echo '[{"id":1,"name":"A"},{"id":2,"name":"B"}]' | jq '.[0].name'
# "A"

# Filter array elements
echo '[{"status":"active"},{"status":"inactive"}]' | \
  jq '[.[] | select(.status == "active")]'

# Extract multiple fields into CSV
jq -r '.[] | [.id, .name, .email] | @csv' users.json

# Transform JSON structure
jq '{user_count: length, names: [.[].name]}' users.json

# Parse Docker inspect output
docker inspect mycontainer | jq '.[0].NetworkSettings.IPAddress'

# Parse kubectl output
kubectl get pods -o json | jq '.items[] | {name: .metadata.name, status: .status.phase}'

Parallel Execution: xargs

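# Delete all .pyc files (xargs batches them into as few rm calls as possible;
# -print0 / -0 keep file names with spaces intact)
find . -name "*.pyc" -print0 | xargs -0 rm -f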

# Run commands in parallel (-P for parallelism)
find . -name "*.test.js" | xargs -P 4 -I {} node {}
# Runs 4 test files simultaneously

# Bulk move files into archive/ (NUL-delimited, so names with spaces survive)
printf '%s\0' *.jpg | xargs -0 -I {} mv {} archive/{}

# Curl multiple URLs in parallel
cat urls.txt | xargs -P 10 -I {} curl -s -o /dev/null -w "%{url_effective}: %{http_code}\n" {}

# Kill all processes matching a pattern (-r: run nothing if there is no match)
pgrep -f "celery worker" | xargs -r kill -TERM

# Batch database operations
cat user_ids.txt | xargs -I {} psql -c "DELETE FROM sessions WHERE user_id = '{}';"
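
File names with spaces or quotes are the usual way xargs pipelines break. As a sketch of the safer pattern, assuming a find and xargs that support `-print0`, `-0`, and `-r` (GNU findutils does), the example below builds a made-up directory tree and removes its .pyc files in parallel:

```shell
# Made-up directory tree; one file name contains a space
workdir=$(mktemp -d)
mkdir -p "$workdir/pkg/__pycache__"
touch "$workdir/pkg/mod.py" \
      "$workdir/pkg/__pycache__/mod.cpython-312.pyc" \
      "$workdir/pkg/__pycache__/old copy.pyc"

# -print0 / -0 pass names NUL-delimited, so spaces and quotes are safe.
# -r skips running rm when find matches nothing (an extension, not POSIX).
# -P 4 runs up to four rm processes at once; -n 100 caps names per invocation.
find "$workdir" -name '*.pyc' -print0 | xargs -0 -r -P 4 -n 100 rm -f

remaining=$(find "$workdir" -name '*.pyc' | wc -l)
echo "pyc files left: $remaining"
```

Without `-n`, xargs packs as many names as fit into one command line, so `-P` only has an effect once the input is split into several batches.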

Process Investigation: ps, htop, strace

# Find what is eating CPU
ps aux --sort=-%cpu | head -10

# Find what is eating memory
ps aux --sort=-%mem | head -10

# Watch processes in real-time (better than top)
htop
# Press F6 to sort, F4 to filter, F5 for tree view

# Trace system calls of a running process
strace -p PID -e trace=network    # Network calls only
strace -p PID -e trace=file       # File operations only
strace -p PID -c                  # Summary of all syscalls

# Example: why is this process slow?
strace -p 12345 -T -e trace=read,write
# Shows every read/write call with time spent
# If you see: read(5, ..., 4096) = 4096  <2.003456>
# That file descriptor is blocking for 2 seconds!

Network Debugging: ss, curl, dig

# List all listening ports (replaces netstat)
ss -tlnp
# -t: TCP, -l: listening, -n: numeric ports, -p: show process
# State    Local Address:Port   Process
# LISTEN   0.0.0.0:8080         users:(("node",pid=1234))
# LISTEN   0.0.0.0:5432         users:(("postgres",pid=5678))

# Find what is using port 8080 (the colon avoids matching PIDs that contain 8080)
ss -tlnp | grep ':8080'

# Count connections by state (NR>1 skips the header row)
ss -tan | awk 'NR>1 {print $1}' | sort | uniq -c | sort -rn
#   245 ESTAB
#    12 TIME-WAIT
#     3 CLOSE-WAIT    # These are trouble! (leaked connections)

# DNS debugging
dig example.com              # Full DNS query
dig +short example.com       # Just the IP
dig @8.8.8.8 example.com    # Query a specific DNS server

# HTTP debugging with curl
curl -v https://api.example.com/health     # Verbose (shows headers, TLS)
curl -w "\nTime: %{time_total}s\nHTTP: %{http_code}\n" -s -o /dev/null URL
curl --resolve example.com:443:1.2.3.4 https://example.com  # Override DNS
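
The CLOSE-WAIT count from the `ss -tan` pipeline earlier in this section is worth checking automatically. A minimal sketch, assuming the state is the first column of `ss -tan` output: the helper name `close_wait_count`, the canned sample, and the threshold are all made up for illustration.

```shell
# Hypothetical helper: read `ss -tan` output on stdin and print the number
# of sockets in CLOSE-WAIT. NR > 1 skips the header row.
close_wait_count() {
  awk 'NR > 1 && $1 == "CLOSE-WAIT" {n++} END {print n + 0}'
}

# Canned, made-up sample so the sketch runs without live sockets.
# On a real host: ss -tan | close_wait_count
sample='State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
ESTAB      0      0      10.0.0.5:8080       10.0.0.9:51234
CLOSE-WAIT 1      0      10.0.0.5:8080       10.0.0.7:40112
CLOSE-WAIT 1      0      10.0.0.5:8080       10.0.0.8:40250
TIME-WAIT  0      0      10.0.0.5:8080       10.0.0.6:39001'

leaked=$(printf '%s\n' "$sample" | close_wait_count)
threshold=1   # assumed alert threshold; tune for your service
if [ "$leaked" -gt "$threshold" ]; then
  echo "WARNING: $leaked sockets in CLOSE-WAIT"
fi
```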

File Investigation: lsof, find, du

# Who has this file open?
lsof /var/log/app.log

# What files does this process have open?
lsof -p 12345

# Find processes with deleted files still open (disk space leak!)
lsof +L1
# If you deleted a large log file but disk space did not free up,
# a process still has it open. Restart the process.

# Find large files
find . -type f -size +100M -exec ls -lh {} \;

# Find recently modified files
find . -type f -mmin -30      # Modified in last 30 minutes
find . -type f -mtime -1      # Modified in last 24 hours

# Disk usage by directory, hidden directories included (sorted)
du -sh -- */ .[!.]*/ 2>/dev/null | sort -rh | head -10
# 4.2G   node_modules/
# 1.8G   .git/
# 250M   dist/

# Watch disk usage in real-time
watch -n 5 'df -h | grep /dev/sda'
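
When hunting for large files it is often handier to see them sorted by size. The sketch below assumes GNU find, since `-printf` is a GNU extension; the helper name `largest_files` and the demo files are made up for illustration.

```shell
# Hypothetical helper: list files above a size threshold, largest first.
# Usage: largest_files DIR MIN_SIZE LIMIT   (MIN_SIZE as find understands it, e.g. 100M)
largest_files() {
  dir=$1; min_size=$2; limit=$3
  # %s = size in bytes, %p = path
  find "$dir" -type f -size "+$min_size" -printf '%s\t%p\n' |
    sort -rn | head -n "$limit"
}

# Made-up files for the demonstration (10 KiB, 200 KiB, 500 KiB)
workdir=$(mktemp -d)
head -c 10240  /dev/zero > "$workdir/small.log"
head -c 204800 /dev/zero > "$workdir/medium.log"
head -c 512000 /dev/zero > "$workdir/large.log"

# Only files larger than 100 KiB are listed, biggest first
largest_files "$workdir" 100k 10
```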

Log Analysis

# Follow a log file in real-time
tail -f /var/log/app.log

# Follow multiple files
tail -f /var/log/app.log /var/log/error.log

# Search compressed log archives
zgrep "ERROR" /var/log/app.log.*.gz

# Count errors per hour
grep "ERROR" app.log | awk '{print $1, substr($2,1,2)":00"}' | sort | uniq -c
#   15 2026-04-28 10:00
#   42 2026-04-28 11:00    # Spike!
#    8 2026-04-28 12:00

# Find the most common error messages
grep "ERROR" app.log | awk -F'ERROR' '{print $2}' | sort | uniq -c | sort -rn | head -10

# Extract requests slower than 1 second
grep "request_time" access.log | awk -F'request_time=' '{print $2}' | \
  awk '$1 > 1.0 {print}' | sort -rn | head -20
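
The slow-request pipeline above prints only what follows `request_time=`, which loses the rest of the line. A sketch that keeps the full line, assuming log entries carry a `request_time=<seconds>` token: the helper name `slow_requests` and the sample lines are made up for illustration.

```shell
# Hypothetical helper: print lines whose request_time exceeds a threshold,
# slowest first, with the extracted time prepended for sorting.
# Usage: slow_requests FILE SECONDS
slow_requests() {
  awk -v limit="$2" '
    match($0, /request_time=[0-9.]+/) {
      # Skip the 13 characters of "request_time=" to get the number
      t = substr($0, RSTART + 13, RLENGTH - 13)
      if (t + 0 > limit + 0) print t, $0
    }' "$1" | sort -rn
}

# Made-up sample data for the demonstration
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
GET /a request_time=0.250 status=200
GET /b request_time=2.500 status=200
GET /c request_time=1.200 status=500
EOF

slow_requests "$sample_log" 1.0
# 2.500 GET /b request_time=2.500 status=200
# 1.200 GET /c request_time=1.200 status=500
```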

Quick Reference

Task                         Command
Find what uses a port        ss -tlnp | grep ':8080'
Find large files             find . -type f -size +100M
Count log errors per hour    grep ERROR app.log | awk '{print $1, substr($2,1,2)":00"}' | sort | uniq -c
Summarize process syscalls   strace -p PID -c
Parse JSON                   jq '.field' file.json
Parallel execution           xargs -P 4 -I {} cmd {}
Disk space leak              lsof +L1
Replace text in files        sed -i 's/old/new/g' file

Key Takeaways

  • jq is essential: most APIs, Docker, and Kubernetes tools can emit JSON, and jq makes it queryable
  • awk covers most day-to-day log analysis: column extraction, filtering, counting, summing
  • ss replaces netstat: it is faster and shows more information about socket states
  • strace reveals why a process is slow: with -T it shows each system call and the time spent in it
  • lsof +L1 finds disk space leaks: deleted files held open by running processes
  • xargs -P enables easy parallelism: run commands across multiple inputs simultaneously
  • Combine commands with pipes: the power is in composition, not individual tools

These commands are not arcane knowledge — they are the standard toolkit for anyone who operates production systems. Spend an afternoon practicing them and you will debug faster than colleagues who reach for monitoring dashboards first. The command line is the fastest path from “something is wrong” to “here is exactly what happened.”