Python offers three main concurrency models: threading, asyncio, and multiprocessing. Developers often pick the wrong one, reaching for threading when they need multiprocessing, or multiprocessing when they need asyncio. The choice depends largely on one question: is your bottleneck I/O or CPU?
The GIL: What It Actually Means
The Global Interpreter Lock (GIL) in CPython prevents multiple threads from executing Python bytecode at the same time. It does not prevent concurrency; it prevents parallelism for CPU-bound Python code.
# The GIL means:
# - Only one thread executes Python code at a time
# - BUT threads release the GIL during I/O operations
# - So I/O-bound threads CAN run concurrently
# CPU-bound: GIL is a bottleneck (use multiprocessing)
# I/O-bound: GIL does not matter (use threading or asyncio)
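The second and third points can be checked without any network access. The sketch below uses time.sleep as a stand-in for a blocking I/O call, on the assumption that it behaves like a socket read in the one respect that matters here: it releases the GIL while waiting. The helper names blocking_io and run_in_threads are made up for illustration.

```python
import threading
import time

def blocking_io(delay):
    """Stand-in for a network or disk wait; time.sleep releases the GIL."""
    time.sleep(delay)

def run_in_threads(n_threads, delay):
    """Start n_threads threads that each block for `delay` seconds.

    Returns the total wall-clock time in seconds.
    """
    threads = [threading.Thread(target=blocking_io, args=(delay,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    elapsed = run_in_threads(n_threads=5, delay=0.2)
    # The waits overlap, so this should be close to 0.2 s rather than 1.0 s.
    print(f"5 threads x 0.2s sleep took {elapsed:.2f}s")
```

If the threads had to wait for each other, five 0.2-second sleeps would take about one second in total; because each sleeping thread gives up the GIL, the total stays near the length of a single sleep.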
Threading: For I/O-Bound Work
When your code waits for network responses, file reads, or database queries, the waiting thread releases the GIL so other threads can run. Threading is usually the simplest way to overlap I/O waits.
import concurrent.futures
import time

import requests

# Ten identical requests; each one takes about 1 second to respond
urls = ["https://httpbin.org/delay/1" for _ in range(10)]

# Sequential: ~10 seconds (about 1 second per request)
def fetch_sequential():
    return [requests.get(url).status_code for url in urls]

# Threaded: ~1 second (all requests wait concurrently)
def fetch_threaded():
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        futures = [executor.submit(requests.get, url) for url in urls]
        return [f.result().status_code for f in futures]

# Benchmark (timings are approximate and depend on the network)
start = time.perf_counter()
fetch_sequential()
print(f"Sequential: {time.perf_counter() - start:.1f}s")  # ~10.0s

start = time.perf_counter()
fetch_threaded()
print(f"Threaded: {time.perf_counter() - start:.1f}s")  # ~1.1s
ThreadPoolExecutor Patterns
# Map pattern: apply function to each item
def download_file(url):
    response = requests.get(url)
    response.raise_for_status()  # surface HTTP errors as exceptions
    filename = url.split("/")[-1]
    with open(filename, "wb") as f:
        f.write(response.content)
    return filename

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # map() returns results in input order
    results = list(executor.map(download_file, urls))

# As-completed pattern: process results as they finish
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = {executor.submit(download_file, url): url for url in urls}
    for future in concurrent.futures.as_completed(futures):
        url = futures[future]
        try:
            result = future.result()
            print(f"Downloaded: {result}")
        except Exception as e:
            print(f"Failed {url}: {e}")
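The difference between the two patterns above is the order in which results arrive. The following sketch makes that visible without touching the network; slow_echo is a hypothetical stand-in for a download whose duration varies, and the delays are arbitrary.

```python
import concurrent.futures
import time

def slow_echo(delay):
    """Sleep for `delay` seconds, then return it (simulates uneven I/O)."""
    time.sleep(delay)
    return delay

def ordered_results(delays):
    """executor.map yields results in input order, whatever finishes first."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(delays)) as ex:
        return list(ex.map(slow_echo, delays))

def completion_order(delays):
    """as_completed yields futures as they finish, so fast tasks come first."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(delays)) as ex:
        futures = [ex.submit(slow_echo, d) for d in delays]
        return [f.result() for f in concurrent.futures.as_completed(futures)]

if __name__ == "__main__":
    delays = [0.3, 0.1, 0.2]
    print(ordered_results(delays))   # same order as the input list
    print(completion_order(delays))  # typically shortest delay first
```

Use map() when results must line up with the inputs, and as_completed() when you want to act on each result as soon as it is ready or handle failures per task.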
Asyncio: For Many Concurrent Connections
Asyncio runs coroutines on a single thread, scheduled by an event loop. With thousands of concurrent connections it is typically more efficient than threading, because switching between coroutines avoids per-thread stacks and OS-level context switches.
import asyncio

import aiohttp

async def fetch_one(session, url):
    async with session.get(url) as response:
        return await response.text()

async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_one(session, url) for url in urls]
        return await asyncio.gather(*tasks)

# Run 1000 concurrent requests with ONE thread
urls = ["https://httpbin.org/delay/1" for _ in range(1000)]
results = asyncio.run(fetch_all(urls))
# Takes seconds, not the ~1000 seconds a sequential loop would need.
# Note: aiohttp's default connector allows 100 simultaneous connections,
# so expect roughly 10 seconds or more unless you raise that limit.
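To see the single-threaded scaling without depending on httpbin or aiohttp, here is a minimal offline sketch. It assumes asyncio.sleep is an acceptable stand-in for a network wait; fake_request, run_many, and timed_run are illustrative names, not part of any library.

```python
import asyncio
import time

async def fake_request(i, delay=0.1):
    """Stand-in for a network call; asyncio.sleep yields to the event loop."""
    await asyncio.sleep(delay)
    return i

async def run_many(n, delay=0.1):
    """Run n simulated requests concurrently on the event loop's one thread."""
    return await asyncio.gather(*(fake_request(i, delay) for i in range(n)))

def timed_run(n, delay=0.1):
    """Return (results, elapsed_seconds) for n simulated requests."""
    start = time.perf_counter()
    results = asyncio.run(run_many(n, delay))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    results, elapsed = timed_run(1000)
    print(f"{len(results)} coroutines finished in {elapsed:.2f}s")
```

Run one after another, 1000 waits of 0.1 seconds would take about 100 seconds. Here they all wait at the same time, so the total stays close to the length of a single wait plus scheduling overhead.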
Asyncio Patterns
# Semaphore: limit concurrency
async def fetch_with_limit(urls, max_concurrent=50):
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded_fetch(session, url):
        async with semaphore:
            async with session.get(url) as response:
                return await response.text()

    async with aiohttp.ClientSession() as session:
        tasks = [bounded_fetch(session, url) for url in urls]
        return await asyncio.gather(*tasks)

# Timeout per task (asyncio.timeout requires Python 3.11+)
async def fetch_with_timeout(session, url, timeout=5):
    try:
        async with asyncio.timeout(timeout):
            async with session.get(url) as response:
                return await response.text()
    except asyncio.TimeoutError:
        return None

# Producer-consumer with asyncio.Queue
async def producer(queue):
    for i in range(100):
        await queue.put(i)
    await queue.put(None)  # Sentinel: no more work

async def consumer(queue, name):
    while True:
        item = await queue.get()
        if item is None:
            queue.put_nowait(None)  # Pass sentinel to next consumer
            break
        await process(item)  # process() stands in for your own coroutine
        queue.task_done()

async def main():
    queue = asyncio.Queue(maxsize=10)
    await asyncio.gather(
        producer(queue),
        consumer(queue, "worker-1"),
        consumer(queue, "worker-2"),
        consumer(queue, "worker-3"),
    )
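One way to convince yourself that the semaphore pattern above really caps concurrency is to count how many jobs are inside the guarded section at once. This is a sketch under the assumption that asyncio.sleep stands in for the HTTP request; run_limited and its counters are hypothetical helpers.

```python
import asyncio

async def run_limited(n_tasks, max_concurrent, delay=0.01):
    """Run n_tasks simulated jobs and return the peak number running at once."""
    semaphore = asyncio.Semaphore(max_concurrent)
    running = 0
    peak = 0

    async def job():
        nonlocal running, peak
        async with semaphore:
            # Plain counters are safe here: no other task can run
            # between these statements because there is no await.
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(delay)  # simulated I/O
            running -= 1

    await asyncio.gather(*(job() for _ in range(n_tasks)))
    return peak

if __name__ == "__main__":
    peak = asyncio.run(run_limited(n_tasks=200, max_concurrent=50))
    print(f"peak concurrency: {peak}")  # never more than max_concurrent
```

All 200 tasks are created up front, but only max_concurrent of them can hold the semaphore at any moment; the rest wait at the `async with semaphore` line.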
Multiprocessing: For CPU-Bound Work
Each process gets its own Python interpreter and its own GIL. With the GIL in place, this is the standard way to get true parallelism for CPU-bound Python code.
import concurrent.futures
import math
import time

def is_prime(n):
    """CPU-intensive prime check by trial division."""
    if n < 2:
        return False
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False
    return True

numbers = [112272535095293, 112582705942171, 115280095190773,
           115797848077099, 1099726899285419, 115280095190773] * 4

# The __main__ guard is required where worker processes are spawned
# (the default on Windows and macOS), because they re-import this module.
if __name__ == "__main__":
    # Sequential: uses one CPU core
    start = time.perf_counter()
    results = [is_prime(n) for n in numbers]
    print(f"Sequential: {time.perf_counter() - start:.1f}s")  # ~8.0s

    # Multiprocessing: uses ALL CPU cores
    start = time.perf_counter()
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = list(executor.map(is_prime, numbers))
    print(f"Multiprocessing: {time.perf_counter() - start:.1f}s")  # ~2.0s (4 cores)

    # Threading: same as sequential (GIL prevents parallel CPU work)
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor() as executor:
        results = list(executor.map(is_prime, numbers))
    print(f"Threading: {time.perf_counter() - start:.1f}s")  # ~8.0s (GIL!)
Comparison Table
| Feature | Threading | Asyncio | Multiprocessing |
|---|---|---|---|
| Best for | I/O-bound (moderate) | I/O-bound (many connections) | CPU-bound |
| GIL impact | Released during I/O | Single-threaded (no GIL issue) | Separate GIL per process |
| Memory overhead | Up to ~8 MB of stack reserved per thread (typical Linux default) | A few KB per coroutine | Full process per worker |
| Max concurrency | ~100-1000 threads | ~10,000+ coroutines | ~CPU core count |
| Shared state | Easy (same memory) | Easy (same thread) | Hard (serialization needed) |
| Libraries | All (requests, etc.) | Async only (aiohttp, etc.) | All (separate processes) |
| Debugging | Race conditions possible | Simpler (task switches only at await points, though races across awaits are still possible) | Hard (inter-process communication) |
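The "serialization needed" entry for multiprocessing deserves a concrete illustration: arguments and return values cross the process boundary via pickle, so anything that cannot be pickled cannot be sent to a worker. The sketch below probes a few objects; is_picklable is a hypothetical helper written for this example.

```python
import math
import pickle
import threading

def is_picklable(obj):
    """Return True if obj can be serialized with pickle, as process pools need."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False

if __name__ == "__main__":
    print(is_picklable({"id": 1, "values": [1.0, 2.0]}))  # plain data
    print(is_picklable(math.sqrt))                         # importable function
    print(is_picklable(lambda x: x * 2))                   # anonymous function
    print(is_picklable(threading.Lock()))                  # OS-level resource
```

Plain data and importable functions pass, while lambdas and objects wrapping OS resources such as locks do not. In practice this means worker functions for a process pool should be defined at module level and should exchange simple data.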
Decision Framework
- Making 10-100 HTTP requests? → Threading (simple, works with requests library)
- Making 1,000+ concurrent connections? → Asyncio (one thread handles thousands)
- Processing images, crunching numbers, ML training? → Multiprocessing (true parallelism)
- Web scraping at scale? → Asyncio + aiohttp (high concurrency, low memory)
- Django/Flask background tasks? → Celery with multiprocessing workers
- Data pipeline with I/O and CPU stages? → Asyncio for I/O, multiprocessing for CPU
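The last item in the list, mixing asyncio for I/O with multiprocessing for CPU work, can be sketched with loop.run_in_executor. Everything here is illustrative: io_stage simulates a fetch with asyncio.sleep, cpu_stage is a placeholder workload, and the function names are assumptions rather than an established pattern from a specific library.

```python
import asyncio
import concurrent.futures

def cpu_stage(n):
    """CPU-bound stage: sum of squares below n (placeholder workload)."""
    return sum(i * i for i in range(n))

async def io_stage(n):
    """I/O-bound stage: simulated fetch that returns a work size."""
    await asyncio.sleep(0.01)
    return n

async def handle_item(n, executor):
    """Fetch one item, then run its CPU-heavy step in the given executor."""
    size = await io_stage(n)
    loop = asyncio.get_running_loop()
    # Hand the CPU-heavy part to the pool so the event loop stays responsive.
    return await loop.run_in_executor(executor, cpu_stage, size)

async def pipeline(items, executor):
    """Process all items concurrently; results keep the input order."""
    return await asyncio.gather(*(handle_item(n, executor) for n in items))

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as pool:
        print(asyncio.run(pipeline([10, 100, 1000], pool)))
```

Passing the executor in as a parameter keeps the pipeline independent of the pool type: a ProcessPoolExecutor gives real CPU parallelism, while a ThreadPoolExecutor can be swapped in for quick tests.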
Key Takeaways
- The GIL prevents parallel CPU work, not concurrent I/O — threading works fine for I/O
- Threading is simplest for moderate I/O concurrency — no async/await refactoring needed
- Asyncio scales to thousands of connections on one thread — use it for high-concurrency I/O
- Multiprocessing is the standard way to run pure-Python CPU work in parallel on GIL-enabled CPython builds
- concurrent.futures provides a unified API for both threading and multiprocessing
- Do not use threading for CPU work — it will be as slow as sequential due to the GIL
- Asyncio needs async-aware libraries: calling requests directly blocks the event loop, so use aiohttp or httpx, or move blocking calls to a worker thread
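A blocking library cannot be awaited directly, but one common workaround is to push the blocking call into a worker thread with asyncio.to_thread (available since Python 3.9). The sketch below assumes time.sleep is a fair stand-in for a synchronous call such as requests.get; blocking_call and run_blocking_concurrently are illustrative names.

```python
import asyncio
import time

def blocking_call(delay):
    """Stand-in for a synchronous library call that blocks its thread."""
    time.sleep(delay)
    return delay

async def run_blocking_concurrently(delays):
    """Run each blocking call in a worker thread and await them together."""
    return await asyncio.gather(
        *(asyncio.to_thread(blocking_call, d) for d in delays)
    )

if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(run_blocking_concurrently([0.2, 0.2, 0.2]))
    print(results, f"{time.perf_counter() - start:.2f}s")
```

This keeps the event loop free while the blocking calls run, at the cost of one thread per in-flight call, so it suits occasional legacy calls better than thousands of connections.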
Python concurrency becomes far less confusing once you answer one question: is my bottleneck I/O or CPU? I/O-bound work gets threading or asyncio. CPU-bound work gets multiprocessing. Everything else is implementation detail, so match the tool to the bottleneck.