Python is wonderful for productivity, but sometimes you hit a wall — a tight loop that needs to run 100x faster, a C library you need to wrap, or a data structure that doesn't exist in pure Python. That's when C extensions come in. This workshop takes you from "never written a C extension" to "shipping a production-quality module" — step by step, with code you can run at each stage.
What You'll Build
By the end of this workshop, you'll have built fastutils — a C extension module with:
- A fast Fibonacci function (100x faster than pure Python)
- A fast string reversal function
- A custom IntArray type with C-level performance
- Proper error handling, memory management, and documentation
- A setup.py that builds and installs the module
Prerequisites
# You need: Python 3.8+, a C compiler, Python dev headers
# Ubuntu/Debian:
sudo apt install python3-dev build-essential
# macOS (Xcode command line tools):
xcode-select --install
# Verify:
python3 -c "import sysconfig; print(sysconfig.get_path('include'))"
# Should print something like: /usr/include/python3.12
Step 1 — The Minimal C Extension
Let's start with the absolute simplest C extension — a module with one function:
// fastutils.c, Step 1: minimal module
// PY_SSIZE_T_CLEAN must be defined before Python.h; "#" formats such as "s#" (Step 4) require it
#define PY_SSIZE_T_CLEAN
#include <Python.h>
// The C function: takes Python args, returns a Python object
static PyObject* fastutils_hello(PyObject* self, PyObject* args) {
    const char* name;
    // Parse the Python argument: "s" = string
    if (!PyArg_ParseTuple(args, "s", &name))
        return NULL;  // Exception already set by PyArg_ParseTuple
    // Build a Python string and return it
    return PyUnicode_FromFormat("Hello, %s! From C.", name);
}
// Method table: maps Python function names to C functions
static PyMethodDef fastutils_methods[] = {
    {
        "hello",                   // Python function name
        fastutils_hello,           // C function pointer
        METH_VARARGS,              // Calling convention
        "hello(name) -> str\n\n"   // Docstring
        "Returns a greeting from C."
    },
    {NULL, NULL, 0, NULL}          // Sentinel: marks end of array
};
// Module definition
static struct PyModuleDef fastutils_module = {
    PyModuleDef_HEAD_INIT,
    "fastutils",                           // Module name
    "High-performance utility functions",  // Module docstring
    -1,                                    // Module state size (-1 = state lives in C globals)
    fastutils_methods                      // Method table
};
// Module initialization function: MUST be named PyInit_<modulename>
PyMODINIT_FUNC PyInit_fastutils(void) {
    return PyModule_Create(&fastutils_module);
}
Step 2 — Build and Test
# setup.py
from setuptools import setup, Extension
setup(
    name="fastutils",
    version="0.1.0",
    ext_modules=[
        Extension("fastutils", sources=["fastutils.c"]),
    ],
)
# Build the extension in-place
python setup.py build_ext --inplace
# Test it
python -c "import fastutils; print(fastutils.hello('World'))"
# Output: Hello, World! From C.
Congratulations — you've just built your first C extension. Let's make it useful.
Step 3 — Fast Fibonacci with C Types
Now let's add a function that actually demonstrates speed. The key: we do the heavy computation in C, only converting to/from Python objects at the boundary.
// Add to fastutils.c
static PyObject* fastutils_fibonacci(PyObject* self, PyObject* args) {
    int n;
    // "i" = int
    if (!PyArg_ParseTuple(args, "i", &n))
        return NULL;
    // Input validation: raise ValueError for bad input
    if (n < 0) {
        PyErr_SetString(PyExc_ValueError, "n must be non-negative");
        return NULL;
    }
    // Pure C computation, no Python overhead
    unsigned long long a = 0, b = 1;
    for (int i = 0; i < n; i++) {
        unsigned long long temp = b;
        b = a + b;  // Unsigned addition wraps around on overflow
        a = temp;
        // b has wrapped if it is now smaller than a. That only matters when
        // another iteration would move the wrapped value into a.
        if (a > b && i + 1 < n) {
            PyErr_SetString(PyExc_OverflowError,
                            "Fibonacci number too large for unsigned long long");
            return NULL;
        }
    }
    // Convert C result back to Python int
    return PyLong_FromUnsignedLongLong(a);
}
Add it to the method table:
static PyMethodDef fastutils_methods[] = {
    {"hello", fastutils_hello, METH_VARARGS,
     "hello(name) -> str\n\nReturns a greeting from C."},
    {"fibonacci", fastutils_fibonacci, METH_VARARGS,
     "fibonacci(n) -> int\n\n"
     "Returns the nth Fibonacci number. Computed in C for maximum speed."},
    {NULL, NULL, 0, NULL}
};
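Why does the C function need an overflow check when the Python version does not? Python ints grow as needed, while a C unsigned integer silently wraps around. The sketch below imitates that wrap-around in pure Python. It assumes unsigned long long is 64 bits wide (true on mainstream platforms, but not guaranteed by the C standard), and the names fib_exact and fib_wrapped are made up for this illustration.

```python
MASK_64 = (1 << 64) - 1  # assumption: unsigned long long is 64 bits wide


def fib_exact(n):
    """Reference result using Python's arbitrary-precision ints."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_wrapped(n):
    """Same loop, but every addition is truncated to 64 bits like C unsigned arithmetic."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, (a + b) & MASK_64
    return a


# Largest n whose Fibonacci number still fits in 64 bits
largest_safe_n = max(n for n in range(200) if fib_exact(n) <= MASK_64)
print("largest n that fits:", largest_safe_n)
print("wrapped result correct at n=94?", fib_wrapped(94) == fib_exact(94))
```

Any n above that limit has to raise OverflowError in the C version, because the wrapped value would otherwise be returned as if it were correct.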
Let's benchmark it:
import time
import fastutils
# Pure Python fibonacci
def fib_python(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
# Benchmark
n = 70
start = time.perf_counter()
for _ in range(1_000_000):
    fib_python(n)
python_time = time.perf_counter() - start
start = time.perf_counter()
for _ in range(1_000_000):
    fastutils.fibonacci(n)
c_time = time.perf_counter() - start
print(f"Python: {python_time:.3f}s")
print(f"C ext: {c_time:.3f}s")
print(f"Speedup: {python_time / c_time:.1f}x")
# Typical output: ~80-120x faster (exact numbers depend on machine and Python version)
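A single timed loop is sensitive to whatever else the machine is doing. As a possible refinement, here is a small sketch built on the standard library's timeit module that repeats the measurement and keeps the best run. The helper name seconds_per_call and the repeat counts are arbitrary choices for this example, and it only times the pure-Python function so that it runs without the extension being built.

```python
import timeit


def fib_python(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def seconds_per_call(func, arg, number=20_000, repeat=5):
    """Return the best observed time for one call of func(arg)."""
    timings = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return min(timings) / number  # the minimum is the run least disturbed by other processes


python_cost = seconds_per_call(fib_python, 70)
print(f"fib_python(70): {python_cost * 1e9:.0f} ns per call")
```

Once the extension is built, passing fastutils.fibonacci as func gives a directly comparable number. The lambda adds a small constant overhead to both measurements.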
Step 4 — Working with Strings
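Strings require careful handling in C extensions because Python strings are Unicode objects, not simple char arrays. The "s#" format hands your function the UTF-8 encoding of the string, so the byte-wise reversal below is only correct for ASCII text; the loop rejects anything else instead of producing invalid UTF-8:
static PyObject* fastutils_reverse(PyObject* self, PyObject* args) {
    const char* input;
    Py_ssize_t length;
    // "s#" = UTF-8 bytes + length (handles embedded nulls);
    // requires PY_SSIZE_T_CLEAN to be defined before including Python.h
    if (!PyArg_ParseTuple(args, "s#", &input, &length))
        return NULL;
    // Allocate buffer for reversed string
    char* reversed = (char*)malloc((size_t)length + 1);
    if (!reversed) {
        PyErr_NoMemory();
        return NULL;
    }
    // Reverse in C
    for (Py_ssize_t i = 0; i < length; i++) {
        unsigned char c = (unsigned char)input[length - 1 - i];
        if (c >= 0x80) {  // Byte belongs to a multi-byte UTF-8 sequence
            free(reversed);
            PyErr_SetString(PyExc_ValueError, "reverse() supports ASCII text only");
            return NULL;
        }
        reversed[i] = (char)c;
    }
    reversed[length] = '\0';
    // Create Python string from C string
    PyObject* result = PyUnicode_FromStringAndSize(reversed, length);
    // ALWAYS free allocated memory
    free(reversed);
    return result;  // Can be NULL if PyUnicode_FromStringAndSize failed
}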
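To see why the difference between bytes and characters matters here, this short pure-Python sketch mimics what a byte-wise C loop does to the UTF-8 buffer that "s#" provides. The function names are invented for the illustration; nothing here calls the extension.

```python
def reverse_utf8_bytes(text):
    """Mimic a byte-wise C reversal of the UTF-8 encoding of text."""
    return text.encode("utf-8")[::-1].decode("utf-8")


def reverse_code_points(text):
    """Reverse by Unicode code point, which is what Python slicing does."""
    return text[::-1]


print(reverse_utf8_bytes("hello"))  # ASCII: one byte per character, so this is fine

try:
    reverse_utf8_bytes("héllo")  # "é" is two bytes in UTF-8
except UnicodeDecodeError as exc:
    print("byte reversal failed:", exc.reason)

print(reverse_code_points("héllo"))
```

Reversing the two bytes of "é" puts a continuation byte first, which is not valid UTF-8. A C function that wants to support all text has to work on code points rather than bytes.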
Step 5 — Working with Lists
Processing Python lists from C gives you direct access to the underlying array:
// Sum all numbers in a list as a C double. The built-in sum() is itself
// implemented in C, so benchmark before assuming this version is faster.
static PyObject* fastutils_fast_sum(PyObject* self, PyObject* args) {
    PyObject* list_obj;
    if (!PyArg_ParseTuple(args, "O", &list_obj))
        return NULL;
    // Type check: ensure it's actually a list
    if (!PyList_Check(list_obj)) {
        PyErr_SetString(PyExc_TypeError, "argument must be a list");
        return NULL;
    }
    double total = 0.0;
    // Re-read the size on every iteration: an item's __float__ can run
    // Python code that resizes the list
    for (Py_ssize_t i = 0; i < PyList_GET_SIZE(list_obj); i++) {
        PyObject* item = PyList_GET_ITEM(list_obj, i);  // Borrowed reference
        Py_INCREF(item);  // Keep the item alive during the conversion
        // Convert to C double
        double value = PyFloat_AsDouble(item);
        Py_DECREF(item);
        if (value == -1.0 && PyErr_Occurred()) {
            return NULL;  // Item wasn't a number
        }
        total += value;
    }
    return PyFloat_FromDouble(total);
}
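One design consequence of accumulating in a C double is worth seeing in isolation: the result is always a float, so large integers lose precision that Python's own sum() would keep. The sketch below reproduces the accumulator in pure Python; sum_as_double is a hypothetical stand-in, not the extension function itself.

```python
def sum_as_double(values):
    """Accumulate in a float, the way a C double accumulator does."""
    total = 0.0
    for value in values:
        total += float(value)
    return total


values = [2**53, 1]  # 2**53 + 1 is the first integer a double cannot represent
print(sum(values))            # exact integer arithmetic
print(sum_as_double(values))  # rounded to the nearest representable double
```

For float-heavy data this is usually acceptable, but it is worth stating in the docstring if callers may pass large ints.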
Step 6 — Keyword Arguments
Real-world functions need keyword arguments. Use METH_VARARGS | METH_KEYWORDS:
static PyObject* fastutils_repeat(PyObject* self, PyObject* args, PyObject* kwargs) {
    const char* text;
    int count = 2;               // Default value
    const char* separator = "";  // Default value
    static char* kwlist[] = {"text", "count", "separator", NULL};
    // "s|is" = required string, optional int, optional string
    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s|is", kwlist,
                                     &text, &count, &separator))
        return NULL;
    if (count < 0) {
        PyErr_SetString(PyExc_ValueError, "count must be non-negative");
        return NULL;
    }
    // Build the result
    PyObject* parts = PyList_New(count);
    if (!parts) return NULL;
    for (int i = 0; i < count; i++) {
        PyObject* s = PyUnicode_FromString(text);
        if (!s) {
            Py_DECREF(parts);
            return NULL;
        }
        PyList_SET_ITEM(parts, i, s);  // Steals reference
    }
    PyObject* sep = PyUnicode_FromString(separator);
    if (!sep) {
        Py_DECREF(parts);
        return NULL;
    }
    PyObject* result = PyUnicode_Join(sep, parts);
    Py_DECREF(sep);
    Py_DECREF(parts);
    return result;
}
// In the method table, use METH_VARARGS | METH_KEYWORDS.
// Casting through void(*)(void) avoids a function-cast warning under -Wextra:
{"repeat", (PyCFunction)(void(*)(void))fastutils_repeat, METH_VARARGS | METH_KEYWORDS,
 "repeat(text, count=2, separator='') -> str\n\n"
 "Repeats text count times, joined by separator."},
# Usage from Python:
fastutils.repeat("ha", count=3, separator="-")
# Returns: "ha-ha-ha"
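When a C function has defaults and keyword arguments, it helps to keep a pure-Python reference with the same signature around for tests. The version below is one way to write it, assuming the behavior described in the docstring above. It is a sketch for comparison, not part of the extension.

```python
def repeat(text, count=2, separator=""):
    """Pure-Python reference with the same signature as the C function."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return separator.join([text] * count)


print(repeat("ha", count=3, separator="-"))
print(repeat("ab"))  # falls back to count=2 and an empty separator
```

A test suite can then call both implementations with the same arguments and compare the results.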
Step 7 — Custom Types (Classes in C)
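This is the most powerful feature: defining a new Python type entirely in C. Let's build an IntArray that stores integers in a contiguous C array. The listing covers the instance struct and the slot functions; to be usable as fastutils.IntArray, they still have to be referenced from a PyTypeObject that is passed to PyType_Ready() and added to the module in PyInit_fastutils: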
// IntArray type: a fast, fixed-size integer array
typedef struct {
    PyObject_HEAD       // Required Python object header
    long* data;         // C array of longs
    Py_ssize_t length;  // Array length
} IntArrayObject;
// Destructor: called when the object's reference count drops to zero
static void IntArray_dealloc(IntArrayObject* self) {
    free(self->data);
    Py_TYPE(self)->tp_free((PyObject*)self);
}
// Initializer: __init__
static int IntArray_init(IntArrayObject* self, PyObject* args, PyObject* kwargs) {
    PyObject* iterable;
    if (!PyArg_ParseTuple(args, "O", &iterable))
        return -1;
    // Materialize the iterable as a list so the element count is known up front
    PyObject* items = PySequence_List(iterable);
    if (!items)
        return -1;
    Py_ssize_t count = PyList_GET_SIZE(items);
    // Allocate C array (at least one element, because malloc(0) may return NULL)
    long* data = (long*)malloc((count ? (size_t)count : 1) * sizeof(long));
    if (!data) {
        Py_DECREF(items);
        PyErr_NoMemory();
        return -1;
    }
    // Copy data
    for (Py_ssize_t i = 0; i < count; i++) {
        data[i] = PyLong_AsLong(PyList_GET_ITEM(items, i));
        if (data[i] == -1 && PyErr_Occurred()) {
            Py_DECREF(items);
            free(data);
            return -1;
        }
    }
    Py_DECREF(items);
    // Publish only after everything succeeded; free any array left by an earlier __init__ call
    free(self->data);
    self->data = data;
    self->length = count;
    return 0;
}
// __len__
static Py_ssize_t IntArray_length(IntArrayObject* self) {
    return self->length;
}
// __getitem__
static PyObject* IntArray_getitem(IntArrayObject* self, Py_ssize_t index) {
    if (index < 0 || index >= self->length) {
        PyErr_SetString(PyExc_IndexError, "index out of range");
        return NULL;
    }
    return PyLong_FromLong(self->data[index]);
}
// sum() method: pure C loop over the array
static PyObject* IntArray_sum(IntArrayObject* self, PyObject* Py_UNUSED(args)) {
    long long total = 0;
    for (Py_ssize_t i = 0; i < self->length; i++) {
        total += self->data[i];
    }
    return PyLong_FromLongLong(total);
}
// __repr__
static PyObject* IntArray_repr(IntArrayObject* self) {
    PyObject* result = NULL;
    PyObject* comma = NULL;
    PyObject* joined = NULL;
    // An empty array yields an empty list, which joins to "" and prints as IntArray([])
    PyObject* parts = PyList_New(self->length);
    if (!parts) return NULL;
    for (Py_ssize_t i = 0; i < self->length; i++) {
        PyObject* s = PyUnicode_FromFormat("%ld", self->data[i]);
        if (!s) goto done;
        PyList_SET_ITEM(parts, i, s);  // Steals reference
    }
    comma = PyUnicode_FromString(", ");
    if (!comma) goto done;
    joined = PyUnicode_Join(comma, parts);
    if (!joined) goto done;
    result = PyUnicode_FromFormat("IntArray([%U])", joined);
done:
    Py_XDECREF(joined);
    Py_XDECREF(comma);
    Py_DECREF(parts);
    return result;  // NULL if any step failed
}
# Usage from Python:
arr = fastutils.IntArray([10, 20, 30, 40, 50])
print(len(arr)) # 5
print(arr[2]) # 30
print(arr.sum()) # 150
print(repr(arr)) # IntArray([10, 20, 30, 40, 50])
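If you want a feel for this storage model before writing any C, the standard library's array module offers something very similar: with typecode "l" it also keeps its elements in a contiguous buffer of C longs. The snippet below is only an analogy to IntArray, not the extension type itself.

```python
from array import array

# "l" stores each element as a C signed long, like the long* data field
arr = array("l", [10, 20, 30, 40, 50])
print(len(arr), arr[2], sum(arr))
print("bytes per element:", arr.itemsize)

# Values that do not fit in a C long are rejected, just as PyLong_AsLong would fail
try:
    array("l", [2**100])
except OverflowError as exc:
    print("rejected:", exc)
```

Comparing your C type against array("l", ...) in tests is a cheap way to check lengths, indexing, and overflow behavior.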
Memory Management — The Golden Rules
Memory management is where most C extension bugs live. Python uses reference counting — every object has a count of how many references point to it. When the count hits zero, the object is freed.
// Rule 1: Py_INCREF when you keep a reference
PyObject* obj = PyList_GetItem(list, 0); // Borrowed reference
Py_INCREF(obj); // Now you own a reference
// ... use obj ...
Py_DECREF(obj); // Release when done
// Rule 2: Return values transfer ownership
return PyLong_FromLong(42); // Caller owns the reference — don't DECREF
// Rule 3: Py_DECREF everything you create (unless you return it)
PyObject* temp = PyUnicode_FromString("hello");
// ... use temp ...
Py_DECREF(temp); // YOU created it, YOU must free it
// Rule 4: Check for NULL after every Python API call
PyObject* result = PyObject_CallFunction(func, "i", 42);
if (result == NULL) {
    // An exception occurred — clean up and return NULL
    Py_XDECREF(other_obj);  // Py_XDECREF is safe with NULL
    return NULL;
}
// Rule 5: Use Py_XDECREF for pointers that might be NULL
Py_XDECREF(maybe_null_ptr); // Safe — does nothing if NULL
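You can watch reference counts change from Python itself, which makes the rules above less abstract. This sketch relies on sys.getrefcount, a CPython-specific function; the absolute numbers vary between versions, so only the differences are meaningful.

```python
import sys

obj = object()
baseline = sys.getrefcount(obj)       # the exact value is an implementation detail

holder = [obj]                        # storing obj in a list takes one more reference
after_store = sys.getrefcount(obj)

del holder                            # destroying the list releases that reference
after_release = sys.getrefcount(obj)

print(baseline, after_store, after_release)
```

Storing the object corresponds to a Py_INCREF in C, and dropping the container corresponds to the matching Py_DECREF.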
Error Handling Best Practices
// Pattern 1: Validate input early, fail fast
static PyObject* my_func(PyObject* self, PyObject* args) {
    int n;
    if (!PyArg_ParseTuple(args, "i", &n))
        return NULL;
    if (n <= 0) {
        PyErr_SetString(PyExc_ValueError, "n must be positive");
        return NULL;
    }
    // ... proceed with valid input ...
}
// Pattern 2: Cleanup on error with goto
static PyObject* complex_func(PyObject* self, PyObject* args) {
    PyObject* result = NULL;
    PyObject* list = NULL;
    char* buffer = NULL;
    list = PyList_New(100);
    if (!list) goto cleanup;
    buffer = (char*)malloc(1024);
    if (!buffer) {
        PyErr_NoMemory();
        goto cleanup;
    }
    buffer[0] = '\0';  // Start as an empty C string so the buffer is never read uninitialized
    // ... do work ...
    result = PyUnicode_FromString(buffer);
    // Fall through to cleanup
cleanup:  // Shared by the success and error paths
    free(buffer);
    Py_XDECREF(list);
    return result;  // NULL on error, valid object on success
}
// Pattern 3: Raise custom exceptions
static PyObject* MyError;  // Module-level exception
// In PyInit_fastutils:
MyError = PyErr_NewException("fastutils.FastError", NULL, NULL);
Py_XINCREF(MyError);  // PyModule_AddObject steals a reference, but only on success
if (PyModule_AddObject(module, "FastError", MyError) < 0) {
    Py_XDECREF(MyError);  // Release the reference that was not stolen
    Py_CLEAR(MyError);
    Py_DECREF(module);
    return NULL;
}
// Usage:
PyErr_SetString(MyError, "something went wrong in C");
return NULL;
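From the caller's point of view, the custom exception from Pattern 3 is an ordinary Exception subclass that lives in the fastutils module. The sketch below builds a stand-in class in pure Python to show what the calling code looks like. Both FastError here and call_into_extension are placeholders invented for the example.

```python
# Hypothetical stand-in for the class the extension would export as fastutils.FastError
FastError = type("FastError", (Exception,), {"__module__": "fastutils"})


def call_into_extension():
    """Placeholder for a C function that sets MyError and returns NULL."""
    raise FastError("something went wrong in C")


try:
    call_into_extension()
except FastError as exc:
    message = str(exc)

print(message)
print(FastError.__module__, FastError.__name__)
```

Because the class derives from Exception, callers who do not know about FastError can still catch it with a plain except Exception clause.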
PyArg_ParseTuple Format Strings
These are your bread and butter for parsing Python arguments in C:
Format   C Type                      Python Type
──────   ─────────────────────────   ──────────────────────────────
"i"      int                         int
"l"      long                        int
"L"      long long                   int
"n"      Py_ssize_t                  int
"f"      float                       float
"d"      double                      float
"s"      const char*                 str (UTF-8 encoded)
"s#"     const char*, Py_ssize_t     str + length
"O"      PyObject*                   any object
"O!"     PyObject* (type-checked)    specific type
"|"      -                           marks start of optional args
"$"      -                           marks keyword-only args (PyArg_ParseTupleAndKeywords only)
// Examples:
PyArg_ParseTuple(args, "si", &name, &count) // str + int
PyArg_ParseTuple(args, "s|id", &s, &n, &f) // str, optional int + double
PyArg_ParseTuple(args, "O!", &PyList_Type, &list) // must be a list
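The integer formats differ only in the width of the C type they fill, and a Python int that does not fit raises OverflowError during parsing. To check what "fits" means on your platform, the sketch below uses ctypes to look up the sizes. The mapping from format code to ctypes type is this example's own assumption about the equivalent C types, and the helper names are hypothetical.

```python
import ctypes

# Assumed ctypes equivalents of the C types behind each integer format code
FORMAT_TO_CTYPE = {
    "i": ctypes.c_int,
    "l": ctypes.c_long,
    "L": ctypes.c_longlong,
    "n": ctypes.c_ssize_t,
}


def signed_range(ctype):
    """Return the (min, max) values of a signed C integer type on this platform."""
    bits = 8 * ctypes.sizeof(ctype)
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def fits(format_code, value):
    """True if value is within the range of the C type for format_code."""
    low, high = signed_range(FORMAT_TO_CTYPE[format_code])
    return low <= value <= high


for code, ctype in FORMAT_TO_CTYPE.items():
    print(code, signed_range(ctype))
```

Note that "l" is 32 bits on 64-bit Windows but 64 bits on most 64-bit Unix systems, which is a common source of platform-specific bugs.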
Complete setup.py for Production
from setuptools import setup, Extension
import sys
# Compiler flags for performance and safety
if sys.platform == "win32":
    # Assumes MSVC, which does not accept GCC-style flags
    extra_compile_args = ["/O2", "/W3"]
else:
    extra_compile_args = ["-O3", "-Wall", "-Wextra", "-std=c11"]
setup(
    name="fastutils",
    version="1.0.0",
    description="High-performance utility functions written in C",
    author="Your Name",
    ext_modules=[
        Extension(
            "fastutils",
            sources=["fastutils.c"],
            extra_compile_args=extra_compile_args,
        ),
    ],
    python_requires=">=3.8",
)
# Development workflow:
python setup.py build_ext --inplace # Build for development
pip install -e . # Install in editable mode
python -m pytest tests/ # Run tests
# Distribution:
pip install build
python -m build # Creates wheel + sdist
pip install twine
twine upload dist/* # Publish to PyPI
Debugging C Extensions
# Compile with debug symbols
python setup.py build_ext --inplace --debug
# Run under gdb
gdb -ex run --args python -c "import fastutils; fastutils.fibonacci(10)"
# Use Valgrind for memory leak detection
valgrind --leak-check=full python -c "
import fastutils
for i in range(10000):
    fastutils.fibonacci(i % 90)
"
# Enable Python's debug allocator
PYTHONMALLOC=debug python -c "import fastutils; ..."
Best Practices Checklist
- Always check return values: Every Py* function can return NULL. Check it. Every time.
- Never mix malloc/free with Python allocators: Use malloc/free for C data, PyMem_Malloc/PyMem_Free for Python-tracked memory.
- Release the GIL for long C operations: Use Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS around pure C code so other threads can run.
- Validate all input at the boundary: Type-check, range-check, and null-check everything that comes from Python before doing C work.
- Write docstrings for every function: Use the \n\n convention in your method table strings; help() will format them correctly.
- Test with pytest like any other module: Your C extension is a Python module, so test it with normal Python test tools.
- Use Py_XDECREF in cleanup paths: It's safe with NULL pointers, making error cleanup much simpler.
- Compile with -Wall -Wextra: Let the compiler catch bugs before your users do.
- Profile before extending: Only write C extensions for proven bottlenecks. Profile first, optimize second.
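For the last item on the checklist, the standard library already ships what you need. The sketch below profiles a toy workload with cProfile and prints the most expensive functions by cumulative time. The workload function is made up for the example; in practice you would profile your real entry point.

```python
import cProfile
import io
import pstats


def fib_python(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def workload():
    """Toy stand-in for the application code you suspect is slow."""
    return [fib_python(n) for n in range(200)]


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Collect the report in a string so it can be printed or saved
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

If a function like fib_python dominates the report, it is a candidate for a C rewrite. If it does not, a C extension will not help much.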
C extensions are the ultimate escape hatch when Python isn't fast enough. They're used by every major Python library — NumPy, pandas, Pillow, cryptography, uvloop — and now you know how to build them yourself. Start small, respect the reference counting rules, and you'll be writing production-grade C extensions in no time.