Python is wonderful for productivity, but sometimes you hit a wall — a tight loop that needs to run 100x faster, a C library you need to wrap, or a data structure that doesn't exist in pure Python. That's when C extensions come in. This workshop takes you from "never written a C extension" to "shipping a production-quality module" — step by step, with code you can run at each stage.

Python ↔ C Extension Boundary
Python Code (your_script.py)import fastutils; fastutils.fibonacci(70)
CPython InterpreterConverts Python objects to C types via PyArg_ParseTuple
Your C Extension (fastutils.c)Pure C computation — no Python overhead, 100x faster
Result returned to PythonC types converted back via Py_BuildValue / PyLong_FromLong
Workshop Roadmap
👋HelloStep 1-2
SpeedStep 3: Fibonacci
📝StringsStep 4-5
📦TypesStep 7: IntArray
🧠MemoryGolden Rules
🚀ShipProduction

What You'll Build

By the end of this workshop, you'll have built fastutils — a C extension module with:

  • A fast Fibonacci function (100x faster than pure Python)
  • A fast string reversal function
  • A custom IntArray type with C-level performance
  • Proper error handling, memory management, and documentation
  • A setup.py that builds and installs the module

Prerequisites

# You need: Python 3.8+, a C compiler, Python dev headers
# Ubuntu/Debian:
sudo apt install python3-dev build-essential

# macOS (Xcode command line tools):
xcode-select --install

# Verify:
python3 -c "import sysconfig; print(sysconfig.get_path('include'))"
# Should print something like: /usr/include/python3.12

Step 1 — The Minimal C Extension

Let's start with the absolute simplest C extension — a module with one function:

// fastutils.c — Step 1: minimal module
#include <Python.h>

// The C function: takes Python args, returns a Python object
static PyObject* fastutils_hello(PyObject* self, PyObject* args) {
    const char* name;

    // Parse the Python argument: "s" = string
    if (!PyArg_ParseTuple(args, "s", &name))
        return NULL;  // Exception already set by PyArg_ParseTuple

    // Build a Python string and return it
    return PyUnicode_FromFormat("Hello, %s! From C.", name);
}

// Method table: maps Python function names to C functions
static PyMethodDef fastutils_methods[] = {
    {
        "hello",                    // Python function name
        fastutils_hello,            // C function pointer
        METH_VARARGS,               // Calling convention
        "hello(name) -> str\n\n"  // Docstring
        "Returns a greeting from C."
    },
    {NULL, NULL, 0, NULL}  // Sentinel — marks end of array
};

// Module definition
static struct PyModuleDef fastutils_module = {
    PyModuleDef_HEAD_INIT,
    "fastutils",                           // Module name
    "High-performance utility functions",  // Module docstring
    -1,                                    // Per-interpreter state size (-1 = global)
    fastutils_methods                      // Method table
};

// Module initialization function — MUST be named PyInit_
PyMODINIT_FUNC PyInit_fastutils(void) {
    return PyModule_Create(&fastutils_module);
}

Step 2 — Build and Test

# setup.py
from setuptools import setup, Extension

setup(
    name="fastutils",
    version="0.1.0",
    ext_modules=[
        Extension("fastutils", sources=["fastutils.c"]),
    ],
)
# Build the extension in-place
python setup.py build_ext --inplace

# Test it
python -c "import fastutils; print(fastutils.hello('World'))"
# Output: Hello, World! From C.

Congratulations — you've just built your first C extension. Let's make it useful.

Step 3 — Fast Fibonacci with C Types

Now let's add a function that actually demonstrates speed. The key: we do the heavy computation in C, only converting to/from Python objects at the boundary.

// Add to fastutils.c

static PyObject* fastutils_fibonacci(PyObject* self, PyObject* args) {
    int n;

    // "i" = int
    if (!PyArg_ParseTuple(args, "i", &n))
        return NULL;

    // Input validation — raise ValueError for bad input
    if (n < 0) {
        PyErr_SetString(PyExc_ValueError, "n must be non-negative");
        return NULL;
    }

    // Pure C computation — no Python overhead
    unsigned long long a = 0, b = 1;
    for (int i = 0; i < n; i++) {
        unsigned long long temp = b;
        b = a + b;
        a = temp;

        // Check for overflow
        if (a > b) {
            PyErr_SetString(PyExc_OverflowError,
                "Fibonacci number too large for unsigned long long");
            return NULL;
        }
    }

    // Convert C result back to Python int
    return PyLong_FromUnsignedLongLong(a);
}

Add it to the method table:

static PyMethodDef fastutils_methods[] = {
    {"hello", fastutils_hello, METH_VARARGS,
     "hello(name) -> str\n\nReturns a greeting from C."},
    {"fibonacci", fastutils_fibonacci, METH_VARARGS,
     "fibonacci(n) -> int\n\n"
     "Returns the nth Fibonacci number. Computed in C for maximum speed."},
    {NULL, NULL, 0, NULL}
};

Let's benchmark it:

import time
import fastutils

# Pure Python fibonacci
def fib_python(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Benchmark
n = 70

start = time.perf_counter()
for _ in range(1_000_000):
    fib_python(n)
python_time = time.perf_counter() - start

start = time.perf_counter()
for _ in range(1_000_000):
    fastutils.fibonacci(n)
c_time = time.perf_counter() - start

print(f"Python: {python_time:.3f}s")
print(f"C ext:  {c_time:.3f}s")
print(f"Speedup: {python_time / c_time:.1f}x")
# Typical output: ~80-120x faster

Step 4 — Working with Strings

Strings require careful handling in C extensions because Python strings are Unicode objects, not simple char arrays:

static PyObject* fastutils_reverse(PyObject* self, PyObject* args) {
    const char* input;
    Py_ssize_t length;

    // "s#" = string + length (handles embedded nulls)
    if (!PyArg_ParseTuple(args, "s#", &input, &length))
        return NULL;

    // Allocate buffer for reversed string
    char* reversed = (char*)malloc(length + 1);
    if (!reversed) {
        PyErr_NoMemory();
        return NULL;
    }

    // Reverse in C
    for (Py_ssize_t i = 0; i < length; i++) {
        reversed[i] = input[length - 1 - i];
    }
    reversed[length] = '\0';

    // Create Python string from C string
    PyObject* result = PyUnicode_FromStringAndSize(reversed, length);

    // ALWAYS free allocated memory
    free(reversed);

    return result;  // Can be NULL if PyUnicode_FromStringAndSize failed
}

Step 5 — Working with Lists

Processing Python lists from C gives you direct access to the underlying array:

// Sum all numbers in a list — 10-20x faster than Python's sum() for large lists
static PyObject* fastutils_fast_sum(PyObject* self, PyObject* args) {
    PyObject* list_obj;

    if (!PyArg_ParseTuple(args, "O", &list_obj))
        return NULL;

    // Type check — ensure it's actually a list
    if (!PyList_Check(list_obj)) {
        PyErr_SetString(PyExc_TypeError, "argument must be a list");
        return NULL;
    }

    Py_ssize_t size = PyList_GET_SIZE(list_obj);
    double total = 0.0;

    for (Py_ssize_t i = 0; i < size; i++) {
        PyObject* item = PyList_GET_ITEM(list_obj, i);  // Borrowed reference

        // Convert to C double
        double value = PyFloat_AsDouble(item);
        if (value == -1.0 && PyErr_Occurred()) {
            return NULL;  // Item wasn't a number
        }
        total += value;
    }

    return PyFloat_FromDouble(total);
}

Step 6 — Keyword Arguments

Real-world functions need keyword arguments. Use METH_VARARGS | METH_KEYWORDS:

static PyObject* fastutils_repeat(PyObject* self, PyObject* args, PyObject* kwargs) {
    const char* text;
    int count = 2;           // Default value
    const char* separator = "";  // Default value

    static char* kwlist[] = {"text", "count", "separator", NULL};

    // "s|is" = required string, optional int, optional string
    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s|is", kwlist,
                                      &text, &count, &separator))
        return NULL;

    if (count < 0) {
        PyErr_SetString(PyExc_ValueError, "count must be non-negative");
        return NULL;
    }

    // Build the result
    PyObject* parts = PyList_New(count);
    if (!parts) return NULL;

    for (int i = 0; i < count; i++) {
        PyObject* s = PyUnicode_FromString(text);
        if (!s) {
            Py_DECREF(parts);
            return NULL;
        }
        PyList_SET_ITEM(parts, i, s);  // Steals reference
    }

    PyObject* sep = PyUnicode_FromString(separator);
    if (!sep) {
        Py_DECREF(parts);
        return NULL;
    }

    PyObject* result = PyUnicode_Join(sep, parts);
    Py_DECREF(sep);
    Py_DECREF(parts);

    return result;
}

// In the method table, use METH_VARARGS | METH_KEYWORDS:
{"repeat", (PyCFunction)fastutils_repeat, METH_VARARGS | METH_KEYWORDS,
 "repeat(text, count=2, separator='') -> str\n\n"
 "Repeats text count times, joined by separator."},
# Usage from Python:
fastutils.repeat("ha", count=3, separator="-")
# Returns: "ha-ha-ha"

Step 7 — Custom Types (Classes in C)

This is the most powerful feature — defining a new Python type entirely in C. Let's build an IntArray that stores integers in a contiguous C array:

// IntArray type — a fast, fixed-size integer array

typedef struct {
    PyObject_HEAD          // Required Python object header
    long* data;            // C array of longs
    Py_ssize_t length;     // Array length
} IntArrayObject;

// Destructor — called when the object is garbage collected
static void IntArray_dealloc(IntArrayObject* self) {
    free(self->data);
    Py_TYPE(self)->tp_free((PyObject*)self);
}

// Constructor — __init__
static int IntArray_init(IntArrayObject* self, PyObject* args, PyObject* kwargs) {
    PyObject* iterable;
    if (!PyArg_ParseTuple(args, "O", &iterable))
        return -1;

    PyObject* iterator = PyObject_GetIter(iterable);
    if (!iterator) return -1;

    // First pass: count elements
    Py_ssize_t count = 0;
    PyObject* item;
    PyObject* items = PySequence_List(iterable);
    if (!items) {
        Py_DECREF(iterator);
        return -1;
    }
    count = PyList_GET_SIZE(items);

    // Allocate C array
    self->data = (long*)malloc(count * sizeof(long));
    if (!self->data) {
        Py_DECREF(items);
        PyErr_NoMemory();
        return -1;
    }
    self->length = count;

    // Copy data
    for (Py_ssize_t i = 0; i < count; i++) {
        item = PyList_GET_ITEM(items, i);
        self->data[i] = PyLong_AsLong(item);
        if (self->data[i] == -1 && PyErr_Occurred()) {
            Py_DECREF(items);
            free(self->data);
            self->data = NULL;
            return -1;
        }
    }

    Py_DECREF(items);
    return 0;
}

// __len__
static Py_ssize_t IntArray_length(IntArrayObject* self) {
    return self->length;
}

// __getitem__
static PyObject* IntArray_getitem(IntArrayObject* self, Py_ssize_t index) {
    if (index < 0 || index >= self->length) {
        PyErr_SetString(PyExc_IndexError, "index out of range");
        return NULL;
    }
    return PyLong_FromLong(self->data[index]);
}

// sum() method — pure C loop over the array
static PyObject* IntArray_sum(IntArrayObject* self, PyObject* Py_UNUSED(args)) {
    long long total = 0;
    for (Py_ssize_t i = 0; i < self->length; i++) {
        total += self->data[i];
    }
    return PyLong_FromLongLong(total);
}

// __repr__
static PyObject* IntArray_repr(IntArrayObject* self) {
    if (self->length == 0)
        return PyUnicode_FromString("IntArray([])");

    PyObject* parts = PyList_New(self->length);
    for (Py_ssize_t i = 0; i < self->length; i++) {
        PyList_SET_ITEM(parts, i, PyUnicode_FromFormat("%ld", self->data[i]));
    }
    PyObject* comma = PyUnicode_FromString(", ");
    PyObject* joined = PyUnicode_Join(comma, parts);
    PyObject* result = PyUnicode_FromFormat("IntArray([%U])", joined);
    Py_DECREF(parts);
    Py_DECREF(comma);
    Py_DECREF(joined);
    return result;
}
# Usage from Python:
arr = fastutils.IntArray([10, 20, 30, 40, 50])
print(len(arr))       # 5
print(arr[2])         # 30
print(arr.sum())      # 150
print(repr(arr))      # IntArray([10, 20, 30, 40, 50])
CPython Reference Counting — The 5 Golden Rules
Rule 1: Py_INCREF when you keep a reference
Borrowed references don't own the object — INCREF to claim ownership
Rule 2: Return values transfer ownership
Don't DECREF objects you return — the caller owns them now
Rule 3: DECREF everything you create
If you called Py*_New/From*, you must DECREF (unless returned)
Rule 4: Check NULL after every API call
NULL means an exception occurred — clean up and return NULL
Rule 5: Use Py_XDECREF in cleanup paths
Safe with NULL pointers — simplifies error handling

Memory Management — The Golden Rules

Memory management is where most C extension bugs live. Python uses reference counting — every object has a count of how many references point to it. When the count hits zero, the object is freed.

// Rule 1: Py_INCREF when you keep a reference
PyObject* obj = PyList_GetItem(list, 0);  // Borrowed reference
Py_INCREF(obj);  // Now you own a reference
// ... use obj ...
Py_DECREF(obj);  // Release when done

// Rule 2: Return values transfer ownership
return PyLong_FromLong(42);  // Caller owns the reference — don't DECREF

// Rule 3: Py_DECREF everything you create (unless you return it)
PyObject* temp = PyUnicode_FromString("hello");
// ... use temp ...
Py_DECREF(temp);  // YOU created it, YOU must free it

// Rule 4: Check for NULL after every Python API call
PyObject* result = PyObject_CallFunction(func, "i", 42);
if (result == NULL) {
    // An exception occurred — clean up and return NULL
    Py_XDECREF(other_obj);  // Py_XDECREF is safe with NULL
    return NULL;
}

// Rule 5: Use Py_XDECREF for pointers that might be NULL
Py_XDECREF(maybe_null_ptr);  // Safe — does nothing if NULL

Error Handling Best Practices

// Pattern 1: Validate input early, fail fast
static PyObject* my_func(PyObject* self, PyObject* args) {
    int n;
    if (!PyArg_ParseTuple(args, "i", &n))
        return NULL;

    if (n <= 0) {
        PyErr_SetString(PyExc_ValueError, "n must be positive");
        return NULL;
    }

    // ... proceed with valid input ...
}

// Pattern 2: Cleanup on error with goto
static PyObject* complex_func(PyObject* self, PyObject* args) {
    PyObject* result = NULL;
    PyObject* list = NULL;
    char* buffer = NULL;

    list = PyList_New(100);
    if (!list) goto error;

    buffer = (char*)malloc(1024);
    if (!buffer) {
        PyErr_NoMemory();
        goto error;
    }

    // ... do work ...

    result = PyUnicode_FromString(buffer);
    // Fall through to cleanup

error:
    free(buffer);
    Py_XDECREF(list);
    return result;  // NULL on error, valid object on success
}

// Pattern 3: Raise custom exceptions
static PyObject* MyError;  // Module-level exception

// In PyInit_fastutils:
MyError = PyErr_NewException("fastutils.FastError", NULL, NULL);
Py_XINCREF(MyError);
PyModule_AddObject(module, "FastError", MyError);

// Usage:
PyErr_SetString(MyError, "something went wrong in C");
return NULL;

PyArg_ParseTuple Format Strings

These are your bread and butter for parsing Python arguments in C:

Format   C Type                Python Type
──────   ──────────────────    ──────────────────
"i"      int                   int
"l"      long                  int
"L"      long long             int
"n"      Py_ssize_t            int
"f"      float                 float
"d"      double                float
"s"      const char*           str (UTF-8 encoded)
"s#"     const char*, Py_ssize_t  str + length
"O"      PyObject*             any object
"O!"     PyObject* (type-checked)  specific type
"|"      —                     marks start of optional args
"$"      —                     marks keyword-only args

// Examples:
PyArg_ParseTuple(args, "si", &name, &count)       // str + int
PyArg_ParseTuple(args, "s|id", &s, &n, &f)        // str, optional int + double
PyArg_ParseTuple(args, "O!", &PyList_Type, &list)  // must be a list

Complete setup.py for Production

from setuptools import setup, Extension
import sys

# Compiler flags for performance and safety
extra_compile_args = ["-O3", "-Wall", "-Wextra"]
if sys.platform != "win32":
    extra_compile_args.append("-std=c11")

setup(
    name="fastutils",
    version="1.0.0",
    description="High-performance utility functions written in C",
    author="Your Name",
    ext_modules=[
        Extension(
            "fastutils",
            sources=["fastutils.c"],
            extra_compile_args=extra_compile_args,
        ),
    ],
    python_requires=">=3.8",
)
# Development workflow:
python setup.py build_ext --inplace  # Build for development
pip install -e .                     # Install in editable mode
python -m pytest tests/              # Run tests

# Distribution:
pip install build
python -m build                      # Creates wheel + sdist
pip install twine
twine upload dist/*                  # Publish to PyPI

Debugging C Extensions

# Compile with debug symbols
python setup.py build_ext --inplace --debug

# Run under gdb
gdb -ex run --args python -c "import fastutils; fastutils.fibonacci(10)"

# Use Valgrind for memory leak detection
valgrind --leak-check=full python -c "
import fastutils
for i in range(10000):
    fastutils.fibonacci(i % 90)
"

# Enable Python's debug allocator
PYTHONMALLOC=debug python -c "import fastutils; ..."

Best Practices Checklist

  • Always check return values: Every Py* function can return NULL. Check it. Every time.
  • Never mix malloc/free with Python allocators: Use malloc/free for C data, PyMem_Malloc/PyMem_Free for Python-tracked memory.
  • Release the GIL for long C operations: Use Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS around pure C code so other threads can run.
  • Validate all input at the boundary: Type-check, range-check, and null-check everything that comes from Python before doing C work.
  • Write docstrings for every function: Use the \n\n convention in your method table strings — help() will format them correctly.
  • Test with pytest like any other module: Your C extension is a Python module — test it with normal Python test tools.
  • Use Py_XDECREF in cleanup paths: It's safe with NULL pointers, making error cleanup much simpler.
  • Compile with -Wall -Wextra: Let the compiler catch bugs before your users do.
  • Profile before extending: Only write C extensions for proven bottlenecks. Profile first, optimize second.

C extensions are the ultimate escape hatch when Python isn't fast enough. They're used by every major Python library — NumPy, pandas, Pillow, cryptography, uvloop — and now you know how to build them yourself. Start small, respect the reference counting rules, and you'll be writing production-grade C extensions in no time.