Prompt Engineering Patterns for Developers

You can get Claude to write a poem in the playground. But building a production system that reliably classifies support tickets, generates structured JSON, or orchestrates tool calls? That requires engineering, not just prompting.

This guide covers 8 patterns that separate playground experiments from production AI systems. Every pattern includes real Python code using the Anthropic SDK that you can copy and adapt.

Pattern 1: Chain-of-Thought Reasoning

Large language models tend to produce better answers when asked to reason step by step before committing to a final answer. Without that instruction, the model often jumps to a conclusion, especially on multi-step problems.

Without Chain-of-Thought

# Prompt: "Is this SQL injection vulnerable? SELECT * FROM users WHERE id = " + user_input
# Model response: "Yes, it's vulnerable."
# Problem: No explanation, no confidence, no actionable detail

With Chain-of-Thought

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": """Analyze this code for SQL injection vulnerabilities.
Think through each step:
1. Identify where user input enters the query
2. Check if the input is parameterized or sanitized
3. Determine the attack vector if vulnerable
4. Provide the fix

Code:
query = "SELECT * FROM users WHERE id = " + request.args.get("id")
cursor.execute(query)"""
    }]
)

The structured reasoning prompt makes the model examine the code methodically rather than pattern-matching to a quick answer. This tends to improve accuracy on security analysis, debugging, and code review tasks, and the written-out steps give you something to audit when the answer is wrong.
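One way to keep the reasoning while still getting a machine-readable verdict is to ask for the reasoning and the final answer in separate tags, then parse only the answer. The following is a minimal sketch; the tag names `<reasoning>` and `<answer>` and the helpers `build_cot_prompt` and `extract_answer` are assumptions made for illustration, not part of the SDK.

```python
import re
from typing import Optional

# Hypothetical template: reasoning and verdict go in separate tags
COT_TEMPLATE = """Analyze this code for SQL injection vulnerabilities.
Think through each step inside <reasoning> tags, then give a one-word
verdict (VULNERABLE or SAFE) inside <answer> tags.

Code:
{code}"""


def build_cot_prompt(code: str) -> str:
    """Fill the template with the code under review."""
    return COT_TEMPLATE.format(code=code)


def extract_answer(response_text: str) -> Optional[str]:
    """Return the text inside the first <answer> tag, or None if absent."""
    match = re.search(r"<answer>(.*?)</answer>", response_text, re.DOTALL)
    return match.group(1).strip() if match else None
```

The prompt string would be sent as the user message in `client.messages.create`, and `extract_answer` applied to the returned text. A `None` result signals that the model ignored the format and the call should be retried or flagged.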

Pattern 2: Few-Shot with Structured Examples

Few-shot prompting provides input-output examples that teach the model your exact format and decision criteria. This is the most reliable way to get consistent output without fine-tuning.

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=256,
    system="""You classify customer support tickets into categories.
Respond with ONLY the category name, nothing else.""",
    messages=[{
        "role": "user",
        "content": """Examples:

Input: "I can't log into my account, password reset isn't working"
Category: authentication

Input: "Your monthly plan is too expensive, I want a refund"
Category: billing

Input: "The export to PDF button gives a 500 error"
Category: bug-report

Input: "Can you add dark mode to the dashboard?"
Category: feature-request

Input: "I was charged twice for my subscription this month"
Category: billing

Now classify this ticket:
Input: "The app crashes every time I try to upload a file larger than 10MB"
Category:"""
    }]
)

Five examples are usually sufficient. Include edge cases and examples near decision boundaries (a billing complaint vs. a bug report about billing). The model learns your classification logic from the pattern, not from a lengthy explanation.
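Keeping the examples in a data structure rather than a hand-written string makes it easier to add boundary cases over time. A small sketch of that idea follows; `EXAMPLES` and `build_few_shot_prompt` are illustrative names, and the layout simply mirrors the prompt above.

```python
# Labeled examples kept as data so they can be reviewed and extended
EXAMPLES = [
    ("I can't log into my account, password reset isn't working", "authentication"),
    ("I was charged twice for my subscription this month", "billing"),
    ("The export to PDF button gives a 500 error", "bug-report"),
]


def build_few_shot_prompt(examples: list[tuple[str, str]], ticket: str) -> str:
    """Render labeled examples followed by the ticket to classify."""
    lines = ["Examples:", ""]
    for text, category in examples:
        lines.append(f'Input: "{text}"')
        lines.append(f"Category: {category}")
        lines.append("")
    lines.append("Now classify this ticket:")
    lines.append(f'Input: "{ticket}"')
    lines.append("Category:")
    return "\n".join(lines)
```

The returned string can be passed as the user message content in place of the inline prompt.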

Pattern 3: System Prompt Architecture

The system prompt is your most powerful tool for controlling model behavior. A well-structured system prompt defines the role, constraints, output format, and guardrails in a single place.

SYSTEM_PROMPT = """You are a senior code reviewer for a Python/Django codebase.

ROLE:
- Review code changes for bugs, security issues, and performance problems
- You are direct and specific. No pleasantries.

CONSTRAINTS:
- Only comment on actual issues, not style preferences
- Never suggest adding comments to code
- Focus on: security vulnerabilities, logic errors, N+1 queries, race conditions
- Ignore: naming conventions, import ordering, type hints

OUTPUT FORMAT:
For each issue found, respond in this exact format:
FILE: [filename]
LINE: [line number or range]
SEVERITY: [critical | warning | info]
ISSUE: [one-sentence description]
FIX: [concrete code fix or suggestion]

If no issues found, respond with: LGTM - no issues found.

EXAMPLES:
FILE: views.py
LINE: 45-48
SEVERITY: critical
ISSUE: Raw SQL query with string interpolation allows SQL injection
FIX: Use parameterized query: cursor.execute("SELECT * FROM users WHERE id = %s", [user_id])"""

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": diff_content}]
)
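Because the system prompt above fixes an exact output format, the review text can be turned into structured records downstream. The parser below is a rough sketch under that assumption; `ReviewIssue` and `parse_review` are hypothetical names, and it only handles single-line field values.

```python
from dataclasses import dataclass

# Field labels in the order the system prompt specifies them
FIELDS = ("FILE", "LINE", "SEVERITY", "ISSUE", "FIX")


@dataclass
class ReviewIssue:
    file: str
    line: str
    severity: str
    issue: str
    fix: str


def parse_review(text: str) -> list[ReviewIssue]:
    """Parse FILE/LINE/SEVERITY/ISSUE/FIX blocks; 'LGTM' yields an empty list."""
    issues: list[ReviewIssue] = []
    current: dict[str, str] = {}
    for raw in text.splitlines():
        # Split on the first colon only, so colons inside the value survive
        key, sep, value = raw.partition(":")
        key = key.strip()
        if not sep or key not in FIELDS:
            continue  # skip blank lines, the LGTM line, or stray prose
        if key == "FILE" and current:
            current = {}  # a new issue starts; drop any incomplete one
        current[key] = value.strip()
        if len(current) == len(FIELDS):
            issues.append(ReviewIssue(*(current[f] for f in FIELDS)))
            current = {}
    return issues
```

An empty list from a response that is not the LGTM line is a sign the model drifted from the format, which is worth logging.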

Pattern 4: Structured Output with Tool Use

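Asking a model to “respond in JSON” works sometimes. For production systems, use tool use (function calling) to get structured output far more reliably. The model fills in the arguments of a tool whose schema you define, so the result arrives as a parsed object rather than free text. Schema conformance is very likely but not absolute, so keep a validation step.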

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=[{
        "name": "classify_ticket",
        "description": "Classify a support ticket and extract metadata",
        "input_schema": {
            "type": "object",
            "properties": {
                "category": {
                    "type": "string",
                    "enum": ["bug-report", "feature-request",
                             "billing", "authentication", "general"]
                },
                "priority": {
                    "type": "string",
                    "enum": ["critical", "high", "medium", "low"]
                },
                "summary": {
                    "type": "string",
                    "description": "One-sentence summary of the issue"
                },
                "affected_feature": {
                    "type": "string",
                    "description": "The product feature affected"
                }
            },
            "required": ["category", "priority", "summary"]
        }
    }],
    tool_choice={"type": "tool", "name": "classify_ticket"},
    messages=[{
        "role": "user",
        "content": "Ticket: The checkout page freezes after I add a promo code. "
                   "Happens on Chrome and Firefox. Started yesterday."
    }]
)

# Extract the structured result from the tool_use block
tool_block = next(b for b in response.content if b.type == "tool_use")
tool_result = tool_block.input
# {"category": "bug-report", "priority": "high",
#  "summary": "Checkout page freezes when applying promo code",
#  "affected_feature": "checkout"}

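By setting tool_choice to force a specific tool, you ensure the response contains a call to that tool with arguments shaped by your schema. No regex parsing, no JSON extraction, no “sometimes it adds a preamble” headaches. Still check required fields and enum values before acting on the result.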
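A lightweight check on the tool input catches the rare response that misses a required field or uses a value outside an enum. The sketch below covers only the schema features used in this pattern (required keys, string types, enums); `TICKET_SCHEMA` is a trimmed copy of the schema above and `check_tool_input` is a hypothetical helper, so a full JSON Schema validator is the better choice for anything more complex.

```python
# Trimmed copy of the classify_ticket input schema
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string",
                     "enum": ["bug-report", "feature-request",
                              "billing", "authentication", "general"]},
        "priority": {"type": "string",
                     "enum": ["critical", "high", "medium", "low"]},
        "summary": {"type": "string"},
        "affected_feature": {"type": "string"},
    },
    "required": ["category", "priority", "summary"],
}


def check_tool_input(data: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the input passed."""
    problems = []
    for key in schema.get("required", []):
        if key not in data:
            problems.append(f"missing required field: {key}")
    for key, value in data.items():
        spec = schema.get("properties", {}).get(key)
        if spec is None:
            problems.append(f"unexpected field: {key}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{key} should be a string")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key} has invalid value: {value!r}")
    return problems
```

Calling `check_tool_input(tool_result, TICKET_SCHEMA)` right after extraction gives a list that can drive a retry or a rejection.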

Pattern 5: Guardrails and Validation Pipeline

Production AI needs input validation, output validation, and fallback strategies. Never trust raw model output without verification.

import json
from dataclasses import dataclass

@dataclass
class AIResponse:
    content: str
    is_valid: bool
    error: str | None = None

def validate_input(user_input: str) -> str | None:
    """Return error message if input is invalid, None if OK."""
    if len(user_input) > 10000:
        return "Input too long (max 10,000 characters)"
    if len(user_input.strip()) == 0:
        return "Input cannot be empty"
    return None

def validate_output(response_text: str, expected_format: str) -> bool:
    """Validate model output matches expected format."""
    if expected_format == "json":
        try:
            json.loads(response_text)
            return True
        except json.JSONDecodeError:
            return False
    return True

def ai_pipeline(user_input: str) -> AIResponse:
    # Step 1: Validate input
    input_error = validate_input(user_input)
    if input_error:
        return AIResponse(content="", is_valid=False, error=input_error)

    # Step 2: Call the model
    try:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{"role": "user", "content": user_input}]
        )
        content = response.content[0].text
    except anthropic.APIError as e:
        return AIResponse(content="", is_valid=False, error=str(e))

    # Step 3: Validate output
    if not validate_output(content, "json"):
        # Retry once with explicit format instruction
        try:
            response = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                system="You MUST respond with valid JSON only. No text before or after.",
                messages=[{"role": "user", "content": user_input}]
            )
            content = response.content[0].text
        except anthropic.APIError as e:
            # The retry can fail too; report it instead of raising
            return AIResponse(content="", is_valid=False, error=str(e))

        if not validate_output(content, "json"):
            return AIResponse(content="", is_valid=False,
                            error="Failed to get valid JSON after retry")

    return AIResponse(content=content, is_valid=True)
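The same validate-then-retry idea can be factored into a reusable helper with an explicit fallback value, which is the third leg of the pipeline described above. This is a sketch rather than a recommended library: `call_with_validation` and `is_json` are hypothetical names, and `generate` stands in for whatever function wraps your model call.

```python
import json
from typing import Callable, Optional


def is_json(text: str) -> bool:
    """True if the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def call_with_validation(
    generate: Callable[[int], str],
    validate: Callable[[str], bool],
    max_attempts: int = 2,
    fallback: Optional[str] = None,
) -> Optional[str]:
    """Call generate(attempt) until validate passes, else return fallback."""
    for attempt in range(max_attempts):
        try:
            text = generate(attempt)
        except Exception:
            # Broad on purpose for the sketch: treat any failure as an
            # invalid attempt. Narrow this to your client's error types.
            continue
        if validate(text):
            return text
    return fallback
```

Passing the attempt number to `generate` lets the caller tighten the instructions on later attempts, as the pipeline above does with its stricter system prompt.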

Pattern 6: Retrieval-Augmented Generation (RAG)

Instead of hoping the model knows about your internal APIs, retrieve relevant context and inject it into the prompt. This sharply reduces hallucination on domain-specific questions, provided the retrieved passages actually contain the answer.

def answer_with_context(question: str, docs: list[str]) -> str:
    """Answer a question using retrieved documentation."""
    context = "\n\n---\n\n".join(docs)

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="""Answer the question using ONLY the provided documentation.
If the documentation does not contain the answer, say
"I don't have enough information to answer this."
Never make up information not present in the docs.
Cite the relevant section when answering.""",
        messages=[{
            "role": "user",
            "content": f"""Documentation:
{context}

Question: {question}"""
        }]
    )
    return response.content[0].text

# Usage with your vector store:
query = "How do I configure rate limiting?"
relevant_docs = vector_store.search(query, top_k=5)
answer = answer_with_context(query, relevant_docs)

The key instruction is “ONLY the provided documentation” with an explicit fallback (“I don’t have enough information”). Without this, the model will happily hallucinate plausible-sounding but incorrect API configurations.
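The system prompt asks the model to cite the relevant section, which is easier when each retrieved chunk carries an identifier. One possible approach, sketched here with hypothetical helpers `format_docs` and `is_fallback`, is to wrap chunks in indexed tags and to detect the fallback sentence so the application can route unanswered questions elsewhere.

```python
# The exact fallback sentence requested in the system prompt
FALLBACK = "I don't have enough information to answer this."


def format_docs(docs: list[str]) -> str:
    """Wrap each retrieved chunk in an indexed <doc> tag so answers can cite it."""
    return "\n".join(
        f'<doc id="{i}">\n{doc.strip()}\n</doc>'
        for i, doc in enumerate(docs, start=1)
    )


def is_fallback(answer: str) -> bool:
    """True if the model declined to answer from the provided docs."""
    return FALLBACK.lower() in answer.lower()
```

The output of `format_docs` could replace the `---` separated context string, with the system prompt updated to ask for citations by doc id.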

Pattern 7: Conversation Context Management

Long conversations eat tokens fast. Each request resends the whole history, so a 50-turn conversation with code snippets can approach a 200K-token context window and make every new message expensive. Manage context aggressively.

def manage_conversation(messages: list, max_tokens: int = 50000,
                        keep_recent: int = 10) -> list:
    """Keep conversation within token budget using a sliding window.

    estimate_tokens() and summarize_messages() are helpers you provide.
    The system prompt is passed separately and is not part of messages.
    """
    # Nothing to trim if we are within budget or there is no middle section
    if estimate_tokens(messages) <= max_tokens or len(messages) <= keep_recent + 1:
        return messages

    # Keep first message (establishes context) and the most recent messages
    first = messages[:1]
    recent = messages[-keep_recent:]

    # Summarize the trimmed middle section
    middle = messages[1:-keep_recent]
    summary = summarize_messages(middle)

    return first + [{"role": "user",
                     "content": f"[Previous conversation summary: {summary}]"},
                    {"role": "assistant",
                     "content": "Understood, I have the context."}] + recent
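The function above leaves `estimate_tokens` undefined. A crude stand-in, assuming roughly four characters per token for English text, might look like the sketch below. That ratio is a rule of thumb and not an exact count; for precise numbers, a token-counting endpoint or tokenizer is the better tool.

```python
def estimate_tokens(messages: list[dict]) -> int:
    """Rough token estimate using ~4 characters per token (a heuristic)."""
    total_chars = 0
    for message in messages:
        content = message["content"]
        if isinstance(content, str):
            total_chars += len(content)
        else:
            # List of content blocks: count only blocks that carry text
            total_chars += sum(
                len(block.get("text", ""))
                for block in content
                if isinstance(block, dict)
            )
    return total_chars // 4
```

Because the estimate ignores tool blocks and images, it undercounts in agentic conversations, so leave headroom in the `max_tokens` budget.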

Pattern 8: Tool Use and Function Calling

Give the model tools (functions it can call) to interact with external systems. This transforms a text generator into an agent that can query databases, call APIs, and take actions.

tools = [
    {
        "name": "get_order_status",
        "description": "Look up the current status of a customer order by order ID",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "The order ID (e.g., ORD-12345)"
                }
            },
            "required": ["order_id"]
        }
    },
    {
        "name": "create_support_ticket",
        "description": "Create a new support ticket in the system",
        "input_schema": {
            "type": "object",
            "properties": {
                "subject": {"type": "string"},
                "description": {"type": "string"},
                "priority": {"type": "string", "enum": ["low", "medium", "high"]}
            },
            "required": ["subject", "description", "priority"]
        }
    }
]

# The agentic loop: call model, execute tools, feed results back
messages = [{"role": "user", "content": "What's the status of order ORD-78901?"}]

while True:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=tools,
        messages=messages
    )

    # Check if the model wants to use a tool
    if response.stop_reason == "tool_use":
        # Keep the assistant turn, including its tool_use blocks
        messages.append({"role": "assistant", "content": response.content})

        # Execute every requested tool; each tool_use needs a matching tool_result
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(result)
                })

        # Feed the results back to the model
        messages.append({"role": "user", "content": tool_results})
    else:
        # Model gave a final text response
        print("".join(b.text for b in response.content if b.type == "text"))
        break
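The loop calls `execute_tool`, which the example leaves to the reader. One simple way to implement it is a registry that maps tool names to Python functions. Everything in this sketch is hypothetical, including the placeholder order data; a real handler would query your own systems.

```python
# Maps tool names (as declared in the tools list) to handler functions
TOOL_REGISTRY = {}


def tool(name):
    """Decorator that registers a handler under a tool name."""
    def register(func):
        TOOL_REGISTRY[name] = func
        return func
    return register


@tool("get_order_status")
def get_order_status(order_id: str) -> dict:
    # Placeholder data; a real implementation would query the order system
    return {"order_id": order_id, "status": "shipped"}


def execute_tool(name: str, tool_input: dict) -> dict:
    """Dispatch a tool call; errors are returned as data, not raised."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return handler(**tool_input)
    except Exception as exc:
        # Broad catch for the sketch: report bad arguments or handler failures
        return {"error": str(exc)}
```

Returning errors as data lets the model see what went wrong and either correct its arguments or explain the failure to the user, instead of crashing the loop.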

Anti-Patterns to Avoid

No output validation: Never trust raw model output for structured data. Always validate against a schema.
Prompt injection blindness: If user input goes into your prompt, malicious users can try to override your instructions. Keep system instructions in the system parameter, separate from user content, and still validate the output, because separation lowers the risk without removing it.
Unbounded context: Stuffing entire codebases into the prompt. Retrieve only relevant sections using embeddings or keyword search.
Hardcoded prompts: Prompts evolve. Store them in versioned configuration, not inline strings. A/B test prompt changes like you test code changes.
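Ignoring caching: The Anthropic API supports prompt caching. If your system prompt is 5,000 tokens and identical across requests, enable caching so the cached portion is billed at a steep discount on cache hits (roughly 90% below the normal input price, with a small premium on the request that writes the cache).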
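For the caching point, the request marks the system prompt as cacheable by passing it as a content block with a `cache_control` entry. The helper below is a sketch that only builds the keyword arguments; `build_cached_request` is a hypothetical name, and the model and token limit are copied from the earlier examples.

```python
def build_cached_request(system_prompt: str, user_content: str) -> dict:
    """Build keyword arguments for messages.create with a cacheable system prompt."""
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        # The system prompt becomes a list of blocks so it can carry cache_control
        "system": [{
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},
        }],
        "messages": [{"role": "user", "content": user_content}],
    }
```

The result would be used as `client.messages.create(**build_cached_request(SYSTEM_PROMPT, diff_content))`. Prompts below the model's minimum cacheable length are processed without caching, so it is worth inspecting the usage fields on the response to confirm that cache hits are happening.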

Decision Matrix

Use Case	Primary Pattern	Secondary Pattern
Text classification	Few-shot examples	Structured output
Code review	System prompt architecture	Chain-of-thought
Customer support bot	Tool use	RAG + Guardrails
Data extraction	Structured output (tool use)	Few-shot
Q&A over documentation	RAG	Context management
Complex reasoning tasks	Chain-of-thought	Guardrails
Multi-step workflows	Tool use (agentic loop)	Context management

Key Takeaways

Chain-of-thought improves accuracy on any multi-step reasoning task - always use it for complex analysis
Few-shot examples are the most reliable way to get consistent formatting without fine-tuning
Tool use makes structured output far more reliable than parsing free-text JSON - prefer it in production, and still validate the result
RAG sharply reduces hallucination on domain-specific questions - always retrieve before generating
Guardrails are not optional - validate inputs and outputs in every production pipeline
Manage conversation context aggressively - summarize old messages to stay within token budgets
Version your prompts like you version code - they are as critical as any other system configuration

Prompt engineering is software engineering. Treat prompts as code: version them, test them, validate their output, and iterate based on production data. The patterns in this guide are not theoretical - they are the same techniques used in production AI systems handling millions of requests per day.
