Get Reliable JSON from AI

A practical guide to structured output from LLMs. Learn how to get predictable, type-safe JSON from AI models using Pydantic, Zod, and provider-specific approaches.

Why Structured Output Matters

LLMs generate probabilistic text, but your application needs predictable, structured data. This guide covers practical approaches to get reliable JSON from AI models--transforming unpredictable outputs into type-safe data your application can trust.

Structured output is essential for production AI systems that have to integrate with an existing codebase. Whether you are extracting data for LLM evaluation and testing pipelines or backing AI-powered search, consistent JSON output is the foundation of a reliable integration.

What you'll learn:

  • JSON mode across OpenAI, Anthropic Claude, and Google Gemini
  • Pydantic integration with the Instructor library
  • Zod validation for TypeScript environments
  • Production patterns and error handling strategies

Agenta AI provides comprehensive coverage of structured output benefits across different providers.

  • 4 major LLM providers with JSON mode
  • 90%+ success rate with proper validation
  • 50% reduction in post-processing code

JSON Mode Across LLM Providers

Modern LLM providers offer native JSON mode for constrained generation. Understanding each provider's approach helps you choose the right solution for your stack.

OpenAI's Structured Outputs

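OpenAI provides JSON mode through the response_format parameter. Setting { "type": "json_object" } constrains the model to emit syntactically valid JSON, while the system prompt describes the structure you want; the API also expects the word "JSON" to appear somewhere in the messages. JSON mode does not check the result against a schema, so it works best for simple schemas where prompt engineering is enough to convey your intent. For strict schema adherence, newer models also accept a json_schema response format (Structured Outputs).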

import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "You are a helpful assistant that outputs JSON. Extract user information including name, email, and preferences."},
        {"role": "user", "content": "John Smith is a 35-year-old software engineer who prefers dark mode and email notifications."},
    ],
)

# JSON mode returns the object as a string; parse it before use
user_info = json.loads(completion.choices[0].message.content)

Anthropic Claude's Prefilling Strategy

Claude works well with response prefilling: you end the message list with a partial assistant turn that already contains the start of the JSON, and the model continues from there. This technique encourages a consistent output format without special API parameters. By providing the opening brace and key structure, you guide the model to complete rather than generate, significantly improving format consistency.

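import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Prefill Claude's response to force JSON structure
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    system="Always respond in the exact format provided, completing the JSON structure.",
    messages=[
        # Placeholder request; replace with your own extraction prompt
        {"role": "user", "content": "Extract the product details from this listing as JSON: ..."},
        # The prefill goes in a trailing assistant turn, not in the user message
        {"role": "assistant", "content": "{\"product_name\":\""},
    ],
)
# The model must now complete the JSON structure rather than choosing its own format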

Google Gemini's Type Integration

Gemini integrates with Python's TypedDict for schema definition, making it particularly powerful for Python developers using type annotations. The response_mime_type and response_schema parameters work together to enforce JSON output conforming to your specified types.

import google.generativeai as genai
from typing_extensions import TypedDict

class ProductDetails(TypedDict):
    name: str
    price: float
    category: str
    in_stock: bool

model = genai.GenerativeModel("gemini-1.5-pro-latest")
result = model.generate_content(
    "Extract product details from the catalog.",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",
        response_schema=ProductDetails,
    ),
)

LiteLLM's Unified Interface

LiteLLM provides consistent JSON mode across providers, enabling provider-agnostic code that works regardless of which model you use. The library handles provider-specific implementation details, allowing developers to write single code paths that work across OpenAI, Anthropic, Google, and other providers. This abstraction proves valuable when building applications that might switch between models or use multiple providers simultaneously.
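
As a rough illustration of what a single code path can look like, here is a minimal sketch that assumes the litellm package is installed and provider API keys are set in the environment. The helper names build_request and extract_json are hypothetical, the model identifiers are only examples, and not every model behind LiteLLM necessarily honors response_format, so the output should still be validated.

```python
import json


def build_request(model: str, text: str) -> dict:
    """Build provider-independent keyword arguments for litellm.completion."""
    return {
        "model": model,
        "response_format": {"type": "json_object"},
        "messages": [
            {"role": "system", "content": "Reply with a JSON object containing name and email."},
            {"role": "user", "content": text},
        ],
    }


def extract_json(model: str, text: str) -> dict:
    """Send the same request to whichever provider the model string selects."""
    import litellm  # imported lazily so build_request works without the package

    response = litellm.completion(**build_request(model, text))
    return json.loads(response.choices[0].message.content)


# Same code path, different providers (model names are illustrative):
# extract_json("gpt-4-turbo-preview", "John Smith, john@example.com")
# extract_json("claude-3-opus-20240229", "John Smith, john@example.com")
```

Keeping the request builder separate from the network call makes the provider-independent part easy to unit test.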

Pydantic Integration for Type-Safe Validation

Pydantic has become the standard for structured output validation in Python. You declare a model as a class with typed fields, and Pydantic checks incoming data against those types at runtime, which keeps schema design readable while still providing robust validation.

Proper schema validation not only ensures data quality but also contributes to AI cost optimization by reducing retries and token consumption. When validation fails, immediate feedback prevents wasted processing downstream.

The Instructor Library

The Instructor library simplifies structured output by combining Pydantic models with LLM API calls. Rather than separately defining schemas, calling models, and parsing responses, you pass a single response_model parameter and Instructor handles the complete pipeline (schema definition, model invocation, and validation) with automatic retries on failure.

F22 Labs demonstrates how Instructor's Pydantic integration provides advantages over OpenAI's native structured output, particularly for complex nested schemas.

import instructor
from openai import OpenAI
from pydantic import BaseModel

# Define your schema with Pydantic
class UserInfo(BaseModel):
    name: str
    email: str
    preferences: dict

# Create an Instructor client
client = instructor.from_openai(OpenAI())

# Call with automatic validation and retries
user = client.chat.completions.create(
    model="gpt-4",
    response_model=UserInfo,
    messages=[{"role": "user", "content": "Extract user info from the following text..."}],
)

# user is now a validated UserInfo instance with type safety
print(user.name, user.email)

Complex Schema Handling

Instructor handles deeply nested structures, discriminated unions, and custom validators that break simpler approaches. Consider extracting a complex document structure with multiple nested entities, each requiring different handling. Pydantic's model composition allows defining these relationships declaratively, while Instructor ensures the LLM produces data conforming to the entire structure.

from pydantic import BaseModel, Field
from typing import List
from enum import Enum

class Category(str, Enum):
    TECHNICAL = "technical"
    MARKETING = "marketing"
    SUPPORT = "support"

class TicketAnalysis(BaseModel):
    priority: int = Field(ge=1, le=5)
    category: Category
    sentiment: str
    suggested_response_length: str
    follow_up_required: bool

class TicketExtractor(BaseModel):
    ticket_id: str
    customer_name: str
    issues: List[str]
    analysis: TicketAnalysis
    confidence_score: float = Field(ge=0, le=1)

# Complex extraction with automatic validation
# (client is the Instructor client created in the previous example)
result = client.chat.completions.create(
    model="gpt-4",
    response_model=TicketExtractor,
    messages=[{"role": "user", "content": "Analyze this support ticket..."}],
)

Validation and Error Handling

Pydantic's validation provides detailed error messages when outputs don't conform to schemas, dramatically improving debugging compared to generic JSON parsing failures. Each field's type constraints, custom validators, and required/optional status contribute to precise error reporting.

Custom validators extend Pydantic's capabilities beyond basic type checking. You can enforce business logic, cross-field dependencies, and complex validation rules that the LLM must satisfy. When validation fails, the resulting error messages guide both debugging and potential prompt improvements. This detailed feedback loop enables rapid iteration on both prompts and schemas.
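
To illustrate the feedback loop without tying it to a particular Pydantic version, here is a standard-library sketch of the same idea: a cross-field business rule, a readable error, and a correction prompt built from it. The Refund class, its fields, and correction_prompt are hypothetical; in Pydantic the rule would live in a validator on the model rather than in __post_init__.

```python
from dataclasses import dataclass


@dataclass
class Refund:
    order_total: float
    refund_amount: float
    reason: str

    def __post_init__(self) -> None:
        errors = []
        if self.refund_amount < 0:
            errors.append("refund_amount must not be negative")
        # Cross-field rule: a refund can never exceed what was paid
        if self.refund_amount > self.order_total:
            errors.append("refund_amount must not exceed order_total")
        if not self.reason.strip():
            errors.append("reason must not be empty")
        if errors:
            raise ValueError("; ".join(errors))


def correction_prompt(original_prompt: str, error: Exception) -> str:
    """Append the validation error so the model can fix its own output."""
    return (
        f"{original_prompt}\n\n"
        f"Your previous answer was rejected: {error}\n"
        "Return corrected JSON only."
    )


try:
    Refund(order_total=40.0, refund_amount=55.0, reason="damaged item")
except ValueError as exc:
    retry_prompt = correction_prompt("Extract the refund details.", exc)
```

The more specific the error message, the more useful it is both to a human reading logs and to the model on the next attempt.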

Zod for TypeScript Environments

Zod provides Pydantic-like capabilities for TypeScript, offering compile-time type inference from runtime validation schemas. Its declarative API makes schema definition natural while maintaining full type safety throughout your codebase.

For TypeScript applications building AI-powered search interfaces or multimodal AI applications, Zod provides the type safety guarantees needed for production deployments.

Building a Reliable JSON Parser

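Combining Zod with retry logic covers models without native structured output support. Each response is parsed and validated against the schema; when validation fails, the error is fed back to the model with instructions for correction and the call is retried. Retries improve the odds of a schema-conforming result but cannot guarantee one, so callers should still handle the case where every attempt fails.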

Inferable provides detailed patterns for implementing reliable JSON parsers using Zod schemas and recursive retries for deterministic structured outputs.

import { z } from "zod";
import retry from "async-retry";

interface ParserOptions<T> {
  maxRetries?: number;
  schema: z.ZodSchema<T>;
  prompt: string;
}

// Calls a local Ollama server; swap in any text-completion endpoint
async function callModel(prompt: string): Promise<string> {
  const response = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.2",
      prompt: prompt,
      stream: false,
    }),
  });
  const data = await response.json();
  return data.response;
}

export async function parseWithRetry<T>({
  maxRetries = 3,
  schema,
  prompt,
}: ParserOptions<T>): Promise<T> {
  // Message from the most recent failure, fed back into the next prompt
  let lastError: string | undefined;

  return retry(
    async () => {
      const fullPrompt =
        lastError === undefined
          ? prompt
          : `${prompt}\n\nPrevious attempt failed with: ${lastError}\nPlease fix and try again.`;

      try {
        const response = await callModel(fullPrompt);

        // Extract the outermost {...} span from the response
        const jsonMatch = response.match(/\{[\s\S]*\}/);
        if (!jsonMatch) {
          throw new Error("No JSON found in response");
        }

        const parsed = JSON.parse(jsonMatch[0]);
        return schema.parse(parsed);
      } catch (error) {
        lastError = error instanceof Error ? error.message : String(error);
        throw error; // rethrow so async-retry schedules the next attempt
      }
    },
    {
      // async-retry counts retries after the first call,
      // so this allows maxRetries attempts in total
      retries: maxRetries - 1,
      factor: 1,
      minTimeout: 100,
      maxTimeout: 1000,
    }
  );
}

Schema Definition with Zod

Zod's API mirrors Pydantic's capabilities: string validation, number constraints, array handling, and object shapes are all expressed as chained method calls. The z.infer utility extracts TypeScript types from schemas, eliminating duplication between validation and type definitions.

const MovieSchema = z.object({
  title: z.string(),
  year: z.number(),
  rating: z.number().min(0).max(10),
  genres: z.array(z.string()),
  director: z.object({
    name: z.string(),
    nationality: z.string().optional()
  }).optional()
});

// Type is inferred automatically - no need for separate type definition
type Movie = z.infer<typeof MovieSchema>;

Best Practices for Zod Schemas

Effective schemas balance strictness with flexibility. Overly permissive schemas miss validation issues; overly strict ones cause unnecessary retries. Focus on business-critical constraints while allowing reasonable formatting variation.

Using custom error messages improves the feedback loop when validation fails. Zod's .refine() and .transform() methods enable complex validation logic with meaningful error reporting. Consider using discriminators for union types to ensure clear type narrowing and improve validation accuracy for heterogeneous data structures.

Advanced Techniques and Production Patterns

Beyond basic implementations, advanced techniques significantly improve structured output reliability in production environments. These patterns address the remaining edge cases and maximize success rates across diverse inputs.

Structured outputs integrate tightly with vector databases when building RAG systems, ensuring that extracted metadata and document chunks maintain consistent formatting for efficient storage and retrieval.

Few-Shot Examples

Providing examples of valid JSON outputs dramatically improves accuracy for complex schemas. Inferable demonstrates that few-shot examples give the model concrete reference points for understanding expected structures, reducing ambiguity in schema interpretation.

Effective few-shot examples demonstrate the full range of expected outputs, including edge cases and boundary conditions. Include examples that show correct handling of optional fields, nested structures, and valid enum values.

prompt = """
Extract the product information in JSON format with the following structure:
- product_name (string)
- price (number, must be positive)
- category (one of: electronics, clothing, food, other)
- in_stock (boolean)

Return only valid JSON, no additional text.

Example of valid JSON:
{"product_name": "Wireless Headphones", "price": 149.99, "category": "electronics", "in_stock": true}

Example of invalid JSON (WRONG):
{"product_name": "Wireless Headphones", "price": -50, "category": "invalid", "in_stock": "yes"}

Now extract this product:
"""

Schema Decomposition

For extremely complex schemas, decomposing the schema into smaller pieces and handling each one separately reduces the load on the model and provides clearer validation feedback. Consider a document processing system that extracts headers, sections, tables, and figures: run each extraction as its own pipeline stage with a focused prompt.

Each stage produces partial results that assemble into the final complex structure, with easier validation at each step. This approach also enables parallel processing for independent extractions and provides better error isolation when things go wrong.
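
A minimal sketch of this decomposition, assuming each part can be requested as a JSON array. The call_model function is a stand-in that returns canned responses, and extract_part and extract_document are hypothetical names used only for illustration.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def call_model(prompt: str) -> str:
    """Stand-in for a real LLM call; returns canned JSON keyed on the prompt."""
    if "headers" in prompt:
        return '["Introduction", "Methods"]'
    if "tables" in prompt:
        return '[{"caption": "Results", "rows": 3}]'
    return "[]"


def extract_part(part: str, document: str) -> tuple[str, list]:
    """Run one focused extraction and validate only that piece."""
    raw = call_model(f"List the {part} in this document as a JSON array:\n{document}")
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError(f"{part}: expected a JSON array")
    return part, data


def extract_document(document: str) -> dict:
    parts = ["headers", "sections", "tables", "figures"]
    # The extractions are independent, so they can run in parallel
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        results = pool.map(lambda p: extract_part(p, document), parts)
        return dict(results)


structure = extract_document("...document text...")
```

Because each part is validated on its own, a failure names the stage that broke and only that stage needs to be retried.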

Response Prefilling for Guaranteed Format

Claude responds particularly well to response prefilling, where the message list ends with a partial assistant turn that already contains the start of the JSON. The model then has to complete that structure rather than choose its own format, which significantly improves format consistency.

[
 {"role": "user", "content": "What is your favorite color? Output only JSON."},
 {"role": "assistant", "content": "{\"color\":\""}
]

This pattern works across many use cases where you can anticipate the beginning of valid output. The prefilled content acts as a template that constrains the model's response format.
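
One practical detail: the API returns only the continuation, so the prefill has to be put back in front of it before parsing. A small sketch, where parse_prefilled is a hypothetical helper and the continuation string is a made-up example of what a model might return:

```python
import json

PREFILL = '{"color":"'


def parse_prefilled(prefill: str, completion: str) -> dict:
    """Join the prefill and the model's continuation, then parse the result."""
    text = prefill + completion
    # Drop anything the model wrote after the last closing brace
    end = text.rfind("}")
    if end == -1:
        raise ValueError("completion never closed the JSON object")
    return json.loads(text[: end + 1])


# Example continuation for the messages shown above
result = parse_prefilled(PREFILL, 'blue"}')
```

Trimming after the last closing brace also covers the case where the model adds a trailing remark after the JSON.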

Common Pitfalls and How to Avoid Them

Understanding common mistakes helps you build robust structured output systems from the start. These pitfalls affect projects across all maturity levels, and awareness prevents costly rework.

Over-Constraining Schemas

Extremely strict schemas with many required fields often lead to higher failure rates. When schemas require too many fields, the model struggles to satisfy all constraints simultaneously. Solution: Make fields optional where business requirements allow, or use default values for non-critical attributes. Start with minimal required fields and add constraints incrementally as you validate success rates.
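
As a small illustration of starting with minimal required fields, the sketch below uses a standard-library dataclass; the Contact fields and the from_model_output helper are hypothetical. The same shape translates directly to optional fields and defaults in Pydantic or Zod.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Contact:
    name: str                    # required: the record is useless without it
    email: Optional[str] = None  # optional: often missing from source text
    language: str = "en"         # non-critical: fall back to a default


def from_model_output(data: dict) -> Contact:
    """Keep known fields, ignore extras, and let defaults fill the gaps."""
    known = {f.name for f in fields(Contact)}
    return Contact(**{k: v for k, v in data.items() if k in known})


contact = from_model_output({"name": "John Smith", "mood": "cheerful"})
```

Only a missing name fails here (with a TypeError); everything else degrades to a default instead of triggering a retry.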

Insufficient Error Handling

Production systems must handle validation failures gracefully. Simply logging errors and continuing is not enough--build proper retry logic, fallback mechanisms, and alerting for persistent failures. Solution: Implement circuit breakers that prevent cascade failures when validation error rates spike. Rate limiting protects against runaway costs during issues. Log structured outputs (with appropriate privacy considerations) to enable debugging and pattern analysis.
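
A circuit breaker for validation failures can be quite small. The sketch below is one possible shape, not a reference implementation; the class name, thresholds, and method names are all illustrative.

```python
import time


class CircuitBreaker:
    """Stop calling the model after too many consecutive validation failures."""

    def __init__(self, max_failures: int = 5, cooldown_seconds: float = 30.0):
        self.max_failures = max_failures
        self.cooldown_seconds = cooldown_seconds
        self.failures = 0
        self.opened_at = None  # time the circuit opened, or None while closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # After the cooldown, allow requests again; the next failure re-opens the circuit
        return time.monotonic() - self.opened_at >= self.cooldown_seconds

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()


breaker = CircuitBreaker(max_failures=3, cooldown_seconds=60.0)
for _ in range(3):
    breaker.record_failure()
```

Callers check allow_request() before each model call and route to a fallback (cached result, human review queue, or an error response) while the circuit is open.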

Neglecting Schema Evolution

Schemas change as applications evolve, but structured output pipelines often do not keep pace. Solution: Implement schema versioning and migration strategies. When a schema changes, expect a period of adjustment while prompts and validation logic catch up with the new requirements. Maintain backward compatibility where possible and provide clear migration paths.
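
One way to sketch versioning is to stamp every stored record with a schema_version and upgrade old records step by step. The field names and the v1-to-v2 change below are invented for illustration.

```python
CURRENT_VERSION = 2


def migrate_v1_to_v2(record: dict) -> dict:
    """Hypothetical change: v2 splits the single 'name' field in two."""
    first, _, last = record["name"].partition(" ")
    migrated = {k: v for k, v in record.items() if k != "name"}
    migrated.update(first_name=first, last_name=last, schema_version=2)
    return migrated


# Maps a version number to the function that upgrades it by one step
MIGRATIONS = {1: migrate_v1_to_v2}


def upgrade(record: dict) -> dict:
    """Apply migrations one step at a time until the record is current."""
    version = record.get("schema_version", 1)
    while version < CURRENT_VERSION:
        record = MIGRATIONS[version](record)
        version = record["schema_version"]
    return record


old = {"schema_version": 1, "name": "John Smith", "email": "john@example.com"}
new = upgrade(old)
```

Chaining single-step migrations means that adding a v3 only requires one new function, and records written under any earlier version remain readable.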

Ignoring Prompt Sensitivity

Structured output quality often varies significantly with prompt wording. Small changes in how you describe schemas or structure instructions can dramatically impact success rates. Solution: Invest in prompt engineering and testing, treating prompts as code that requires review and iteration. A/B test different phrasings and document what works. Version control your prompts alongside schema definitions.
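
Comparing prompt variants can start as a simple success-rate measurement over a fixed set of inputs. In this sketch fake_model is a deterministic stand-in for a real model, and the variants and the "name" check are placeholders for your own prompts and schema.

```python
import json
from typing import Callable


def success_rate(prompt_template: str, inputs: list[str],
                 call_model: Callable[[str], str]) -> float:
    """Fraction of inputs that yield parseable JSON containing a 'name' key."""
    passed = 0
    for text in inputs:
        raw = call_model(prompt_template.format(text=text))
        try:
            if "name" in json.loads(raw):
                passed += 1
        except (json.JSONDecodeError, TypeError):
            pass  # unparseable or wrong-shaped output counts as a failure
    return passed / len(inputs)


def fake_model(prompt: str) -> str:
    """Stand-in model that only behaves when told to return JSON only."""
    if "JSON only" in prompt:
        return '{"name": "John"}'
    return 'Sure! Here is the data: {"name": "John"}'


variants = {
    "v1": "Extract the name from: {text}",
    "v2": "Extract the name from: {text}\nReturn JSON only.",
}
rates = {name: success_rate(template, ["John called."], fake_model)
         for name, template in variants.items()}
```

Running the same harness against a real model and a larger input set gives a number to attach to each prompt revision in version control.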

Implementing a Production Pipeline

Bringing structured outputs into production requires combining the techniques covered here into cohesive systems with proper observability. A well-designed pipeline handles the complete flow from input to validated output with proper error handling throughout.

Pipeline Architecture

Production pipelines typically include:

  1. Input preprocessing -- Normalize and validate inputs before they reach the model
  2. Prompt construction -- Build prompts with schema context and few-shot examples
  3. Model invocation -- Call the LLM with appropriate parameters and timeout handling
  4. Response parsing -- Extract JSON from the raw response
  5. Schema validation -- Validate against your Pydantic or Zod schema
  6. Error handling -- Retry, fallback, or escalate based on failure type
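
The six stages above can be sketched as a handful of small functions. This is an outline under simplifying assumptions: call_model is a stand-in that returns a canned response, the schema is a hand-written check for two fields rather than a Pydantic or Zod model, and all names are illustrative.

```python
import json
import re


def call_model(prompt: str) -> str:
    """Stand-in for a real LLM call with timeout handling."""
    return 'Here you go: {"name": "Ada", "age": 36}'


def preprocess(text: str) -> str:
    # 1. Input preprocessing: collapse whitespace and reject empty input
    text = " ".join(text.split())
    if not text:
        raise ValueError("empty input")
    return text


def build_prompt(text: str) -> str:
    # 2. Prompt construction: schema description plus one few-shot example
    example = '{"name": "John", "age": 35}'
    return f"Extract name and age as JSON.\nExample: {example}\nText: {text}"


def parse_json(raw: str) -> dict:
    # 4. Response parsing: pull the JSON object out of the surrounding text
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object in response")
    return json.loads(match.group(0))


def validate(data: dict) -> dict:
    # 5. Schema validation: a hand-written check standing in for a schema library
    if not isinstance(data.get("name"), str) or not isinstance(data.get("age"), int):
        raise ValueError("expected name: str and age: int")
    return data


def run_pipeline(text: str, max_attempts: int = 3) -> dict:
    prompt = build_prompt(preprocess(text))
    last_error = None
    for _ in range(max_attempts):
        try:
            # 3. Model invocation, followed by parsing and validation
            return validate(parse_json(call_model(prompt)))
        except ValueError as exc:
            # 6. Error handling: remember the failure and retry
            last_error = exc
    raise RuntimeError(f"all attempts failed: {last_error}")


person = run_pipeline("  Ada   is 36 years old. ")
```

Keeping each stage a separate function makes it possible to test and monitor them independently, and to swap the final RuntimeError for a fallback or escalation path.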

Monitoring and Observability

Track key metrics across your structured output pipeline:

Metric | What to Monitor
Overall success rate | Percentage of requests returning valid output
Schema-specific rates | Break down by schema type
Validation error types | Identify common failure patterns
Retry frequency | Are retries working effectively?
Latency distributions | P50, P95, P99 response times

Alert on significant changes in any metric. Sudden success rate drops might indicate model changes, schema drift, or new input patterns requiring attention.
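
A minimal in-process collector for these metrics might look like the following. The class and method names are hypothetical, and a production system would more likely export these values to an existing metrics backend than keep them in memory.

```python
from collections import Counter
from statistics import quantiles


class OutputMetrics:
    def __init__(self) -> None:
        self.total = 0
        self.successes = 0
        self.retries = 0
        self.error_types = Counter()
        self.latencies_ms = []

    def record(self, latency_ms: float, retries: int = 0, error_type: str = None) -> None:
        """Record one request; error_type is None when validation succeeded."""
        self.total += 1
        self.retries += retries
        self.latencies_ms.append(latency_ms)
        if error_type is None:
            self.successes += 1
        else:
            self.error_types[error_type] += 1

    def success_rate(self) -> float:
        return self.successes / self.total if self.total else 0.0

    def latency_percentile(self, pct: int) -> float:
        # quantiles(n=100) returns the 99 cut points P1..P99
        return quantiles(self.latencies_ms, n=100)[pct - 1]


metrics = OutputMetrics()
for ms in [120, 150, 180, 210, 900]:
    metrics.record(latency_ms=ms)
metrics.record(latency_ms=400, retries=2, error_type="missing_field")
```

Keeping one collector per schema gives the schema-specific breakdown from the table without any extra code.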

Choosing Your Approach

Scenario | Recommended Approach
Simple schemas, minimal changes | Native JSON mode
Python, strong type safety | Instructor with Pydantic
TypeScript, provider-agnostic | Zod with recursive retries
Claude, guaranteed format | Response prefilling

The best approach often combines multiple techniques--using native JSON mode as a baseline while adding validation layers for critical applications. Start simple, measure results, and add complexity where metrics justify the investment. Consider integrating with your existing AI automation services for comprehensive pipeline management, and complement your setup with proper LLM security best practices to protect your data flows.

Sources

  1. Agenta AI - The Guide to Structured Outputs and Function Calling with LLMs - Comprehensive guide covering JSON mode, Pydantic, Instructor, and Outlines. Covers OpenAI, Claude, and Gemini code examples for consistent data extraction.
  2. F22 Labs - Why the Instructor Beats OpenAI for Structured JSON Output - In-depth comparison showing Instructor's Pydantic integration advantages over OpenAI's native structured output, particularly for complex nested schemas.
  3. Inferable - Implementing Structured Outputs as a Feature for Any LLM - Practical guide on building reliable JSON parsers using Zod schemas and recursive retries for deterministic structured outputs.