
Mellea helps you manage the unreliable part of every AI-powered pipeline: the LLM call itself. It replaces ad-hoc prompt chains and brittle agents with structured generative programs — Python code where LLM calls are first-class operations governed by type annotations, requirement verifiers, and principled repair loops.

uv pip install mellea

Get started

Install Mellea and run your first generative program in minutes.

Tutorial

Build a complete program with generation, validation, and repair.

Code examples

Runnable examples: RAG, agents, sampling, MObjects, and more.

API reference

Full public API — backends, session, components, requirements, sampling.

How Mellea works

Mellea’s design rests on three interlocking ideas.

Python, not prose

@generative turns a typed function signature into an LLM-backed implementation. Docstrings become prompts. Type hints become output schemas. No DSL required.
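The idea can be illustrated with a toy, plain-Python stand-in (this is a conceptual sketch, not Mellea's actual implementation; `generative_sketch` and the stub backend are hypothetical names invented for illustration):

```python
import inspect
import typing

def generative_sketch(fn):
    """Toy stand-in for an @generative-style decorator: the docstring
    becomes the prompt and the return annotation becomes the output type."""
    hints = typing.get_type_hints(fn)
    out_type = hints.pop("return", str)

    def call(model, **kwargs):
        # Build a prompt from the docstring plus the bound arguments,
        # then coerce the model's raw text back to the annotated type.
        prompt = f"{inspect.getdoc(fn)}\nInputs: {kwargs}"
        return out_type(model(prompt))
    return call

def word_count(text: str) -> int:
    """Count the words in the given text."""

count = generative_sketch(word_count)

# A deterministic stub stands in for a real LLM backend here:
stub = lambda prompt: "3"
print(count(stub, text="one two three"))  # -> 3
```

The key point is that the function body stays empty: the signature and docstring carry all the information the backend needs.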

Requirements driven

Declare what good output looks like with req(). Mellea checks every response before it leaves the session — using LLM verifiers, programmatic checks, or domain-trained adapters.

Instruct · Validate · Repair

When a requirement fails, Mellea feeds the failure back and tries again. Rejection sampling, majority voting, and SOFAI are built in.
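The shape of that loop can be sketched in plain Python (a conceptual illustration of rejection sampling with feedback, not Mellea's real `req()`/sampling API; the stub model and requirement tuples are invented for the example):

```python
def instruct_with_repair(model, prompt, requirements, loop_budget=3):
    """Conceptual instruct-validate-repair loop: retry with failure
    feedback until every requirement passes or the budget runs out."""
    for _ in range(loop_budget):
        output = model(prompt)
        failures = [desc for desc, check in requirements if not check(output)]
        if not failures:
            return output  # all requirements satisfied
        # Feed the failed requirements back into the next attempt.
        prompt = f"{prompt}\nPrevious answer failed: {'; '.join(failures)}"
    raise RuntimeError("loop budget exhausted")

# Stub model: fails the requirement once, then complies.
attempts = iter(["way too long answer here", "short"])
model = lambda p: next(attempts)
reqs = [("answer must be under 10 characters", lambda s: len(s) < 10)]
print(instruct_with_repair(model, "Summarize.", reqs))  # -> short
```

Swapping the retry policy (best-of-n, voting) changes only the loop body, which is why strategies are pluggable.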

Key patterns

MObjects and mify

Add @mify to any class to make it LLM-queryable and tool-accessible without rewriting your data model.
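Conceptually, an mified object contributes two things: a text rendering for prompting and its public methods as callable tools. A minimal sketch of that idea (hypothetical `mify_sketch` helper; Mellea's real `@mify` decorator works differently in detail):

```python
import dataclasses, inspect

def mify_sketch(obj):
    """Toy model of an @mify-style wrapper: a text rendering of the
    object for prompting, plus its public methods exposed as tools."""
    tools = {
        name: member
        for name, member in inspect.getmembers(obj, callable)
        if not name.startswith("_")
    }
    return repr(obj), tools

@dataclasses.dataclass
class Invoice:
    total: float
    def apply_discount(self, pct: float) -> float:
        return self.total * (1 - pct / 100)

context, tools = mify_sketch(Invoice(total=200.0))
print(context)                      # Invoice(total=200.0)
print(tools["apply_discount"](10))  # -> 180.0
```

The data model itself is untouched; the wrapper only adds an LLM-facing view.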

Context and sessions

Explicit context threading with push/pop state keeps multi-turn workflows reproducible and debuggable.
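The push/pop idea can be shown with a toy context stack (a conceptual sketch; Mellea's real `ChatContext` and session types have a different interface):

```python
class ContextSketch:
    """Toy model of explicit context threading: a stack of turns that
    can be pushed for a sub-task and popped to restore prior state."""
    def __init__(self):
        self.turns = []
        self._marks = []

    def add(self, role, text):
        self.turns.append((role, text))

    def push(self):
        self._marks.append(len(self.turns))  # remember current depth

    def pop(self):
        self.turns = self.turns[: self._marks.pop()]  # drop sub-task turns

ctx = ContextSketch()
ctx.add("user", "Plan a trip.")
ctx.push()
ctx.add("assistant", "scratch work for a side question")
ctx.pop()              # the scratch turn never pollutes the main thread
print(len(ctx.turns))  # -> 1
```

Because state changes are explicit, a failed sub-task can be replayed or discarded without corrupting the main conversation.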

Async and streaming

ainstruct(), aact(), and token-by-token streaming for production throughput and responsive UIs.
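The streaming pattern looks like consuming an async iterator of tokens. A self-contained sketch with a stub token source (the `astream_sketch` generator is invented for illustration and stands in for a real streaming backend):

```python
import asyncio

async def astream_sketch(tokens):
    """Toy token stream: yields tokens one at a time, as a
    streaming backend would."""
    for tok in tokens:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield tok

async def main():
    pieces = []
    async for tok in astream_sketch(["Hel", "lo", "!"]):
        pieces.append(tok)      # update a UI incrementally here
    return "".join(pieces)

print(asyncio.run(main()))      # -> Hello!
```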

Safety checks

GuardianCheck detects harmful, off-topic, or hallucinated outputs before they reach downstream code.

Inference-time scaling

Best-of-n, SOFAI, majority voting — swap strategies in one line.
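Majority voting, the simplest of these, can be sketched in a few lines (a conceptual illustration with a stub model, not Mellea's strategy classes):

```python
from collections import Counter

def majority_vote(model, prompt, n=5):
    """Minimal majority-voting sketch: sample n candidates and
    return the most common answer."""
    samples = [model(prompt) for _ in range(n)]
    answer, _ = Counter(samples).most_common(1)[0]
    return answer

# Stub model with one noisy sample out of five:
answers = iter(["42", "42", "41", "42", "42"])
print(majority_vote(lambda p: next(answers), "6 * 7 = ?"))  # -> 42
```

Best-of-n differs only in replacing the vote with a scoring function, which is what makes strategies a one-line swap.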

Tools and agents

@tool, MelleaTool, and the ReACT loop for goal-driven multi-step agents.
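The control flow of such a loop, reduced to its skeleton (a conceptual ReACT-style sketch with a scripted stub policy; Mellea's real agent API is richer than this):

```python
def react_sketch(model, tools, question, max_steps=5):
    """Conceptual ReACT-style loop: the model alternates between choosing
    a tool action and producing a final answer from observations."""
    transcript = question
    for _ in range(max_steps):
        step = model(transcript)  # ("tool", name, arg) or ("final", answer)
        if step[0] == "final":
            return step[1]
        _, name, arg = step
        observation = tools[name](arg)  # run the chosen tool
        transcript += f"\nObservation: {observation}"
    raise RuntimeError("max steps exceeded")

# Scripted stub policy: look the fact up, then answer from the observation.
steps = iter([("tool", "lookup", "capital of France"),
              ("final", "Paris")])
tools = {"lookup": lambda q: "Paris"}
print(react_sketch(lambda t: next(steps), tools,
                   "What is the capital of France?"))  # -> Paris
```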

Backends

Mellea is backend-agnostic. The same program runs on any inference engine.

Ollama

Local inference, zero cloud costs.

OpenAI

GPT-4o, o3-mini, any OpenAI-compatible API.

AWS Bedrock

AWS Bedrock via Bedrock Mantle or LiteLLM.

IBM WatsonX

Managed models on IBM's watsonx AI platform.

HuggingFace

Local inference with Transformers — aLoRA and constrained decoding.

vLLM

High-throughput batched local inference on Linux + CUDA.

LiteLLM / Vertex AI

Google Vertex AI, Anthropic, and 100+ providers via LiteLLM.

LangChain

Use LangChain tools in Mellea sessions or call Mellea from LangChain chains.

See Backends and configuration for the full list of supported backends and how to configure them.

How-to guides

Enforce structured output

Pydantic models, Literal types, and @generative for guaranteed schemas.

Write custom verifiers

Python functions, ValidationResult, and multi-field validation logic.

Async and streaming

aact(), ainstruct(), and token-by-token streaming output.

Use context and sessions

ChatContext, explicit context threading, and multi-session workflows.

Configure model options

Temperature, seed, max tokens, system prompts — cross-backend with ModelOption.

Use images and vision

Pass images to instruct() and chat() with any vision-capable backend.

Build a RAG pipeline

Vector search, LLM relevance filtering, and grounded generation end-to-end.

GitHub · PyPI · Discussions