Every call to an LLM in Mellea passes through four layers: Component, Backend, Context, and Session. Understanding how these fit together explains both why Mellea is structured the way it is and how to extend it effectively.
Looking to use this in code? See Context and Sessions for practical examples and session extension patterns.
The four layers
Components
A Component is the structured representation of a single interaction with an LLM.
When you call m.instruct(...), Mellea creates an Instruction component — a
composite data structure that holds the description, requirements, user variables,
grounding context, and ICL examples for that call.
Components are composable: a component can contain other components. This is how
Mellea keeps prompts modular. An Instruction contains Requirement objects;
a Requirement is itself a component. The composition forms a directed acyclic
graph (DAG) that the backend renders into a prompt.
The leaf nodes of the DAG are CBlock objects — atomic content blocks that hold
raw text or a parsed representation of a model output.
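For example, a single instruct() call can populate several of these fields at once. The keyword names below follow the fields listed above but should be read as a sketch, not an exact signature:

```python
import mellea

m = mellea.start_session()

# One instruct() call builds one Instruction component whose
# requirements become child Requirement components in the DAG.
summary = m.instruct(
    "Summarize the incident report in two sentences.",
    requirements=["Mention the root cause", "Stay under 50 words"],
    grounding_context={"report": "Service X failed at 02:00 UTC after a bad deploy."},
)
```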
Backends
A Backend takes a Component, formats it into a prompt, sends it to an LLM, and
returns the model output as a ModelOutputThunk. The Thunk is a lazy wrapper: it
holds the raw model output and parses it on access (via .value or str()).
The backend is responsible for:
- Rendering the component tree into the prompt format the model expects (chat messages, template strings, etc.)
- Making the network or process call to the LLM
- Parsing the response into a typed representation where applicable
A Component does not know which backend will render it.
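A minimal sketch of the consumer side, assuming instruct() returns the ModelOutputThunk described above:

```python
import mellea

m = mellea.start_session()
thunk = m.instruct("Name one prime number below 10.")

# Parsing happens lazily, on access rather than at call time.
print(str(thunk))   # parsed text of the model output
text = thunk.value  # the same content via the .value accessor
```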
Contexts
A Context records the history of interactions during a session. It is a linked
list (or tree, when you clone a session) of components and their outputs.
The context serves two purposes:
- Prompt construction — the backend calls ctx.view_for_generation() to get the components that should appear in the prompt. For ChatContext, this includes all prior turns; for SimpleContext, it includes only the current instruction.
- Validation — during the IVR (instruct-validate-repair) loop, requirement validators receive the Context object. They can call ctx.last_output() to inspect the most recent model output, or examine the full history for more complex checks.
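As a sketch of the validation path, a requirement can carry a custom validator that reads the context. The import path, and the assumption that validation_fn receives the Context directly, are inferred from the description above and may differ in detail:

```python
from mellea.stdlib.requirement import Requirement

# Passes only if the most recent model output is non-empty.
non_empty = Requirement(
    "The model must produce a non-empty answer",
    validation_fn=lambda ctx: bool(str(ctx.last_output()).strip()),
)
```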
Sessions
MelleaSession is the developer-facing layer. It wraps a backend and a context,
exposes the instruct(), chat(), validate(), and other methods you use in your
code, and handles the bookkeeping that ties components, context updates, and backend
calls together.
start_session() returns a MelleaSession with defaults: Ollama backend, Granite 4
Micro model, and SimpleContext.
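In practice that means a working session is two lines; the sketch below uses only those defaults:

```python
from mellea import start_session

m = start_session()  # Ollama backend, Granite 4 Micro, SimpleContext
print(m.instruct("Write a haiku about type systems."))
```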
SimpleContext vs ChatContext
The two built-in context types implement very different history policies.
SimpleContext
SimpleContext is stateless between calls. Each instruct() or chat() call sees
only the current instruction — no prior turns. The prompt is entirely determined by
the current component.
Use SimpleContext (the default) when:
- Calls are logically independent (a batch of classification tasks, extraction from different documents)
- You are composing @generative functions whose results flow through Python code, not through chat history
- You want predictable, isolated calls with no context accumulation
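A sketch of the first case above, a batch of independent classification calls:

```python
from mellea import start_session

m = start_session()  # SimpleContext by default

reviews = ["Refund was processed quickly.", "Still waiting after three weeks."]
for review in reviews:
    # No history accumulates: each call sees only its own instruction.
    label = m.instruct(
        f"Classify this review as positive or negative: {review}"
    )
    print(label)
```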
ChatContext
ChatContext preserves the full message history across calls. The model sees all
prior turns on every new request.
Use ChatContext when:
- You are building a stateful conversation (a chat assistant, an interactive planning session)
- The model needs to refer back to prior turns to give a coherent response
- You are implementing agentic loops where each step builds on previous results
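A minimal stateful conversation, assuming start_session accepts a ctx argument and that ChatContext lives in mellea.stdlib.base:

```python
from mellea import start_session
from mellea.stdlib.base import ChatContext

m = start_session(ctx=ChatContext())

m.chat("My name is Ada and I prefer concise answers.")
reply = m.chat("What name did I just tell you?")
print(reply.content)  # the prior turn is in the prompt, so the model can answer
```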
The context window trade-off
ChatContext accumulates history indefinitely. As history grows, prompts become
larger, latency increases, and cost rises. For long sessions, consider using
ctx.reset_to_new() or m.reset() to clear history at a natural breakpoint.
The ChatContext constructor accepts a window_size parameter to limit how many
prior turns are retained:
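The sketch below reconstructs that call; window_size is named in the text above, while the import path and the chosen value are illustrative:

```python
from mellea import start_session
from mellea.stdlib.base import ChatContext

# Keep roughly the last 10 turns in the rendered prompt; older turns
# are dropped (exact retention semantics may differ, see the API reference).
m = start_session(ctx=ChatContext(window_size=10))
```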
For most programs, SimpleContext (the default)
is the right choice. Reserve ChatContext for applications where conversational
coherence is genuinely required.
Why explicit context management matters
Implicit context — a global chat history that grows without bounds — is a common source of subtle failures in generative programs:
- Prompt degradation: A very long history can cause the model to lose focus on the current instruction, producing outputs that drift from what was asked.
- Context window overflow: Every LLM has a maximum token budget. Exceeding it causes truncation or errors.
- Hard-to-debug behaviour: When context is implicit and global, it is hard to reproduce failures — the same instruction can produce different results depending on what happened earlier in the session.
SimpleContext ensures independence by default; ChatContext
is opt-in for cases where history is genuinely needed.
Session cloning
m.clone() creates a copy of a session at its current context state. Both the
original and the clone start from the same history and then diverge independently.
This is useful for:
- Exploring multiple continuations of the same context (tree-structured reasoning)
- Running parallel comparisons with the same conversational history
- Implementing best-of-N sampling at the conversation level rather than the single-turn level
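A sketch of branching one conversation into two independent continuations:

```python
from mellea import start_session
from mellea.stdlib.base import ChatContext

m = start_session(ctx=ChatContext())
m.chat("We are planning a three-day trip to Lisbon.")

# Both branches share the Lisbon turn, then diverge independently.
budget = m.clone()
luxury = m.clone()
budget.chat("Plan it on a tight budget.")
luxury.chat("Plan it with no budget limit.")
```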
Inspecting context
The ctx object exposes helpers for reading the current session state:
last_turn() returns a ContextTurn with .input and .output fields. It is
useful for observability or when you need to log exactly what the model received and
produced.
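For example, assuming the session exposes its context as m.ctx:

```python
turn = m.ctx.last_turn()
print(turn.input)   # the component the model received
print(turn.output)  # the ModelOutputThunk it produced
```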
Extending sessions
MelleaSession is a regular Python class. Subclassing it lets you inject custom
behaviour — input filtering, output validation, logging, rate limiting — into
every call. See Context and Sessions how-to
for a worked example.
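A minimal sketch of the pattern, assuming MelleaSession is importable from the top-level package and that instruct()'s keyword arguments pass through unchanged:

```python
import logging
from mellea import MelleaSession

logger = logging.getLogger("mellea.audit")

class AuditedSession(MelleaSession):
    """Logs every instruction before delegating to the parent class."""

    def instruct(self, description, **kwargs):
        logger.info("instruct: %s", description)
        return super().instruct(description, **kwargs)
```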
See also: Context and Sessions how-to | Async and Streaming