mellea.core.requirement

Requirement interface for constrained and validated generation.

A Requirement pairs a human-readable description with a validation function that inspects a Context (and optionally a backend) to determine whether a model output meets a constraint. ValidationResult carries the pass/fail verdict along with an optional reason, score, and the ModelOutputThunk produced during validation. PartialValidationResult provides a tri-state variant ("pass", "fail", "unknown") for per-chunk streaming validation. Helper factories such as default_output_to_bool make it easy to build requirements without boilerplate.

Functions

FUNC default_output_to_bool

default_output_to_bool(x: CBlock | str) -> bool

Convert a model output string to a boolean by checking for a "yes" answer.

Checks if the output is exactly equal to "yes" or "y" (case-insensitive). If not, also checks if any of the words in the output are "yes" (case-insensitive).

Args:

  • x: The model output to evaluate, as a CBlock or plain string.

Returns:

  • True if the output indicates a "yes" answer, False otherwise.
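
The heuristic described above can be approximated in a few lines of plain Python. This is a standalone sketch, not the library's implementation: the name `looks_like_yes` is hypothetical, it accepts only `str` (a CBlock would first need to be reduced to its text), and the whitespace stripping and whitespace-based word splitting are assumptions.

```python
def looks_like_yes(text: str) -> bool:
    """Approximate the documented "yes"-detection heuristic (hypothetical helper)."""
    normalized = text.strip().lower()  # assumption: surrounding whitespace is ignored
    # Step 1: the whole output is exactly "yes" or "y".
    if normalized in ("yes", "y"):
        return True
    # Step 2: any whitespace-separated word is exactly "yes".
    return any(word == "yes" for word in normalized.split())


print(looks_like_yes("Yes"))
print(looks_like_yes("I would say yes here"))
print(looks_like_yes("yesterday was fine"))
```

Under these assumptions a word such as "yesterday" does not count, because the second step compares whole words rather than substrings.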

Classes

CLASS ValidationResult

ValidationResults store the output of a Requirement's validation. They can be used to return additional info from validation functions, which is useful for sampling/repairing.

Args:

  • result: Boolean indicating whether the requirement passed.
  • reason: Optional human-readable explanation for the verdict.
  • score: Optional numeric score returned by the validator.
  • thunk: The ModelOutputThunk produced during LLM-as-a-Judge validation, if applicable.
  • context: The context associated with the validation backend call, if applicable.

Methods:

FUNC reason

reason(self) -> str | None

Reason for the validation result.

FUNC score

score(self) -> float | None

An optional score for the validation result.

FUNC thunk

thunk(self) -> ModelOutputThunk | None

The ModelOutputThunk associated with the validation function, if an LLM was used to generate the final result.

FUNC context

context(self) -> Context | None

The context associated with validation if a backend was used to generate the final result.

FUNC as_bool

as_bool(self) -> bool

Return a boolean value based on the validation result.

Returns:

  • True if the requirement passed, False otherwise.

CLASS PartialValidationResult

Tri-state result from per-chunk streaming validation.

Unlike ValidationResult, which stores its verdict as a private _result: bool, this class exposes success as a public property. The asymmetry is intentional: the tri-state value cannot be recovered from a bool, so a public property is the only way to distinguish "fail" from "unknown" after construction.

Args:

  • success: Validation state — "pass" (constraint satisfied so far), "fail" (constraint violated, stop streaming), or "unknown" (insufficient data yet, continue streaming).
  • reason: Optional human-readable explanation.
  • score: Optional numeric confidence score.
  • thunk: Optional ModelOutputThunk from the validation call.
  • context: Optional context associated with the validation call.

Methods:

FUNC success

success(self) -> Literal['pass', 'fail', 'unknown']

The tri-state validation result.

FUNC reason

reason(self) -> str | None

Reason for the validation result.

FUNC score

score(self) -> float | None

An optional score for the validation result.

FUNC thunk

thunk(self) -> ModelOutputThunk | None

The ModelOutputThunk associated with the validation call, if any.

FUNC context

context(self) -> Context | None

The context associated with the validation call, if any.

FUNC as_bool

as_bool(self) -> bool

Return True for "pass", False for "fail" or "unknown".

"unknown" maps to False intentionally. In streaming contexts, check pvr.success == "unknown" before treating False as a definitive failure — "unknown" means insufficient data so far, not a constraint violation.

Returns:

  • True if the partial result is "pass", False otherwise.
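
The guidance above can be sketched with a stand-in class. `PartialResultSketch` and `decide` below are hypothetical and only mirror the documented `success` property and `as_bool` mapping; they are not the library's classes, and the "continue"/"abort" actions are illustrative.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass(frozen=True)
class PartialResultSketch:
    """Hypothetical stand-in mirroring the documented tri-state result."""

    success: Literal["pass", "fail", "unknown"]
    reason: Optional[str] = None

    def as_bool(self) -> bool:
        # Only "pass" is truthy; "fail" and "unknown" both map to False.
        return self.success == "pass"


def decide(pvr: PartialResultSketch) -> str:
    """Return the action a streaming loop might take for one chunk (illustrative)."""
    if pvr.success == "unknown":
        return "continue"  # not enough data yet, keep streaming
    return "continue" if pvr.as_bool() else "abort"


for state in ("pass", "fail", "unknown"):
    pvr = PartialResultSketch(state)
    print(state, pvr.as_bool(), decide(pvr))
```

Checking `success` first keeps "unknown" from being mistaken for a violation, even though `as_bool()` returns False for both.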

CLASS Requirement

Requirements are a special type of Component used as input to the Validate step in Instruct/Validate/Repair patterns.

Args:

  • description: A natural-language description of the requirement. Sometimes included in Instruction prompts; use check_only=True to suppress this.
  • validation_fn: If provided, this function is executed instead of LLM-as-a-Judge. The bool() of its return value defines pass/fail.
  • output_to_bool: Translates LLM-as-a-Judge output to a boolean. Defaults to a "yes"-detection heuristic.
  • check_only: When True, the requirement description is excluded from Instruction prompts.

Attributes:

  • description: A natural-language description of the requirement.
  • output_to_bool: Function used to convert LLM-as-a-Judge output into a boolean pass/fail result.
  • validation_fn: Optional custom validation function that bypasses the LLM-as-a-Judge strategy entirely.
  • check_only: When True, the requirement description is excluded from Instruction prompts to avoid influencing model output.

Methods:

FUNC validate

validate(self, backend: Backend, ctx: Context) -> ValidationResult

Chooses the appropriate validation strategy and applies it to the given context.

Uses validation_fn if one was provided, otherwise falls back to LLM-as-a-Judge by generating a judgement response with the backend.

Args:

  • backend: The inference backend used when the LLM-as-a-Judge strategy is selected.
  • ctx: The context to validate, which must contain a ModelOutputThunk as its last output.
  • format: Optional structured output format for the judgement call.
  • model_options: Optional model options to pass to the backend during the judgement call.

Returns:

  • The result of the validation, including a boolean pass/fail and optional metadata.
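
The strategy selection described above can be pictured as a simple branch. The sketch below is self-contained and uses hypothetical stand-ins (`RequirementSketch`, `fake_judge`, and plain strings in place of Context and Backend); it illustrates the documented fallback order only and does not reproduce the library's implementation or signatures.

```python
from typing import Callable, Optional


class RequirementSketch:
    """Hypothetical stand-in; not the library's Requirement class."""

    def __init__(
        self,
        description: str,
        validation_fn: Optional[Callable[[str], object]] = None,
        output_to_bool: Callable[[str], bool] = lambda s: s.strip().lower() == "yes",
    ) -> None:
        self.description = description
        self.validation_fn = validation_fn
        self.output_to_bool = output_to_bool

    def validate(self, judge: Callable[[str], str], output: str) -> bool:
        # A custom validation function takes precedence; bool() of its
        # return value defines pass/fail.
        if self.validation_fn is not None:
            return bool(self.validation_fn(output))
        # Otherwise ask the judge (a stand-in for the backend call) and
        # translate its textual answer into a boolean.
        prompt = f"Requirement: {self.description}\nOutput: {output}\nSatisfied?"
        return self.output_to_bool(judge(prompt))


def fake_judge(prompt: str) -> str:
    """Stub judge that always answers "yes" (stands in for an LLM)."""
    return "yes"


short = RequirementSketch("At most 10 characters", validation_fn=lambda o: len(o) <= 10)
polite = RequirementSketch("The text is polite")

print(short.validate(fake_judge, "far too long an answer"))  # custom function path
print(polite.validate(fake_judge, "Thank you kindly"))       # judge path
```

In this sketch the judge is never consulted when a validation function is present, which matches the description of validation_fn as bypassing LLM-as-a-Judge.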

FUNC stream_validate

stream_validate(self, chunk: str) -> PartialValidationResult

Hook for per-chunk streaming validation.

The default implementation returns PartialValidationResult("unknown") — meaning insufficient data to decide yet. Subclasses override this method to inspect the current chunk and return "pass" or "fail" early.

Implementations may accumulate state on self across calls within a single attempt. The orchestrator clones the requirement (copy(req)) before each attempt, so state does not bleed across retries.

Shallow-copy caveat: mutable container fields (e.g. self._buffer = []) are shared by reference under copy(). Reassign rather than mutate in place (self._buffer = self._buffer + [chunk], not self._buffer.append(chunk)), or override __copy__ for proper isolation. If an override raises, the enclosing mellea.stdlib.streaming.stream_with_chunking call aborts before any backend generation starts and the exception propagates unchanged. Overrides with externally visible side effects (file writes, network calls) should perform them only after any logic that could raise, since the framework cannot roll them back.
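
The shallow-copy caveat is ordinary Python `copy.copy` behaviour and can be reproduced without the library. In the sketch below, `MutatingChecker` and `ReassigningChecker` are hypothetical classes with a `_buffer` field and a `feed` method, standing in for a requirement that accumulates chunks.

```python
import copy


class MutatingChecker:
    """Hypothetical: appends in place, so shallow copies share one list."""

    def __init__(self) -> None:
        self._buffer = []

    def feed(self, chunk: str) -> None:
        self._buffer.append(chunk)  # mutates the shared list object


class ReassigningChecker:
    """Hypothetical: rebinds the attribute, so each copy gets its own list."""

    def __init__(self) -> None:
        self._buffer = []

    def feed(self, chunk: str) -> None:
        self._buffer = self._buffer + [chunk]  # new list bound on this instance only


template_bad = MutatingChecker()
attempt_bad = copy.copy(template_bad)
attempt_bad.feed("first chunk")

template_ok = ReassigningChecker()
attempt_ok = copy.copy(template_ok)
attempt_ok.feed("first chunk")

print(template_bad._buffer)  # state leaked back into the original
print(template_ok._buffer)   # original is untouched
```

Reassignment works because `copy.copy` duplicates the instance's attribute bindings but not the objects they point to; rebinding `_buffer` on the copy leaves the original's binding alone.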

Implementations must not call mot.astream() or otherwise read the underlying stream; the orchestrator is the single consumer of the MOT stream (see ModelOutputThunk.astream). Requirements that need access to the text seen so far should accumulate it themselves from the chunk values they receive.

Args:

  • chunk: A single complete semantic chunk produced by the chunking strategy (e.g. one sentence for SentenceChunker). This is the delta since the previous stream_validate call for this attempt, not the accumulated output. Requirements that need earlier context should retain it on self across calls.
  • backend: The inference backend, available for backend-assisted checks.
  • ctx: The current generation context. During streaming the MOT is not yet computed, so ctx does not contain the generated output; use chunk (and any state accumulated on self) instead.

Returns:

  • "unknown" by default. Subclasses may return "pass" (constraint satisfied so far) or "fail" (constraint violated, streaming should be aborted). "pass" does not short-circuit the final validate() call; the orchestrator decides whether to skip it.
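
A per-chunk override and the clone-per-attempt behaviour described above might look roughly like the following. Everything here is a hypothetical stand-in: `MaxSentencesSketch` is not a real Requirement subclass, `run_attempt` is not the library's orchestrator, and the method returns a plain string where the real hook returns a PartialValidationResult.

```python
import copy


class MaxSentencesSketch:
    """Hypothetical per-chunk checker: fail once more than `limit` chunks arrive."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self._seen = 0  # per-attempt state; an int is safe under shallow copy

    def stream_validate(self, chunk: str) -> str:
        self._seen += 1
        if self._seen > self.limit:
            return "fail"     # constraint violated, streaming should stop
        return "unknown"      # cannot confirm until the stream ends


def run_attempt(req: MaxSentencesSketch, chunks: list[str]) -> tuple[str, int]:
    """Illustrative driver: clone the requirement, feed chunks, stop on "fail"."""
    attempt_req = copy.copy(req)  # fresh state for this attempt
    consumed = 0
    for chunk in chunks:
        consumed += 1
        if attempt_req.stream_validate(chunk) == "fail":
            return "fail", consumed
    return "unknown", consumed


req = MaxSentencesSketch(limit=2)
print(run_attempt(req, ["One.", "Two.", "Three.", "Four."]))
print(run_attempt(req, ["One.", "Two."]))
print(req._seen)  # the original requirement keeps its initial state
```

Because each attempt works on a copy, the counter on the original object never advances, so a retry starts from a clean state.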

FUNC parts

parts(self) -> list[Component | CBlock]

Returns all of the constituent parts of a Requirement.

Returns:

  • List of constituent components. Empty by default; subclasses override to expose their internal structure.

FUNC format_for_llm

format_for_llm(self) -> TemplateRepresentation | str

Returns a TemplateRepresentation for LLM-as-a-Judge evaluation of this requirement.

Populates the template with the requirement's description and the stored model _output. Must only be called from within a validate call for this same requirement, after _output has been set.

Returns:

  • TemplateRepresentation | str: A TemplateRepresentation containing the description and the model output to be judged.