ChunkingStrategy ABC and built-in implementations for streaming validation.
Classes
CLASS ChunkingStrategy
Abstract base class for text chunking strategies used in streaming validation.
A chunking strategy receives the full accumulated text so far and returns a
list of complete chunks ready for downstream validation. Any trailing fragment
that has not yet reached a chunk boundary is withheld — it is not included in
the returned list. Each call is stateless and idempotent given the same input.
End-of-stream contract: split() always withholds the trailing fragment.
When the stream terminates, callers are responsible for processing any remainder:
take the full accumulated text, identify everything after the last returned
chunk boundary, and handle it appropriately (e.g. pass to a final validator
or discard).
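The end-of-stream step above can be sketched as follows. This is a minimal illustration, not the library's own helper: `remainder_after_chunks` and `split_words` are hypothetical names, and `split_words` is only a stand-in for a real strategy's `split()`. The sketch assumes every returned chunk appears verbatim, and in order, inside the accumulated text.

```python
def remainder_after_chunks(accumulated_text: str, chunks: list[str]) -> str:
    """Return the text left over after the last chunk (hypothetical helper)."""
    cursor = 0
    # Walk forward through the chunks so repeated chunk text is matched in order.
    for chunk in chunks:
        index = accumulated_text.find(chunk, cursor)
        if index == -1:
            raise ValueError(f"chunk not found in order: {chunk!r}")
        cursor = index + len(chunk)
    # Everything after the last chunk boundary is the withheld fragment.
    return accumulated_text[cursor:].strip()


def split_words(accumulated_text: str) -> list[str]:
    """Stand-in for a strategy's split(): words, minus the trailing fragment."""
    words = accumulated_text.split()
    if accumulated_text and not accumulated_text[-1].isspace():
        words = words[:-1]
    return words


final_text = "alpha beta alpha"          # the stream has ended here
chunks = split_words(final_text)         # ["alpha", "beta"]
remainder = remainder_after_chunks(final_text, chunks)
print(remainder)                         # alpha
```

The caller then decides whether to pass `remainder` to a final validator or drop it.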
Methods:
FUNC split
Args:
- accumulated_text: The full text accumulated so far, including all previously seen tokens and the latest delta.
Returns:
- A list of complete chunks. If no chunk boundary has been reached yet, returns an empty list. Never includes the trailing incomplete fragment.
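Because `split()` is stateless and always receives the full accumulated text, the caller is the one who tracks which chunks have already been handled. The sketch below shows one way to do that. `ChunkingStrategySketch`, `LineChunker`, and `stream_chunks` are hypothetical names written for illustration and are not part of the library. It also assumes that appending text never changes chunks that were already returned.

```python
from abc import ABC, abstractmethod
from collections.abc import Iterable, Iterator


class ChunkingStrategySketch(ABC):
    """Stand-in for the documented interface: one abstract split() method."""

    @abstractmethod
    def split(self, accumulated_text: str) -> list[str]:
        """Return complete chunks; withhold the trailing fragment."""


class LineChunker(ChunkingStrategySketch):
    """Hypothetical strategy: one chunk per newline-terminated line."""

    def split(self, accumulated_text: str) -> list[str]:
        # The last element is the unterminated fragment (possibly ""), so drop it.
        return accumulated_text.split("\n")[:-1]


def stream_chunks(
    deltas: Iterable[str], strategy: ChunkingStrategySketch
) -> Iterator[str]:
    """Yield each complete chunk exactly once as deltas arrive."""
    accumulated = ""
    emitted = 0  # number of chunks already yielded
    for delta in deltas:
        accumulated += delta
        chunks = strategy.split(accumulated)
        # Only the chunks beyond those already emitted are new.
        yield from chunks[emitted:]
        emitted = len(chunks)


result = list(stream_chunks(["ab", "c\nde", "f\n", "gh"], LineChunker()))
print(result)  # ['abc', 'def']
```

The trailing `"gh"` is never yielded, which matches the end-of-stream contract: the caller handles it once the stream terminates.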
CLASS SentenceChunker
Splits accumulated text on sentence boundaries.
Sentence boundaries are detected by ., !, or ?, optionally
followed by a closing quote (straight or curly) or parenthesis, then
whitespace. The final sentence is returned only once it is followed by
whitespace or another sentence; a trailing fragment with no following
whitespace is withheld. Abbreviations are a known edge case: because
detection uses a simple regex rather than NLP, the period in an
abbreviation is treated as a sentence boundary. Inter-sentence whitespace
(including double-space or tab) is discarded and does not appear as leading
whitespace in subsequent chunks.
Methods:
FUNC split
Args:
- accumulated_text: The full text accumulated so far.
Returns:
- Complete sentences detected so far. The trailing fragment (if any) is withheld.
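The boundary rule described for SentenceChunker can be approximated with a short regex. This is a sketch written from the description above, not the library's actual pattern; `split_sentences` and the exact set of closing characters are assumptions.

```python
import re

# Closing characters allowed after the terminator: straight and curly
# closing quotes, plus a closing parenthesis (assumed set).
_CLOSERS = "\"'\u201d\u2019)"

# Terminator, optional closer, then one or more whitespace characters.
_SENTENCE_END = re.compile(r"[.!?][" + re.escape(_CLOSERS) + r"]?(\s+)")


def split_sentences(accumulated_text: str) -> list[str]:
    """Return complete sentences; withhold the trailing fragment."""
    chunks = []
    start = 0
    for match in _SENTENCE_END.finditer(accumulated_text):
        # Keep the terminator and closer, drop the inter-sentence whitespace.
        chunks.append(accumulated_text[start:match.start(1)])
        start = match.end()
    return chunks


print(split_sentences("Hello world.  How are you? Fine"))
# ['Hello world.', 'How are you?']
print(split_sentences("Dr. Smith arrived. "))
# ['Dr.', 'Smith arrived.']
```

The second call shows the abbreviation edge case: `Dr.` is emitted as its own chunk because the pattern cannot tell an abbreviation from a sentence end.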
CLASS WordChunker
Splits accumulated text on whitespace boundaries.
Each word is a chunk. Trailing text not yet followed by whitespace is
withheld.
Methods:
FUNC split
Args:
- accumulated_text: The full text accumulated so far.
Returns:
- All whitespace-delimited words except the trailing fragment (if any). An empty list is returned when no whitespace boundary has been seen.
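The word-level behaviour can be illustrated with a few lines of plain Python. This is a sketch of the documented contract, with `split_words` as a hypothetical stand-in rather than the library's implementation.

```python
def split_words(accumulated_text: str) -> list[str]:
    """Return whitespace-delimited words, withholding the trailing fragment."""
    words = accumulated_text.split()
    # If the text does not end in whitespace, the last word may still be growing.
    if accumulated_text and not accumulated_text[-1].isspace():
        words = words[:-1]
    return words


print(split_words("the quick bro"))   # ['the', 'quick']
print(split_words("the quick "))      # ['the', 'quick']
print(split_words("hello"))           # []
```

In the first call `bro` is withheld because more characters could still arrive; once whitespace follows a word, it is released.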
CLASS ParagraphChunker
Splits accumulated text on double-newline paragraph boundaries.
Two or more consecutive newline characters are treated as a paragraph
separator. The trailing paragraph fragment (text not yet followed by \n\n)
is withheld.
Note: only Unix-style \n\n separators are recognised. CRLF
(\r\n\r\n) paragraph separators are not supported.
Methods:
FUNC split
Args:
- accumulated_text: The full text accumulated so far.
Returns:
- Complete paragraphs (separated by two or more newlines). The trailing incomplete paragraph is withheld. Returns an empty list if no paragraph boundary has been reached.
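A rough sketch of the paragraph rule, including the CRLF limitation noted above. `split_paragraphs` is a hypothetical stand-in written from the description. Dropping empty strings is an assumption of this sketch, not confirmed library behaviour.

```python
import re

# Two or more consecutive "\n" characters form a paragraph separator.
_PARAGRAPH_BREAK = re.compile(r"\n{2,}")


def split_paragraphs(accumulated_text: str) -> list[str]:
    """Return complete paragraphs; withhold the trailing fragment."""
    parts = _PARAGRAPH_BREAK.split(accumulated_text)
    # The last part is text not yet followed by a separator, so it is withheld.
    # Empty strings (e.g. from a leading separator) are dropped: an assumption.
    return [part for part in parts[:-1] if part]


print(split_paragraphs("First.\n\nSecond.\n\n\nThird"))
# ['First.', 'Second.']
print(split_paragraphs("A\r\n\r\nB"))
# []
```

The second call returns an empty list because `\r\n\r\n` contains no two adjacent `\n` characters. Under this sketch, a caller with CRLF input would need to normalise line endings (for example with `text.replace("\r\n", "\n")`) before splitting.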