mellea.stdlib.docs.richdocument

Representations of Docling Documents.

Classes

RichDocument

A RichDocument is a block of content with an underlying DoclingDocument. It has helper functions for working with the document and extracting parts of it.

Methods

parts

parts(self) -> list[Component | CBlock]
A RichDocument has no parts.

format_for_llm

format_for_llm(self) -> TemplateRepresentation | str
Return the document content as Markdown. No template is needed.

docling

docling(self) -> DoclingDocument
Get the underlying Docling Document.

to_markdown

to_markdown(self)
Get the full text of the document as markdown.

get_tables

get_tables(self) -> list[Table]
Return the Tables that are a part of this document.

save

save(self, filename: str | Path) -> None
Save the underlying DoclingDocument for reuse later.

load

load(cls, filename: str | Path) -> RichDocument
Load a saved DoclingDocument from a file and wrap it in a RichDocument. The file must already contain a DoclingDocument; to process a source document, use from_document_file.

from_document_file

from_document_file(cls, source: str | Path | DocumentStream) -> RichDocument
Process a source document with Docling and return the result as a RichDocument.
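
As a usage sketch of the RichDocument methods above, the helper below processes a source file, caches the result, and collects each table as Markdown. The helper name `tables_as_markdown` and its `cache` argument are hypothetical; only the `RichDocument` and `Table` methods come from this page. The import is deferred into the function body, so `mellea` and Docling are assumed to be installed only when the function is called.

```python
from pathlib import Path
from typing import List, Union


def tables_as_markdown(source: Union[str, Path], cache: Union[str, Path]) -> List[str]:
    """Hypothetical helper: return every table in a document as Markdown."""
    # Deferred import: requires the mellea package (and Docling) at call time.
    from mellea.stdlib.docs.richdocument import RichDocument

    cache = Path(cache)
    if cache.exists():
        # Reuse a previously saved DoclingDocument instead of re-processing.
        doc = RichDocument.load(cache)
    else:
        # First run: process the source with Docling, then save for later reuse.
        doc = RichDocument.from_document_file(source)
        doc.save(cache)

    # get_tables() returns Table objects; each one can render itself as Markdown.
    return [table.to_markdown() for table in doc.get_tables()]
```

Caching through `save` and `load` avoids re-running Docling on every call, which is the reuse that `save` is documented for.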

TableQuery

Table-specific query.

Methods

format_for_llm

format_for_llm(self) -> TemplateRepresentation
Template arguments for Formatter.

TableTransform

Table-specific transform.

Methods

format_for_llm

format_for_llm(self) -> TemplateRepresentation
Template arguments for Formatter.

Table

A Table represents a single table within a larger Docling Document.

Methods

from_markdown

from_markdown(cls, md: str) -> Table | None
Creates a fake document from the markdown and attempts to extract the first table found.

to_markdown

to_markdown(self) -> str
Get the Table as markdown.

transpose

transpose(self) -> Table | None
Transpose the table. Returns a new transposed Table if successful; the return type also allows None.

format_for_llm

format_for_llm(self) -> TemplateRepresentation | str
Return Table representation for Formatter.
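
Because both `from_markdown` and `transpose` are typed as returning `Table | None`, callers probably want to check each result before chaining. A minimal sketch, assuming `mellea` is installed at call time; the helper name `transpose_markdown_table` is hypothetical and not part of the library:

```python
from typing import Optional


def transpose_markdown_table(md: str) -> Optional[str]:
    """Hypothetical helper: transpose the first table found in a Markdown string."""
    # Deferred import: requires the mellea package at call time.
    from mellea.stdlib.docs.richdocument import Table

    table = Table.from_markdown(md)
    if table is None:
        # No table could be extracted from the Markdown.
        return None

    transposed = table.transpose()
    if transposed is None:
        # The transpose did not produce a new Table.
        return None

    return transposed.to_markdown()
```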