RichDocument, Table, and related helpers backed by Docling.
RichDocument wraps a DoclingDocument (e.g. produced by converting a PDF or
Markdown file) and renders it as Markdown for a language model. Table represents a
single table within a Docling document and provides transpose, to_markdown, and
query/transform helpers. Use RichDocument.from_document_file to convert a PDF or
other supported format, and get_tables() to extract structured table data for
downstream LLM-driven Q&A or transformation tasks.
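The extraction flow described above can be sketched with duck typing. The helper name `tables_as_markdown` is hypothetical and not part of this module; it relies only on the `get_tables()` and `to_markdown()` methods named in this reference, so any object exposing them should work.

```python
from typing import Any


def tables_as_markdown(doc: Any) -> list[str]:
    """Return the Markdown rendering of every table in a document.

    Assumption: `doc` exposes get_tables(), and each returned table exposes
    to_markdown(), as documented for RichDocument and Table.
    """
    return [table.to_markdown() for table in doc.get_tables()]
```

With the real classes, this would presumably be called on the result of `RichDocument.from_document_file`, passing a path to a PDF or Markdown file.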
Classes
CLASS RichDocument
A RichDocument is a block of content backed by a DoclingDocument.
Provides helper functions for working with the document and extracting parts
such as tables. Use from_document_file to convert PDFs or other formats,
and save/load for persistence.
Args:
doc: The underlying Docling document to wrap.
FUNC parts
Returns:
- list[Component | CBlock]: Always an empty list.
FUNC format_for_llm
Returns:
- TemplateRepresentation | str: The full document rendered as Markdown.
FUNC docling
Return the underlying DoclingDocument.
Returns:
- The wrapped Docling document instance.
FUNC to_markdown
FUNC get_tables
Returns:
- list[Table]: A list of Table objects extracted from the document.
FUNC save
Save the underlying DoclingDocument to a JSON file for later reuse.
Args:
filename: Destination file path for the serialized document.
FUNC load
Load a RichDocument from a previously saved DoclingDocument JSON file.
Args:
filename: Path to a JSON file previously created by RichDocument.save.
Returns:
- A new RichDocument wrapping the loaded document.
FUNC from_document_file
Convert a source document into a RichDocument using Docling.
Args:
source: Path or stream for the source document (e.g. a PDF or Markdown file).
Returns:
- A new RichDocument wrapping the converted document.
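Because conversion can be reused through save and load, one possible pattern is to convert once and reload afterwards. The sketch below is illustrative: `load_or_convert` is a hypothetical helper, and it assumes that `load` and `from_document_file` can be called on the class while `save` is called on an instance.

```python
from pathlib import Path
from typing import Any


def load_or_convert(doc_cls: Any, source: str, cache_path: str) -> Any:
    """Reuse a saved document if one exists; otherwise convert and save it.

    Assumption: doc_cls provides load(filename) and from_document_file(source)
    as class-level callables, and instances provide save(filename).
    """
    if Path(cache_path).exists():
        return doc_cls.load(cache_path)
    doc = doc_cls.from_document_file(source)
    doc.save(cache_path)
    return doc
```

Passing the class as an argument keeps the helper independent of any particular import path.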
CLASS TableQuery
A Query component specialised for Table objects.
Formats the table as Markdown alongside the query string so the LLM receives
both the structured table content and the natural-language question.
Args:
obj: The table to query.
query: The natural-language question to ask about the table.
FUNC parts
Returns:
- list[Component | CBlock]: A list containing the wrapped Table object.
FUNC format_for_llm
Returns:
- Template args containing the query string and the Markdown-rendered table.
CLASS TableTransform
A Transform component specialised for Table objects.
Formats the table as Markdown alongside the transformation instruction so the
LLM receives both the structured table content and the mutation description.
Args:
obj: The table to transform.
transformation: Natural-language description of the desired mutation.
FUNC parts
Returns:
- list[Component | CBlock]: A list containing the wrapped Table object.
FUNC format_for_llm
Returns:
- Template args containing the transformation description and the Markdown-rendered table.
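Both TableQuery and TableTransform pair a natural-language instruction with the Markdown-rendered table. The sketch below shows one way such template args could be bundled and rendered. The function names, the `table` key, and the prompt layout are all hypothetical; the actual templates are not shown in this reference.

```python
def build_template_args(table_markdown: str, instruction: str, key: str) -> dict[str, str]:
    """Bundle an instruction and a Markdown table as template arguments.

    `key` names the instruction slot, e.g. "query" or "transformation"
    (illustrative names, not confirmed by the library).
    """
    return {key: instruction, "table": table_markdown}


def render_prompt(args: dict[str, str], key: str) -> str:
    """Render the args as plain text: the instruction first, then the table."""
    label = key.capitalize()
    return f"{label}: {args[key]}\n\nTable:\n{args['table']}"
```

Keeping the table as a separate argument lets the same rendering serve both a question and a mutation description.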
CLASS Table
A Table represents a single table within a larger Docling Document.
Args:
ti: The Docling TableItem extracted from the document.
doc: The parent DoclingDocument. Passing None may cause downstream Docling functions to fail.
FUNC from_markdown
Create a Table from a Markdown string by round-tripping through Docling.
Wraps the Markdown in a minimal document, converts it with Docling, and
returns the first table found.
Args:
md: A Markdown string containing at least one table.
Returns:
- Table | None: The first Table extracted from the Markdown, or None if no table could be found.
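The "first table or None" contract can be illustrated without Docling. The sketch below is a plain-Python stand-in, not the library's implementation: the hypothetical `first_markdown_table` scans for a header line followed by a separator row and returns that table's text.

```python
import re
from typing import Optional

# A separator row such as |---|:---:| follows the header line of a pipe table.
_SEPARATOR = re.compile(r"^\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?\s*$")


def first_markdown_table(md: str) -> Optional[str]:
    """Return the first pipe table found in md, or None if there is none."""
    lines = md.splitlines()
    for i in range(1, len(lines)):
        header, sep = lines[i - 1], lines[i]
        if "|" in header and "|" in sep and _SEPARATOR.match(sep):
            end = i + 1
            # Body rows continue for as long as lines still contain a pipe.
            while end < len(lines) and "|" in lines[end]:
                end += 1
            return "\n".join(lines[i - 1:end])
    return None
```

Callers of the real from_markdown would need the same None check before using the result.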
FUNC parts
format_for_llm.
Returns:
- list[Component | CBlock]: Always an empty list.
FUNC to_markdown
Returns:
- The Markdown representation of this table.
FUNC transpose
Return a transposed copy of this Table.
Returns:
- Table | None: A new transposed Table, or None if the transposed Markdown cannot be parsed back into a Table.
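To make the transpose-then-reparse idea concrete, here is a minimal stand-in that works directly on Markdown text. It is a sketch under stated assumptions, not the library's method: `transpose_markdown_table` is a hypothetical name, the second line is assumed to be the separator row, and ragged input yields None, mirroring the documented None case.

```python
from typing import Optional


def _split_row(line: str) -> list[str]:
    """Split one pipe-delimited row into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def transpose_markdown_table(md: str) -> Optional[str]:
    """Transpose a pipe table; return None if it cannot be parsed."""
    lines = [ln for ln in md.strip().splitlines() if ln.strip()]
    if len(lines) < 2:
        return None
    # Skip the second line, which is assumed to be the separator row.
    rows = [_split_row(lines[0])] + [_split_row(ln) for ln in lines[2:]]
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        return None  # ragged rows cannot be transposed cleanly
    columns = [list(col) for col in zip(*rows)]
    header, body = columns[0], columns[1:]
    out = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join("---" for _ in header) + "|",
    ]
    out += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(out)
```

The first column of the input becomes the header row of the output, so a table with a label column transposes into one with a label header.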
FUNC format_for_llm
Returns:
- TemplateRepresentation | str: A [TemplateRepresentation](../../../core/base#class-templaterepresentation) that renders the table as its Markdown string using a \{\{table\}\} template.