A model backend wrapping the Ollama Python SDK.

Functions

FUNC chat_response_delta_merge

chat_response_delta_merge(mot: ModelOutputThunk, delta: ollama.ChatResponse)
Merges the individual ChatResponse chunks from a streaming response into a single ChatResponse. Args:
  • mot: the ModelOutputThunk that the deltas are being used to populate.
  • delta: the most recent ollama ChatResponse.

Classes

CLASS OllamaModelBackend

A model that uses the Ollama Python SDK for local inference. Args:
  • model_id: Ollama model ID. If a [ModelIdentifier](model_ids#class-modelidentifier) is passed, its ollama_name attribute must be set.
  • formatter: Formatter for rendering components. Defaults to [TemplateFormatter](../formatters/template_formatter#class-templateformatter).
  • base_url: Ollama server endpoint; defaults to env(OLLAMA_HOST) or http://localhost:11434.
  • model_options: Default model options for generation requests.
Attributes:
  • to_mellea_model_opts_map: Mapping from Ollama-specific option names to Mellea [ModelOption](model_options#class-modeloption) sentinel keys.
  • from_mellea_model_opts_map: Mapping from Mellea [ModelOption](model_options#class-modeloption) sentinel keys to Ollama-specific option names.
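The two option maps translate between the libraries' naming conventions in opposite directions. A sketch of that translation (the key names below are illustrative, not Mellea's actual ModelOption sentinel values):

```python
# Illustrative option-name translation between Mellea-style sentinel
# keys and Ollama-native option names. Key names are hypothetical.

TO_MELLEA = {"num_predict": "@max_tokens", "temperature": "@temperature"}
# The reverse map is just the inverse of the forward map.
FROM_MELLEA = {v: k for k, v in TO_MELLEA.items()}

def to_ollama_opts(opts: dict) -> dict:
    """Rename Mellea sentinel keys to Ollama option names, passing
    through any keys that have no mapping."""
    return {FROM_MELLEA.get(k, k): v for k, v in opts.items()}

ollama_opts = to_ollama_opts({"@max_tokens": 64, "seed": 7})
```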
Methods:

FUNC is_model_available

is_model_available(self, model_name)
Checks if a specific Ollama model is available locally. Args:
  • model_name: The name of the model to check for (e.g., “llama2”).
Returns:
  • True if the model is available, False otherwise.
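One plausible shape for such a check (a sketch only; the real method queries the local Ollama server for its installed models rather than taking a list):

```python
# Sketch of an availability check against an installed-model listing.
# Ollama stores untagged pulls as "<name>:latest", so a bare name
# should match either form.

def is_model_available(installed: list[str], model_name: str) -> bool:
    """Return True if model_name matches an installed model."""
    candidates = {model_name, f"{model_name}:latest"}
    return any(name in candidates for name in installed)

installed = ["llama2:latest", "granite3.3:8b"]
available = is_model_available(installed, "llama2")
```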

FUNC generate_from_chat_context

generate_from_chat_context(self, action: Component[C] | CBlock, ctx: Context) -> ModelOutputThunk[C]
Generate a new completion from the provided context using this backend’s formatter. Treats the [Context](../core/base#class-context) as a chat history and uses the ollama.Client.chat() interface to generate a completion. Returns a thunk that lazily resolves the model output. Args:
  • action: The component or content block to generate a completion for.
  • ctx: The current generation context (must be a chat context).
  • _format: Optional Pydantic model class for structured output decoding.
  • model_options: Per-call model options.
  • tool_calls: If True, expose available tools and parse responses.
Returns:
  • ModelOutputThunk[C]: A thunk holding the (lazy) model output.
Raises:
  • RuntimeError: If not called from a thread with a running event loop.
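The lazy-resolution pattern the return type implies can be sketched in plain Python, with a toy stand-in for ModelOutputThunk and a placeholder coroutine in place of the Ollama chat call (names and mechanics here are illustrative, not Mellea's implementation):

```python
import asyncio

class OutputThunk:
    """Toy stand-in for ModelOutputThunk: holds a coroutine and
    resolves it only when the value is first requested."""

    def __init__(self, coro):
        self._coro = coro
        self._value = None
        self._resolved = False

    def value(self):
        if not self._resolved:
            self._value = asyncio.run(self._coro)
            self._resolved = True
        return self._value

async def fake_chat_completion(prompt: str) -> str:
    # Placeholder for the ollama chat call.
    return f"echo: {prompt}"

thunk = OutputThunk(fake_chat_completion("hi"))
# Nothing has run yet; the model call happens here:
result = thunk.value()
```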

FUNC generate_from_raw

generate_from_raw(self, actions: list[Component[C]], ctx: Context) -> list[ModelOutputThunk[C]]

FUNC generate_from_raw

generate_from_raw(self, actions: list[Component[C] | CBlock], ctx: Context) -> list[ModelOutputThunk[C | str]]

FUNC generate_from_raw

generate_from_raw(self, actions: Sequence[Component[C] | CBlock], ctx: Context) -> list[ModelOutputThunk]
Generate completions for multiple actions without chat templating via Ollama. Passes formatted prompt strings directly to Ollama’s generate endpoint. Requests are submitted concurrently to make use of Ollama’s concurrency support. Args:
  • actions: Actions to generate completions for.
  • ctx: The current generation context.
  • format: Optional Pydantic model for structured output decoding.
  • model_options: Per-call model options.
  • tool_calls: Ignored; tool calling is not supported on this endpoint.
Returns:
  • list[ModelOutputThunk]: A list of model output thunks, one per action.
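The concurrent submission described above can be sketched with asyncio.gather, using a stand-in coroutine in place of the real Ollama generate call:

```python
import asyncio

async def fake_generate(prompt: str) -> str:
    # Placeholder for an async Ollama generate request; real calls
    # would hit the server concurrently.
    await asyncio.sleep(0)
    return prompt.upper()

async def generate_all(prompts: list[str]) -> list[str]:
    # One request per action, submitted concurrently; gather returns
    # the results in the same order as the inputs.
    return await asyncio.gather(*(fake_generate(p) for p in prompts))

results = asyncio.run(generate_all(["a", "b", "c"]))
```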

FUNC processing

processing(self, mot: ModelOutputThunk, chunk: ollama.ChatResponse, tools: dict[str, AbstractMelleaTool])
Accumulate text and tool calls from a single Ollama ChatResponse chunk. Called for each streaming or non-streaming ollama.ChatResponse. Also extracts tool call requests inline and merges the chunk into the running aggregated response stored in mot._meta["chat_response"]. Args:
  • mot: The output thunk being populated.
  • chunk: A single chat response object from Ollama.
  • tools: Available tools, keyed by name, used for extracting tool call requests from the response.
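The inline tool-call extraction might look roughly like this (dict stand-ins for the ollama response objects; field names follow Ollama's chat schema but are assumptions here):

```python
# Sketch: collect (tool, arguments) pairs from a chunk, keeping only
# requested tools that match a registered tool by name.

def extract_tool_calls(chunk: dict, tools: dict) -> list[tuple]:
    """Return (tool, arguments) pairs for recognized tool requests."""
    requested = chunk.get("message", {}).get("tool_calls") or []
    found = []
    for call in requested:
        fn = call["function"]
        if fn["name"] in tools:
            found.append((tools[fn["name"]], fn["arguments"]))
    return found

tools = {"add": lambda a, b: a + b}
chunk = {"message": {"tool_calls": [
    {"function": {"name": "add", "arguments": {"a": 1, "b": 2}}},
    {"function": {"name": "unknown", "arguments": {}}},
]}}
calls = extract_tool_calls(chunk, tools)
```

Requests naming unregistered tools are simply dropped, as the "unknown" entry above shows.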

FUNC post_processing

post_processing(self, mot: ModelOutputThunk, conversation: list[dict], tools: dict[str, AbstractMelleaTool], _format)
Finalize the output thunk after Ollama generation completes. Attaches the generate log, records token usage metrics, emits telemetry, and cleans up the span reference. Args:
  • mot: The output thunk to finalize.
  • conversation: The chat conversation sent to the model, used for logging.
  • tools: Available tools, keyed by name.
  • _format: The structured output format class used during generation, if any.
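The token-usage bookkeeping could be derived from the final aggregated response roughly as follows (prompt_eval_count and eval_count mirror Ollama's response field names; the record shape is an assumption):

```python
# Sketch: derive a small usage record from the aggregated response.

def usage_from_response(resp: dict) -> dict:
    """Summarize token usage from Ollama-style count fields."""
    prompt = resp.get("prompt_eval_count", 0)
    completion = resp.get("eval_count", 0)
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
    }

usage = usage_from_response({"prompt_eval_count": 12, "eval_count": 30})
```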