think_budget_forcing, which extends a model’s reasoning phase by
repeatedly appending a “think more” suffix whenever the model attempts to close its
<think> block prematurely, following the method proposed in arXiv:2501.19393.
Generation is split into a thinking pass (bounded by think_max_tokens) and an
answer pass (bounded by answer_max_tokens), using the raw completions API of an
OllamaModelBackend.
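The two-pass loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `complete` is a hypothetical stand-in for a raw completions call (it is not the OllamaModelBackend API), `budget_forcing_sketch` and `fake_complete` are made-up names, and the `answer_suffix` value is illustrative only.

```python
from typing import Callable, Optional, Tuple

# Hypothetical raw-completions callable: (prompt, max_tokens) -> (text, tokens_used).
# max_tokens=None means "generate until EoS". This is an assumption for the sketch.
CompleteFn = Callable[[str, Optional[int]], Tuple[str, int]]


def budget_forcing_sketch(
    complete: CompleteFn,
    prompt: str,
    think_max_tokens: int,
    answer_max_tokens: Optional[int] = None,
    start_think_token: str = "<think>",
    end_think_token: str = "</think>",
    think_more_suffix: Optional[str] = "\nWait",
    answer_suffix: str = "\nFinal answer:",  # illustrative value, not a documented default
) -> str:
    text = prompt + start_think_token
    remaining = think_max_tokens

    # Thinking pass: keep generating until the think budget is spent.
    while remaining > 0:
        chunk, used = complete(text, remaining)
        remaining -= max(used, 1)  # guard against a backend reporting zero usage
        # Drop the end-of-think token (and anything after it) if the model
        # tried to close its think block.
        text += chunk.split(end_think_token, 1)[0]
        if think_more_suffix is None or remaining <= 0:
            break  # no forcing requested, or budget exhausted
        text += think_more_suffix  # nudge the model to keep thinking

    # Answer pass: close the think block and force a final answer.
    text += end_think_token + answer_suffix
    answer, _ = complete(text, answer_max_tokens)
    return text[len(prompt):] + answer


def fake_complete(prompt: str, max_tokens: Optional[int]) -> Tuple[str, int]:
    """Stub model: always tries to stop thinking after one short step."""
    if prompt.endswith("Final answer:"):
        return " 42", 1
    return "step</think>", 2


forced = budget_forcing_sketch(fake_complete, "Q: 6*7?", think_max_tokens=5)
unforced = budget_forcing_sketch(
    fake_complete, "Q: 6*7?", think_max_tokens=5, think_more_suffix=None
)
print(repr(forced))
print(repr(unforced))
```

With the stub model, the forced run contains repeated "Wait" continuations inside a single think block, while the run with `think_more_suffix=None` stops after the first thinking attempt. In this sketch the appended suffix is not counted against the think budget; a real implementation may account for it differently.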
Functions
FUNC think_budget_forcing
Args:
- backend: OllamaModelBackend instance to use for generation.
- action: The last item of the context, passed as an action instead of as part of the ctx. See docs/dev/generate_signature_decisions.md.
- ctx: The current conversation context.
- format: Optional Pydantic model for constrained decoding of the response.
- tool_calls: If True, tool calling is enabled.
- think_max_tokens: Budget, in tokens, allocated for the think block.
- answer_max_tokens: Budget, in tokens, allocated for the summary and answer block; None indicates an unbounded answer that is generated until EoS.
- start_think_token: String indicating the start of the think block, default <think>.
- end_think_token: String indicating the end of the think block, default </think>.
- begin_response_token: Used by certain models; string indicating the start of the response block, e.g. "<response>", default "".
- think_more_suffix: String to append to force continued thinking, e.g. "\nWait"; if None, additional thinking is not forced (upper-bound budget case).
- answer_suffix: String to append to force a final answer.
- model_options: Any model options to upsert into the defaults for this call.
Returns:
- The assembled thinking and answer response.
Raises:
- Exception: If the backend returns generation results without the required meta information (e.g. token usage counts).