mellea.backends.huggingface

A backend that uses the Hugging Face Transformers library. The purpose of the Hugging Face backend is to provide a setting for implementing experimental features. If you want a performant local backend and do not need experimental features such as span-based context or ALoras, consider using the Ollama backend instead.
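
A minimal end-to-end sketch follows. The `model_id` parameter name, the model identifier, and the `MelleaSession`-based invocation are assumptions based on typical mellea usage; check the constructors for the exact parameters.

```python
from mellea import MelleaSession
from mellea.backends.huggingface import LocalHFBackend

# Assumed parameter name (model_id) and model identifier; substitute
# any Hugging Face model you have local access to.
backend = LocalHFBackend(model_id="ibm-granite/granite-3.3-8b-instruct")

# A session wires the backend to a context and a formatter, and exposes
# high-level helpers such as instruct().
m = MelleaSession(backend=backend)
result = m.instruct("Write one sentence about local inference.")
print(result)
```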

Classes

HFAloraCacheInfo

A dataclass for holding some KV cache and associated information.

LocalHFBackend

The LocalHFBackend uses Hugging Face's transformers library for inference, and uses a Formatter to convert Components into prompts. This backend also supports [Activated LoRAs (ALoras)](https://arxiv.org/pdf/2504.12397). This backend is designed for running an HF model for small-scale inference locally on your machine. It is NOT designed for inference scaling on CUDA-enabled hardware.

Methods:

alora_model

alora_model(self) -> 'aLoRAPeftModelForCausalLM | None'
The ALora model.

alora_model

alora_model(self, model: 'aLoRAPeftModelForCausalLM | None')
Sets the ALora model. This should only happen once in a backend’s lifetime.

generate_from_context

generate_from_context(self, action: Component | CBlock, ctx: Context)
Generate using the Hugging Face model.
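
A hedged sketch of calling this method directly; the `CBlock` and `SimpleContext` import paths are assumptions, and in normal use a session manages the context and calls this method for you.

```python
from mellea.backends.huggingface import LocalHFBackend
from mellea.stdlib.base import CBlock, SimpleContext  # assumed import path

backend = LocalHFBackend(model_id="ibm-granite/granite-3.3-8b-instruct")
ctx = SimpleContext()

# The action may be a raw CBlock or any Component.
out = backend.generate_from_context(CBlock("Summarize: ..."), ctx)
# `out` is a ModelOutputThunk; check your version's signature for the
# exact return shape.
```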

processing

processing(self, mot: ModelOutputThunk, chunk: str | GenerateDecoderOnlyOutput, input_ids)
Process the returned chunks or the complete response.

post_processing

post_processing(self, mot: ModelOutputThunk, conversation: list[dict], format: type[BaseModelSubclass] | None, tool_calls: bool, tools: dict[str, Callable], seed, input_ids)
Called when generation is done.

cache_get

cache_get(self, id: str) -> HFAloraCacheInfo | None
Retrieve from cache.

cache_put

cache_put(self, id: str, v: HFAloraCacheInfo)
Put into cache.
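
Taken together, `cache_put` and `cache_get` form a simple string-keyed round-trip, sketched below using the documented signatures; the key is an arbitrary caller-chosen string, and `info` is assumed to be an `HFAloraCacheInfo` captured from a previous generation.

```python
# Assuming `backend` is a constructed LocalHFBackend and `info` is an
# HFAloraCacheInfo obtained from an earlier generation step.
backend.cache_put("conversation-42", info)

cached = backend.cache_get("conversation-42")
if cached is not None:
    # Reuse the stored KV cache rather than re-encoding the prefix.
    ...
```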

add_alora

add_alora(self, alora: HFAlora)
Loads an ALora for this backend. Args:
  • alora: identifier for the ALora adapter

get_alora

get_alora(self, alora_name: str) -> Alora | None
Returns the ALora by name, or None if that ALora isn’t loaded.

get_aloras

get_aloras(self) -> list[Alora]
Returns a list of all loaded ALora adapters.
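
The ALora accessors compose as sketched below; the adapter object and its name are hypothetical, since HFAlora constructor arguments are class-specific and not shown in this section.

```python
# `my_alora` is assumed to be an already-constructed HFAlora (see the
# HFAlora class for its constructor arguments).
backend.add_alora(my_alora)

# Look up one adapter by its (hypothetical) name, or list all of them.
checker = backend.get_alora("constraint_checker")
for alora in backend.get_aloras():
    print(alora)
```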

HFAlora

ALoras that work with the local Hugging Face backend.

HFProcessRewardModel

A Process Reward Model that works with a Hugging Face backend.

Methods:

stepify

stepify(self, content: str, step_separator: str) -> list[str]
Splits the assistant response into steps to score. Args:
  • content: assistant response to score
  • step_separator: string on which to separate the content into steps
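
For illustration, assuming `prm` is an already-constructed HFProcessRewardModel, `stepify` splits a response on the separator so that each step can be scored individually:

```python
response = "First, compute 2 + 2 = 4.\n\nThen, double the result to get 8."

# Assuming `prm` is an already-constructed HFProcessRewardModel.
steps = prm.stepify(response, step_separator="\n\n")
# e.g. ["First, compute 2 + 2 = 4.", "Then, double the result to get 8."]
for step in steps:
    print(step)
```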