mellea.backends.huggingface
A backend that uses the Huggingface Transformers library.
The purpose of the Huggingface backend is to provide a setting for implementing experimental features. If you want a performant local backend and do not need experimental features such as span-based context or ALoras, consider using the Ollama backend instead.
Classes
HFAloraCacheInfo
A dataclass for holding some KV cache and associated information.
LocalHFBackend
The LocalHFBackend uses Huggingface's transformers library for inference, and uses a Formatter to convert Components into prompts. This backend also supports [Activated LoRAs (ALoras)](https://arxiv.org/pdf/2504.12397).
This backend is designed for running an HF model for small-scale inference locally on your machine.
This backend is NOT designed for inference scaling on CUDA-enabled hardware.
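As a quick orientation, the sketch below constructs the backend and uses it through a session. This is a minimal sketch, not the confirmed API: the model id, the `model_id` keyword, and the `MelleaSession` wiring are assumptions based on typical mellea usage, so check the actual constructor signatures.

```python
# Minimal sketch: a LocalHFBackend for small-scale local inference.
# The model id, the model_id keyword, and the session wiring are
# assumptions; verify against the class docs.
from mellea import MelleaSession
from mellea.backends.huggingface import LocalHFBackend

backend = LocalHFBackend(model_id="ibm-granite/granite-3.3-8b-instruct")  # hypothetical model id
m = MelleaSession(backend=backend)
answer = m.instruct("Summarize what a Formatter does in one sentence.")
print(answer)
```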
Methods:
alora_model
generate_from_context
processing
post_processing
cache_get
cache_put
add_alora
alora: identifier for the ALora adapter (see the sketch after this method list)
get_alora
get_aloras
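The sketch below shows one plausible way to register and retrieve an ALora adapter. Only add_alora, get_alora, and get_aloras come from the method list above; the HFAlora constructor arguments and the adapter path are hypothetical, so treat them as placeholders rather than the documented signature.

```python
# Hypothetical sketch of registering and retrieving an ALora adapter.
# The HFAlora constructor arguments below are assumptions, not the
# documented signature; see the HFAlora class docs.
from mellea.backends.huggingface import LocalHFBackend, HFAlora

backend = LocalHFBackend(model_id="ibm-granite/granite-3.3-8b-instruct")  # hypothetical model id

adapter = HFAlora(
    name="requirement_check",  # identifier used for later lookup (assumed arg name)
    path_or_model_id="org/hypothetical-alora-adapter",  # assumed arg name and path
    backend=backend,
)
backend.add_alora(adapter)

# get_alora retrieves a registered adapter by its identifier;
# get_aloras returns all registered adapters.
same_adapter = backend.get_alora("requirement_check")
all_adapters = backend.get_aloras()
```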
HFAlora
An ALora implementation that works with the local Huggingface backend.
HFProcessRewardModel
A Process Reward Model that works with a Huggingface backend.
Methods:
stepify
content: assistant response to score
step_separator: string on which to separate the content into steps
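Conceptually, stepify splits an assistant response into individual steps on the separator string so that each step can be scored by the reward model. The plain-Python illustration below follows the parameter listing above; the exact whitespace and empty-fragment handling inside HFProcessRewardModel.stepify is an assumption.

```python
# Plain-Python illustration of the stepify behavior described above:
# split an assistant response into scoreable steps on a separator.
# The strip/filter details are assumptions about the real method.
def stepify(content: str, step_separator: str) -> list[str]:
    steps = [step.strip() for step in content.split(step_separator)]
    return [step for step in steps if step]  # drop empty fragments

response = "Compute 2 + 2.\n\nThe answer is 4."
print(stepify(response, "\n\n"))
# ['Compute 2 + 2.', 'The answer is 4.']
```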