mellea.backends.aloras
Abstract interfaces for Backends that implement Activated LoRAs.
Classes
Alora
Activated LoRAs (ALoras) are low-rank adapters that can reuse the KV cache of their underlying model.
This class should not be directly subclassed by a specific ALora. Each backend that supports ALoras should provide a backend-specific abstract class that subclasses `Alora`. Individual ALoras should then be defined by subclassing that backend-specific class.
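The two-level subclassing convention described above can be sketched as follows. This is a hypothetical illustration only: apart from `Alora` itself, the class names, method names, and signatures here are assumptions, not mellea's actual API.

```python
# Hypothetical sketch of the ALora subclassing hierarchy.
# Only `Alora` is named in the docs; everything else is illustrative.
import abc


class Alora(abc.ABC):
    """Backend-agnostic base class. Do not subclass directly."""

    def __init__(self, name: str):
        self.name = name  # label for the ALora inside the serving engine

    @abc.abstractmethod
    def generate(self, input: str, response: str) -> str:
        """Produce the alora_response given the cached input and response."""


class HuggingfaceAloraBase(Alora, abc.ABC):
    """Hypothetical backend-specific abstract class (e.g. local huggingface)."""


class RequirementCheckAlora(HuggingfaceAloraBase):
    """Hypothetical concrete ALora, defined against the backend-specific class."""

    def generate(self, input: str, response: str) -> str:
        # Stub behavior for illustration; a real ALora would run the adapter.
        return f"checked({response})"
```

The intermediate abstract class exists so that concrete ALoras inherit backend-specific plumbing (cache handling, activation) rather than reimplementing it per adapter.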
ALoras are always attached to an underlying model and use the following calling convention:
- The underlying model is prompted (without the ALora active). We call this the `input`.
- The underlying model generates some tokens from the `input` context (again, without the ALora active). We call this the `response`.
- Then the adapter is activated and generates some tokens. We call this the `alora_response`.
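The three-phase calling convention above can be sketched as a small driver function. This is a minimal illustration, not mellea's implementation; `model_generate` and `alora_generate` are hypothetical callables standing in for the underlying model and the activated adapter.

```python
# Hypothetical sketch of the ALora calling convention.
# `model_generate` and `alora_generate` are assumed callables, not mellea API.
from typing import Callable


def run_with_alora(
    model_generate: Callable[[str], str],
    alora_generate: Callable[[str, str], str],
    input: str,
) -> dict:
    # Phase 1: the underlying model is prompted with `input` (ALora inactive).
    # Phase 2: the underlying model generates the `response` (still inactive).
    response = model_generate(input)
    # Phase 3: the adapter is activated and generates the `alora_response`,
    # reusing the KV cache built up from input + response.
    alora_response = alora_generate(input, response)
    return {"input": input, "response": response, "alora_response": alora_response}
```

Because phases 1 and 2 run without the adapter, the KV cache they build can be shared with the activated adapter in phase 3, which is the efficiency win ALoras are designed around.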
name
: An arbitrary name/label assigned to the ALora in the model serving engine (e.g. vLLM, or local huggingface). This is distinct from the ALora's (huggingface) model id.