Cache interface with put, get, and
current_size methods, and provides a concrete SimpleLRUCache that evicts
the least-recently-used entry when capacity is exceeded — optionally calling an
on_evict callback (e.g. to free GPU memory). Used by local HuggingFace backends
to store and reuse KV cache state across requests.
Classes
CLASS Cache
Interface for caches that store model state (e.g., a KV cache).
Methods:
FUNC put
key: The cache key to store the value under.
value: The value to store.
FUNC get
key: The cache key to look up.
- Any | None: The cached value, or None if key has no cached entry.
FUNC current_size
- Count of items currently held in the cache; useful for debugging.
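The three methods listed above can be pictured as an abstract base class. The following is a minimal sketch, not the library's source: the use of `abc.ABC`, the `Hashable` key type, and the `DictCache` subclass are all assumptions made for illustration.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any, Hashable


class Cache(ABC):
    """Sketch of the interface described above (assumed shape, not the real source)."""

    @abstractmethod
    def put(self, key: Hashable, value: Any) -> None:
        """Store value under key."""

    @abstractmethod
    def get(self, key: Hashable) -> Any | None:
        """Return the cached value, or None if key has no cached entry."""

    @abstractmethod
    def current_size(self) -> int:
        """Return the number of items currently held in the cache."""


class DictCache(Cache):
    """Hypothetical unbounded implementation, only to show the contract in use."""

    def __init__(self) -> None:
        self._items: dict[Hashable, Any] = {}

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = value

    def get(self, key: Hashable) -> Any | None:
        # dict.get returns None for a missing key, matching the documented contract.
        return self._items.get(key)

    def current_size(self) -> int:
        return len(self._items)


demo = DictCache()
demo.put("prompt-1", {"state": "example"})
```

Because `get` signals a miss with `None`, a caller cannot tell a missing key apart from a stored `None` value; under this contract it is simplest to avoid caching `None`.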
CLASS SimpleLRUCache
A simple LRU cache. Reference: https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_Recently_Used_(LRU)
Evicts the least-recently-used entry when capacity is exceeded, optionally
invoking an on_evict callback (e.g. to free GPU memory). Used by local
HuggingFace backends to store and reuse KV cache state across requests.
Args:
capacity: Maximum number of items to store in the cache.
on_evict: Optional callback invoked with the evicted value whenever an entry is removed to make room for a new one.
cache: Internal ordered dict used for LRU tracking; always initialised empty at construction.
FUNC current_size
- Count of items currently held in the cache; useful for debugging.
FUNC get
key: The cache key to look up.
- Any | None: The cached value, or None if key is not present.
FUNC put
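Stores value under key. If this pushes the cache past capacity, the least-recently-used entry is evicted and passed to the on_evict callback if set.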
Args:
key: The cache key to store the value under.
value: The value to cache.
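The eviction behaviour described for SimpleLRUCache can be sketched with `collections.OrderedDict`. This is an illustrative stand-in, not the actual implementation: the class name `LRUSketch` is hypothetical, and two details are assumptions the reference does not confirm, namely that `get` refreshes an entry's recency and that overwriting an existing key does not trigger `on_evict`.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class LRUSketch:
    """Hypothetical stand-in for SimpleLRUCache, for illustration only."""

    def __init__(
        self,
        capacity: int,
        on_evict: Optional[Callable[[Any], None]] = None,
    ) -> None:
        self.capacity = capacity
        self.on_evict = on_evict
        # Ordered from least recently used (front) to most recently used (back).
        self.cache: "OrderedDict[Hashable, Any]" = OrderedDict()

    def current_size(self) -> int:
        return len(self.cache)

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self.cache:
            return None
        # Assumption: a successful lookup counts as a "use" of the entry.
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self.cache:
            # Assumption: overwriting refreshes recency without calling on_evict.
            self.cache[key] = value
            self.cache.move_to_end(key)
            return
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            # Remove the entry at the front, i.e. the least recently used one.
            _, evicted = self.cache.popitem(last=False)
            if self.on_evict is not None:
                self.on_evict(evicted)


freed = []  # records evicted values; a real callback might release GPU memory
lru = LRUSketch(capacity=2, on_evict=freed.append)
lru.put("a", "kv-a")
lru.put("b", "kv-b")
lru.get("a")          # "a" becomes the most recently used entry
lru.put("c", "kv-c")  # exceeds capacity, so "b" is evicted
```

In this sketch the callback receives only the evicted value, matching the on_evict description in the Args list; a callback that also needs the key would have to capture it some other way.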