Skip to main content
Cache abstractions and implementations for model state. Defines the abstract Cache interface with put, get, and current_size methods, and provides a concrete SimpleLRUCache that evicts the least-recently-used entry when capacity is exceeded — optionally calling an on_evict callback (e.g. to free GPU memory). Used by local HuggingFace backends to store and reuse KV cache state across requests.

Classes

CLASS Cache

A Cache for storing model state (e.g., kv cache).
Methods:

FUNC put

put(self, key: str | int, value: Any) -> None
Insert a value into the cache under the given key. May trigger eviction of existing entries if the cache is at capacity. Args:
  • key: The cache key to store the value under.
  • value: The value to store.

FUNC get

get(self, key: str | int) -> Any | None
Retrieve a value from the cache by key. May affect which entries are considered for future eviction (e.g. LRU ordering). Args:
  • key: The cache key to look up.
Returns:
  • Any | None: The cached value, or None if key has no cached entry.

FUNC current_size

current_size(self) -> int
Return the number of entries currently stored in the cache. Returns:
  • Count of items currently held in the cache; useful for debugging.

CLASS SimpleLRUCache

A simple LRU <https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_Recently_Used_(LRU)>_ cache. Evicts the least-recently-used entry when capacity is exceeded, optionally invoking an on_evict callback (e.g. to free GPU memory). Used by local HuggingFace backends to store and reuse KV cache state across requests. Args:
  • capacity: Maximum number of items to store in the cache.
  • on_evict: Optional callback invoked with the evicted value whenever an entry is removed to make room for a new one.
Attributes:
  • cache: Internal ordered dict used for LRU tracking; always initialised empty at construction.
Methods:

FUNC current_size

current_size(self) -> int
Return the number of entries currently stored in the cache. Returns:
  • Count of items currently held in the cache; useful for debugging.

FUNC get

get(self, key: str | int) -> Any | None
Retrieve a value from the cache, promoting it to most-recently-used. Args:
  • key: The cache key to look up.
Returns:
  • Any | None: The cached value, or None if key is not present.

FUNC put

put(self, key: str | int, value: Any) -> None
Insert or update a value in the cache. If the cache is at capacity and the key is new, the least-recently-used entry is evicted first, invoking the on_evict callback if set. Args:
  • key: The cache key to store the value under.
  • value: The value to cache.