Low-level utilities for concatenating transformer KV caches (KV smashing). Provides functions for merging DynamicCache and legacy tuple caches along the time axis (merge_dynamic_caches, legacy_cache_smash), and tokens_to_legacy_cache for converting a tokenized prompt into a prefilled KV cache. These helpers are used internally by local HuggingFace backends that reuse cached prefix computations across multiple generation calls.

Functions

FUNC legacy_cache_smash

legacy_cache_smash(a: LegacyCache, b: LegacyCache) -> LegacyCache
Concatenates the key and value tensors of two legacy caches along the time (sequence) axis. Args:
  • a: First legacy KV cache (tuple of per-layer (K, V) tensor pairs).
  • b: Second legacy KV cache to concatenate after a.
Returns:
  • New legacy cache with b appended to a along the sequence dimension.
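A minimal sketch of what this concatenation plausibly looks like, assuming per-layer tensors shaped (batch, num_heads, seq_len, head_dim) as in HuggingFace legacy caches, so the sequence axis is dim=-2. The implementation below is illustrative, not the library's actual code.

```python
import torch

def legacy_cache_smash(a, b):
    """Hypothetical sketch: append cache b after cache a, layer by layer.

    Each cache is a tuple of per-layer (K, V) pairs; each tensor is
    (batch, num_heads, seq_len, head_dim), so dim=-2 is the time axis.
    """
    return tuple(
        (torch.cat([ka, kb], dim=-2), torch.cat([va, vb], dim=-2))
        for (ka, va), (kb, vb) in zip(a, b)
    )

# Toy example: one layer, batch 1, 2 heads, head_dim 4.
a = ((torch.zeros(1, 2, 3, 4), torch.zeros(1, 2, 3, 4)),)
b = ((torch.ones(1, 2, 5, 4), torch.ones(1, 2, 5, 4)),)
merged = legacy_cache_smash(a, b)
print(merged[0][0].shape)  # torch.Size([1, 2, 8, 4])
```

Note that the batch, head, and head-dim sizes of the two caches must match; only the sequence lengths may differ.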

FUNC merge_dynamic_caches

merge_dynamic_caches(caches: Iterable[DynamicCache]) -> DynamicCache
Merges the key and value tensors of any number of DynamicCache objects along the time axis. Args:
  • caches: Iterable of DynamicCache objects to merge in order.
Returns:
  • A single DynamicCache with all caches concatenated along the sequence dimension.
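The merge can be sketched as a layer-wise concatenation over the caches' per-layer tensor lists. To keep the example self-contained, a minimal stand-in class is used here in place of transformers' DynamicCache; the key_cache/value_cache list layout and the dim=-2 time axis are assumptions about the real class, not a verified API.

```python
import torch
from dataclasses import dataclass, field

@dataclass
class Cache:
    """Minimal stand-in for transformers.DynamicCache: per-layer lists of
    K/V tensors shaped (batch, num_heads, seq_len, head_dim)."""
    key_cache: list = field(default_factory=list)
    value_cache: list = field(default_factory=list)

def merge_dynamic_caches(caches):
    """Hypothetical sketch: concatenate caches layer-wise on the time axis."""
    caches = list(caches)
    merged = Cache()
    for layer in range(len(caches[0].key_cache)):
        merged.key_cache.append(
            torch.cat([c.key_cache[layer] for c in caches], dim=-2))
        merged.value_cache.append(
            torch.cat([c.value_cache[layer] for c in caches], dim=-2))
    return merged

# Toy example: two one-layer caches with sequence lengths 3 and 5.
a = Cache([torch.zeros(1, 2, 3, 4)], [torch.zeros(1, 2, 3, 4)])
b = Cache([torch.ones(1, 2, 5, 4)], [torch.ones(1, 2, 5, 4)])
m = merge_dynamic_caches([a, b])
print(m.key_cache[0].shape)  # torch.Size([1, 2, 8, 4])
```

Because the iterable is consumed in order, the first cache's tokens occupy the earliest sequence positions in the result.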

FUNC tokens_to_legacy_cache

tokens_to_legacy_cache(model: PreTrainedModel, device: str, tokens_or_cache: BatchEncoding | DynamicCache) -> Iterable[LegacyCache]
Runs a prefill forward pass (or converts an existing DynamicCache) and returns the key and value tensors in the legacy cache format. Args:
  • model: The HuggingFace model used for prefill.
  • device: Target device string (e.g. "cuda", "cpu").
  • tokens_or_cache: Either a BatchEncoding to prefill, or an existing DynamicCache to convert directly.
Returns:
  • Legacy KV cache representation as a tuple of per-layer (K, V) tensor pairs.
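The prefill path can be sketched roughly as below. This is a hypothetical reconstruction, not the library's code: it assumes the model returns past_key_values when called with use_cache=True, and that a DynamicCache-style object exposes a to_legacy_cache() method (duck-typed here so the sketch does not hard-depend on a transformers version).

```python
import torch

def tokens_to_legacy_cache(model, device, tokens_or_cache):
    """Hypothetical sketch: prefill a KV cache and return it in legacy form."""
    # An existing DynamicCache is converted directly, no forward pass needed.
    if hasattr(tokens_or_cache, "to_legacy_cache"):
        return tokens_or_cache.to_legacy_cache()

    # Otherwise run one forward pass over the tokenized prompt to prefill.
    with torch.no_grad():
        out = model(
            **{k: v.to(device) for k, v in tokens_or_cache.items()},
            use_cache=True,
        )
    cache = out.past_key_values
    # Newer transformers versions return a DynamicCache; older ones already
    # return the legacy tuple of per-layer (K, V) pairs.
    if hasattr(cache, "to_legacy_cache"):
        return cache.to_legacy_cache()
    return cache
```

A backend can then pass the returned legacy cache back into later generation calls, so the prompt's attention computation is done only once.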