Low-level utilities for concatenating transformer KV caches (KV smashing). Provides functions for merging DynamicCache and legacy tuple caches along the time axis (merge_dynamic_caches, legacy_cache_smash), and tokens_to_legacy_cache for converting a tokenized prompt into a prefilled KV cache. These helpers are used internally by local HuggingFace backends that reuse cached prefix computations across multiple generation calls.

Functions

FUNC legacy_cache_smash

legacy_cache_smash(a: LegacyCache, b: LegacyCache) -> LegacyCache
Concatenates the key and value tensors of two legacy caches along the time (sequence) axis. Args:
  • a: First legacy KV cache (tuple of per-layer (K, V) tensor pairs).
  • b: Second legacy KV cache to concatenate after a.
Returns:
  • New legacy cache with b appended to a along the sequence dimension.
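A minimal sketch of what this concatenation plausibly looks like, assuming per-layer tensors shaped (batch, num_heads, seq_len, head_dim) as in HuggingFace legacy caches, so the sequence axis is dim=-2. The implementation below is illustrative, not the library's actual code.

```python
import torch

def legacy_cache_smash(a, b):
    """Hypothetical sketch: append cache b after cache a, layer by layer.

    Each cache is a tuple of per-layer (K, V) pairs; each tensor is
    (batch, num_heads, seq_len, head_dim), so dim=-2 is the time axis.
    """
    return tuple(
        (torch.cat([ka, kb], dim=-2), torch.cat([va, vb], dim=-2))
        for (ka, va), (kb, vb) in zip(a, b)
    )

# Toy example: one layer, batch 1, 2 heads, head_dim 4.
a = ((torch.zeros(1, 2, 3, 4), torch.zeros(1, 2, 3, 4)),)
b = ((torch.ones(1, 2, 5, 4), torch.ones(1, 2, 5, 4)),)
merged = legacy_cache_smash(a, b)
print(merged[0][0].shape)  # torch.Size([1, 2, 8, 4])
```

Note that the batch, head, and head-dim sizes of the two caches must match; only the sequence lengths may differ.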

FUNC merge_dynamic_caches

merge_dynamic_caches(caches: Iterable[DynamicCache]) -> DynamicCache
Merges the key and value tensors of any number of DynamicCache objects along the time axis. Args:
  • caches: Iterable of DynamicCache objects to merge in order.
Returns:
  • A single DynamicCache with all caches concatenated along the sequence dimension.
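The merge can be sketched as a layer-wise concatenation over the caches' per-layer tensor lists. To keep the example self-contained, a minimal stand-in class is used here in place of transformers' DynamicCache; the key_cache/value_cache list layout and the dim=-2 time axis are assumptions about the real class, not a verified API.

```python
import torch
from dataclasses import dataclass, field

@dataclass
class Cache:
    """Minimal stand-in for transformers.DynamicCache: per-layer lists of
    K/V tensors shaped (batch, num_heads, seq_len, head_dim)."""
    key_cache: list = field(default_factory=list)
    value_cache: list = field(default_factory=list)

def merge_dynamic_caches(caches):
    """Hypothetical sketch: concatenate caches layer-wise on the time axis."""
    caches = list(caches)
    merged = Cache()
    for layer in range(len(caches[0].key_cache)):
        merged.key_cache.append(
            torch.cat([c.key_cache[layer] for c in caches], dim=-2))
        merged.value_cache.append(
            torch.cat([c.value_cache[layer] for c in caches], dim=-2))
    return merged

# Toy example: two one-layer caches with sequence lengths 3 and 5.
a = Cache([torch.zeros(1, 2, 3, 4)], [torch.zeros(1, 2, 3, 4)])
b = Cache([torch.ones(1, 2, 5, 4)], [torch.ones(1, 2, 5, 4)])
m = merge_dynamic_caches([a, b])
print(m.key_cache[0].shape)  # torch.Size([1, 2, 8, 4])
```

Because the iterable is consumed in order, the first cache's tokens occupy the earliest sequence positions in the result.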

FUNC tokens_to_legacy_cache

tokens_to_legacy_cache(model: PreTrainedModel, device: str, tokens_or_cache: BatchEncoding | DynamicCache) -> Iterable[LegacyCache]
Runs a prefill forward pass (or converts an existing DynamicCache) and returns the key and value tensors in the legacy cache format. Args:
  • model: The HuggingFace model used for prefill.
  • device: Target device string (e.g. "cuda", "cpu").
  • tokens_or_cache: Either a BatchEncoding to prefill, or an existing DynamicCache to convert directly.
Returns:
  • Legacy KV cache representation as a tuple of per-layer (K, V) tensor pairs.
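The prefill path can be sketched roughly as below. This is a hypothetical reconstruction, not the library's code: it assumes the model returns past_key_values when called with use_cache=True, and that a DynamicCache-style object exposes a to_legacy_cache() method (duck-typed here so the sketch does not hard-depend on a transformers version).

```python
import torch

def tokens_to_legacy_cache(model, device, tokens_or_cache):
    """Hypothetical sketch: prefill a KV cache and return it in legacy form."""
    # An existing DynamicCache is converted directly, no forward pass needed.
    if hasattr(tokens_or_cache, "to_legacy_cache"):
        return tokens_or_cache.to_legacy_cache()

    # Otherwise run one forward pass over the tokenized prompt to prefill.
    with torch.no_grad():
        out = model(
            **{k: v.to(device) for k, v in tokens_or_cache.items()},
            use_cache=True,
        )
    cache = out.past_key_values
    # Newer transformers versions return a DynamicCache; older ones already
    # return the legacy tuple of per-layer (K, V) pairs.
    if hasattr(cache, "to_legacy_cache"):
        return cache.to_legacy_cache()
    return cache
```

A backend can then pass the returned legacy cache back into later generation calls, so the prompt's attention computation is done only once.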