DynamicCache and legacy tuple caches along the
time axis (merge_dynamic_caches, legacy_cache_smash), and
tokens_to_legacy_cache for converting a tokenized prompt into a prefilled KV
cache. These helpers are used internally by local HuggingFace backends that reuse
cached prefix computations across multiple generation calls.
Functions
FUNC legacy_cache_smash
a: First legacy KV cache (tuple of per-layer (K, V) tensor pairs).
b: Second legacy KV cache to concatenate after a.
- New legacy cache with b appended to a along the sequence dimension.
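As a rough illustration of what concatenating two legacy caches along the sequence dimension involves, the sketch below uses PyTorch directly. It assumes the common HuggingFace layout in which each K and V tensor has shape (batch, num_heads, seq_len, head_dim), so the sequence dimension is dim=2. The names smash_sketch and random_cache and the tensor sizes are hypothetical; this is not the library's implementation.

```python
import torch

# Hypothetical type alias: one (K, V) pair per layer.
LegacyCache = tuple[tuple[torch.Tensor, torch.Tensor], ...]

SEQ_DIM = 2  # assumption: tensors are (batch, num_heads, seq_len, head_dim)


def smash_sketch(a: LegacyCache, b: LegacyCache) -> LegacyCache:
    """Append b to a layer by layer along the sequence dimension."""
    if len(a) != len(b):
        raise ValueError("caches must have the same number of layers")
    return tuple(
        (torch.cat([ka, kb], dim=SEQ_DIM), torch.cat([va, vb], dim=SEQ_DIM))
        for (ka, va), (kb, vb) in zip(a, b)
    )


def random_cache(num_layers: int, seq_len: int) -> LegacyCache:
    """Build a dummy cache with 1 batch item, 2 heads, and head_dim 4."""
    return tuple(
        (torch.randn(1, 2, seq_len, 4), torch.randn(1, 2, seq_len, 4))
        for _ in range(num_layers)
    )


cache_a = random_cache(num_layers=3, seq_len=5)
cache_b = random_cache(num_layers=3, seq_len=7)
merged = smash_sketch(cache_a, cache_b)
print(merged[0][0].shape)  # torch.Size([1, 2, 12, 4])
```

Only the sequence length changes (5 + 7 = 12); the layer count, batch size, head count, and head dimension must already agree between the two caches.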
FUNC merge_dynamic_caches
caches: Iterable of DynamicCache objects to merge in order.
- A single DynamicCache with all caches concatenated along the sequence dimension.
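Merging an ordered iterable of caches can be viewed as folding a pairwise concatenation from left to right. The sketch below demonstrates that idea on plain per-layer (K, V) tuples rather than DynamicCache objects, to avoid depending on a particular transformers version. The names concat_pair, merge_in_order, and filled_cache are hypothetical, and the (batch, num_heads, seq_len, head_dim) layout is an assumption.

```python
from functools import reduce

import torch

SEQ_DIM = 2  # assumption: tensors are (batch, num_heads, seq_len, head_dim)


def concat_pair(a, b):
    """Concatenate two per-layer (K, V) tuple caches along the sequence axis."""
    return tuple(
        (torch.cat([ka, kb], dim=SEQ_DIM), torch.cat([va, vb], dim=SEQ_DIM))
        for (ka, va), (kb, vb) in zip(a, b)
    )


def merge_in_order(caches):
    """Fold concat_pair over the caches, preserving their iteration order."""
    caches = list(caches)
    if not caches:
        raise ValueError("need at least one cache to merge")
    return reduce(concat_pair, caches)


def filled_cache(value: float, seq_len: int, num_layers: int = 2):
    """Dummy cache whose entries all equal `value`, so order is easy to check."""
    return tuple(
        (torch.full((1, 2, seq_len, 4), value), torch.full((1, 2, seq_len, 4), value))
        for _ in range(num_layers)
    )


parts = [filled_cache(0.0, 3), filled_cache(1.0, 2), filled_cache(2.0, 4)]
merged_all = merge_in_order(parts)
keys = merged_all[0][0]
print(keys.shape[SEQ_DIM])        # 9
print(keys[0, 0, :, 0].tolist())  # [0.0, 0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0]
```

The printed row shows that positions from the first cache come first, followed by the second and third, which is the ordering guarantee the "merge in order" wording describes.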
FUNC tokens_to_legacy_cache
model: The HuggingFace model used for prefill.
device: Target device string (e.g. "cuda", "cpu").
tokens_or_cache: Either a BatchEncoding to prefill, or an existing DynamicCache to convert directly.
- Legacy KV cache representation as a tuple of per-layer (K, V) tensor pairs.