Prerequisites: Quick Start complete, `pip install mellea`, Ollama running locally.
## Async methods
Every sync method on MelleaSession has an a-prefixed async counterpart with the
same signature and return type:
| Sync | Async |
|---|---|
| instruct() | ainstruct() |
| chat() | achat() |
| act() | aact() |
| validate() | avalidate() |
| query() | aquery() |
| transform() | atransform() |
```python
import asyncio

import mellea


async def main():
    m = mellea.start_session()
    result = await m.ainstruct("Write a haiku about concurrency.")
    print(str(result))
    # Output will vary — LLM responses depend on model and temperature.


asyncio.run(main())
```
## Parallel generation
ainstruct() returns a ModelOutputThunk immediately — generation starts in the
background but the value is not resolved until you call avalue(). This lets you
fire multiple generations and resolve them all at once:
```python
import asyncio

import mellea


async def main():
    m = mellea.start_session()

    # Fire off all three — generation starts for each immediately
    thunk_a = await m.ainstruct("Write a poem about mountains.")
    thunk_b = await m.ainstruct("Write a poem about rivers.")
    thunk_c = await m.ainstruct("Write a poem about forests.")

    # None are resolved yet
    print(thunk_a.is_computed())  # False

    # Resolve all in parallel
    await asyncio.gather(
        thunk_a.avalue(),
        thunk_b.avalue(),
        thunk_c.avalue(),
    )

    print(thunk_a.value)
    print(thunk_b.value)
    print(thunk_c.value)
    # Output will vary — LLM responses depend on model and temperature.


asyncio.run(main())
```
For a list of thunks, wait_for_all_mots is a convenience wrapper:
```python
import asyncio

import mellea
from mellea.helpers.async_helpers import wait_for_all_mots


async def main():
    m = mellea.start_session()

    thunks = []
    for topic in ["mountains", "rivers", "forests"]:
        thunks.append(await m.ainstruct(f"Write a short poem about {topic}."))

    await wait_for_all_mots(thunks)

    for t in thunks:
        print(t.value)
    # Output will vary — LLM responses depend on model and temperature.


asyncio.run(main())
```
Note: All thunks passed to wait_for_all_mots must belong to the same event
loop, which is always the case when using MelleaSession.
## Streaming
Enable streaming by passing ModelOption.STREAM: True in model_options. Consume
incremental output chunks with mot.astream():
```python
import asyncio

import mellea
from mellea.backends import ModelOption


async def main():
    m = mellea.start_session()
    mot = await m.ainstruct(
        "Write a short story about a robot learning to cook.",
        model_options={ModelOption.STREAM: True},
    )

    # Consume chunks as they arrive
    while not mot.is_computed():
        chunk = await mot.astream()
        print(chunk, end="", flush=True)
    print()  # newline after streaming completes
    # Output will vary — LLM responses depend on model and temperature.


asyncio.run(main())
```
How astream() behaves:
- Each call returns only the new content since the previous call.
- When the thunk is fully computed (is_computed() returns True), the final astream() call returns the complete value.
- If the thunk is already computed, astream() returns the full value immediately.
Warning: Do not call astream() from multiple coroutines simultaneously on
the same thunk. Each thunk should have a single reader.
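If you want to stream several generations concurrently, the single-reader rule still applies per thunk: give each thunk exactly one reader coroutine and run the readers together. The sketch below is illustrative only; it relies on the default SimpleContext and the astream()/is_computed() behavior described above, and stream_one is a local helper defined here, not a Mellea API.

```python
import asyncio

import mellea
from mellea.backends import ModelOption


async def stream_one(label, mot):
    # Single reader per thunk: drain chunks until the thunk is computed,
    # then return the fully resolved value.
    while not mot.is_computed():
        chunk = await mot.astream()
        print(f"[{label}] +{len(chunk)} chars", flush=True)  # progress only
    return mot.value


async def main():
    m = mellea.start_session()
    opts = {ModelOption.STREAM: True}
    mot_a = await m.ainstruct("Write a limerick about oceans.", model_options=opts)
    mot_b = await m.ainstruct("Write a limerick about deserts.", model_options=opts)

    # One reader coroutine per thunk, run concurrently.
    poem_a, poem_b = await asyncio.gather(
        stream_one("oceans", mot_a),
        stream_one("deserts", mot_b),
    )
    print(poem_a)
    print(poem_b)


asyncio.run(main())
```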
## Async and context
Use SimpleContext (the default) with concurrent async requests. Using ChatContext
with concurrent requests can cause stale context issues — Mellea logs a warning
when this is detected:
```
WARNING: Not using a SimpleContext with asynchronous requests could cause unexpected results due to stale contexts. Ensure you await between requests.
```
If you need ChatContext with async, await each call before starting the next:
```python
import asyncio

import mellea
from mellea.stdlib.context import ChatContext


async def sequential_chat():
    m = mellea.start_session(ctx=ChatContext())
    r1 = await m.achat("Hello.")
    r2 = await m.achat("Tell me more.")  # safe — r1 is fully resolved
    print(str(r2))
    # Output will vary — LLM responses depend on model and temperature.


asyncio.run(sequential_chat())
```
For parallel generation, use SimpleContext.
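For comparison, the sketch below passes SimpleContext explicitly and fires the requests concurrently. It assumes SimpleContext is importable from mellea.stdlib.context alongside ChatContext; since SimpleContext is the default, you can also omit ctx entirely.

```python
import asyncio

import mellea
from mellea.stdlib.context import SimpleContext


async def parallel_sentences():
    # SimpleContext keeps requests independent, so concurrent generation is safe.
    m = mellea.start_session(ctx=SimpleContext())
    thunks = [
        await m.ainstruct(f"Write one sentence about {topic}.")
        for topic in ["mountains", "rivers", "forests"]
    ]
    await asyncio.gather(*(t.avalue() for t in thunks))
    for t in thunks:
        print(t.value)


asyncio.run(parallel_sentences())
```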
See also: Tutorial 02: Streaming and Async | act() and aact()