Prerequisites: Quick Start complete, `pip install mellea`, Ollama running locally.
## Async methods
Every sync method on MelleaSession has an a-prefixed async counterpart with the
same signature and return type:
| Sync | Async |
|---|---|
| instruct() | ainstruct() |
| chat() | achat() |
| act() | aact() |
| validate() | avalidate() |
| query() | aquery() |
| transform() | atransform() |
```python
import asyncio

import mellea


async def main():
    m = mellea.start_session()
    result = await m.ainstruct("Write a haiku about concurrency.")
    print(str(result))
    # Output will vary — LLM responses depend on model and temperature.


asyncio.run(main())
```
## Parallel generation
ainstruct() returns a ModelOutputThunk immediately — generation starts in the
background but the value is not resolved until you call avalue(). This lets you
fire multiple generations and resolve them all at once:
```python
import asyncio

import mellea


async def main():
    m = mellea.start_session()

    # Fire off all three — generation starts for each immediately
    thunk_a = await m.ainstruct("Write a poem about mountains.")
    thunk_b = await m.ainstruct("Write a poem about rivers.")
    thunk_c = await m.ainstruct("Write a poem about forests.")

    # None are resolved yet
    print(thunk_a.is_computed())  # False

    # Resolve all in parallel
    await asyncio.gather(
        thunk_a.avalue(),
        thunk_b.avalue(),
        thunk_c.avalue(),
    )

    print(thunk_a.value)
    print(thunk_b.value)
    print(thunk_c.value)
    # Output will vary — LLM responses depend on model and temperature.


asyncio.run(main())
```
For a list of thunks, wait_for_all_mots is a convenience wrapper:
```python
import asyncio

import mellea
from mellea.helpers.async_helpers import wait_for_all_mots


async def main():
    m = mellea.start_session()

    thunks = []
    for topic in ["mountains", "rivers", "forests"]:
        thunks.append(await m.ainstruct(f"Write a short poem about {topic}."))

    await wait_for_all_mots(thunks)

    for t in thunks:
        print(t.value)
    # Output will vary — LLM responses depend on model and temperature.


asyncio.run(main())
```
Note: All thunks passed to wait_for_all_mots must belong to the same event
loop, which is always the case when using MelleaSession.
## Streaming
Enable streaming by passing ModelOption.STREAM: True in model_options. Consume
incremental output chunks with mot.astream():
```python
import asyncio

import mellea
from mellea.backends import ModelOption


async def main():
    m = mellea.start_session()
    mot = await m.ainstruct(
        "Write a short story about a robot learning to cook.",
        model_options={ModelOption.STREAM: True},
    )

    # Consume chunks as they arrive
    while not mot.is_computed():
        chunk = await mot.astream()
        print(chunk, end="", flush=True)
    print()  # newline after streaming completes
    # Output will vary — LLM responses depend on model and temperature.


asyncio.run(main())
```
How astream() behaves:
- Each call returns only the new content since the previous call.
- When the thunk is fully computed (is_computed() returns True), the final astream() call returns the complete value.
- If the thunk is already computed, astream() returns the full value immediately.
Warning: Do not call astream() from multiple coroutines simultaneously on
the same thunk. Each thunk should have a single reader.
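If you want to stream several generations concurrently, the single-reader rule still applies per thunk: give each thunk exactly one reader coroutine and run the readers together. The sketch below is illustrative only; it relies on the default SimpleContext and the astream()/is_computed() behavior described above, and stream_one is a local helper defined here, not a Mellea API.

```python
import asyncio

import mellea
from mellea.backends import ModelOption


async def stream_one(label, mot):
    # Single reader per thunk: drain chunks until the thunk is computed,
    # then return the fully resolved value.
    while not mot.is_computed():
        chunk = await mot.astream()
        print(f"[{label}] +{len(chunk)} chars", flush=True)  # progress only
    return mot.value


async def main():
    m = mellea.start_session()
    opts = {ModelOption.STREAM: True}
    mot_a = await m.ainstruct("Write a limerick about oceans.", model_options=opts)
    mot_b = await m.ainstruct("Write a limerick about deserts.", model_options=opts)

    # One reader coroutine per thunk, run concurrently.
    poem_a, poem_b = await asyncio.gather(
        stream_one("oceans", mot_a),
        stream_one("deserts", mot_b),
    )
    print(poem_a)
    print(poem_b)


asyncio.run(main())
```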
## Async and context
Use SimpleContext (the default) with concurrent async requests. Using ChatContext
with concurrent requests can cause stale context issues — Mellea logs a warning
when this is detected:
```
WARNING: Not using a SimpleContext with asynchronous requests could cause unexpected results due to stale contexts. Ensure you await between requests.
```
If you need ChatContext with async, await each call before starting the next:
```python
import asyncio

import mellea
from mellea.stdlib.context import ChatContext


async def sequential_chat():
    m = mellea.start_session(ctx=ChatContext())
    r1 = await m.achat("Hello.")
    r2 = await m.achat("Tell me more.")  # safe — r1 is fully resolved
    print(str(r2))
    # Output will vary — LLM responses depend on model and temperature.


asyncio.run(sequential_chat())
```
For parallel generation, use SimpleContext.
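For comparison, the sketch below passes SimpleContext explicitly and fires the requests concurrently. It assumes SimpleContext is importable from mellea.stdlib.context alongside ChatContext; since SimpleContext is the default, you can also omit ctx entirely.

```python
import asyncio

import mellea
from mellea.stdlib.context import SimpleContext


async def parallel_sentences():
    # SimpleContext keeps requests independent, so concurrent generation is safe.
    m = mellea.start_session(ctx=SimpleContext())
    thunks = [
        await m.ainstruct(f"Write one sentence about {topic}.")
        for topic in ["mountains", "rivers", "forests"]
    ]
    await asyncio.gather(*(t.avalue() for t in thunks))
    for t in thunks:
        print(t.value)


asyncio.run(parallel_sentences())
```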
See also: Tutorial 02: Streaming and Async | act() and aact()