> ## Documentation Index
> Fetch the complete documentation index at: https://docs.mellea.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# m serve

> Run a Mellea program as an OpenAI-compatible chat endpoint with m serve.

`m serve` runs any Mellea program as an OpenAI-compatible chat endpoint. This lets
any LLM client — LangChain, the OpenAI SDK, `curl` — call your Mellea program as if
it were a model.

**Prerequisites:** `pip install "mellea[server]"`.

## The serve() function

Your program must define a `serve()` function with this signature:

```python theme={null}
from cli.serve.models import ChatMessage
from mellea.core import ModelOutputThunk, SamplingResult

def serve(
    input: list[ChatMessage],
    requirements: list[str] | None = None,
    model_options: dict | None = None,
) -> ModelOutputThunk | SamplingResult:
    """Your Mellea program logic here."""
    ...
```

`m serve` loads your file, finds `serve()`, and routes incoming requests to it.
`ChatMessage` has `role` and `content` fields matching the OpenAI chat format.

## Example serve program

```python theme={null}
import mellea
from cli.serve.models import ChatMessage
from mellea.core import ModelOutputThunk, Requirement, SamplingResult
from mellea.stdlib.context import ChatContext
from mellea.stdlib.requirements import simple_validate
from mellea.stdlib.sampling import RejectionSamplingStrategy

session = mellea.start_session(ctx=ChatContext())

def serve(
    input: list[ChatMessage],
    requirements: list[str] | None = None,
    model_options: dict | None = None,
) -> ModelOutputThunk | SamplingResult:
    """Takes a prompt as input and runs it through a Mellea program."""
    message = input[-1].content
    reqs = [
        Requirement(
            "Keep this under 50 words",
            validation_fn=simple_validate(lambda x: len(x.split()) < 50),
        ),
        *(requirements or []),
    ]
    return session.instruct(
        description=message,
        requirements=reqs,
        strategy=RejectionSamplingStrategy(loop_budget=3),
        model_options=model_options,
    )
```

The session is initialised at module level so it is reused across requests. This
preserves the `ChatContext` conversation history across turns.

## Starting m serve

```bash theme={null}
m serve path/to/your_program.py
```

The server starts on port 8000 by default and exposes:

* `POST /v1/chat/completions` — OpenAI-compatible chat completions endpoint
* `GET /health` — health check

To see all options:

```bash theme={null}
m serve --help
```

## Calling the served endpoint

Any OpenAI-compatible client works. Using `curl`:

```bash theme={null}
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Summarize this in one sentence."}]}'
```

Using the OpenAI Python SDK:

```python theme={null}
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
response = client.chat.completions.create(
    model="mellea",
    messages=[{"role": "user", "content": "Summarize this in one sentence."}],
)
print(response.choices[0].message.content)
```

**Full example:** [`docs/examples/m_serve/m_serve_example_simple.py`](https://github.com/generative-computing/mellea/blob/main/docs/examples/m_serve/m_serve_example_simple.py)

***

**See also:** [Context and Sessions](../concepts/context-and-sessions) |
[Backends and Configuration](../how-to/backends-and-configuration) |
[CLI Reference](../reference/cli)
