Metrics
Prerequisites: The Telemetry page introduces the environment variables and telemetry architecture. This page covers metrics collection in detail.
Mellea automatically records LLM metrics across all backends using OpenTelemetry. Metrics follow the
Gen-AI Semantic Conventions
for standardized observability. The metrics API also lets you create your own
counters, histograms, and up-down counters for application-level instrumentation.
Note: Metrics are an optional feature. All instrument calls are no-ops
when metrics are disabled or the [telemetry] extra is not installed.
Enable metrics
export MELLEA_METRICS_ENABLED=true
You also need at least one exporter configured — see
Metrics export configuration below.
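To confirm at runtime that the flag was picked up, the telemetry module exposes a check (covered under Programmatic access below). A minimal sketch:
from mellea.telemetry import is_metrics_enabled

# Fails fast if MELLEA_METRICS_ENABLED was not exported before startup
assert is_metrics_enabled(), "Metrics are disabled; set MELLEA_METRICS_ENABLED=true"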
Token usage metrics
Mellea records token consumption automatically after each LLM call completes.
No code changes are required.
Token instruments
| Metric Name | Type | Unit | Description |
|---|---|---|---|
| mellea.llm.tokens.input | Counter | tokens | Total input/prompt tokens processed |
| mellea.llm.tokens.output | Counter | tokens | Total output/completion tokens generated |
Token attributes
All token metrics include these attributes following Gen-AI semantic conventions:
| Attribute | Description | Example Values |
|---|---|---|
| gen_ai.provider.name | Backend provider name | openai, ollama, watsonx, litellm, huggingface |
| gen_ai.request.model | Model identifier | gpt-4, llama3.2:7b, granite-3.1-8b-instruct |
Backend support
| Backend | Streaming | Non-Streaming | Source |
|---|---|---|---|
| OpenAI | Yes | Yes | usage.prompt_tokens and usage.completion_tokens |
| Ollama | Yes | Yes | prompt_eval_count and eval_count |
| WatsonX | No | Yes | input_token_count and generated_token_count (streaming API limitation) |
| LiteLLM | Yes | Yes | usage.prompt_tokens and usage.completion_tokens |
| HuggingFace | Yes | Yes | Calculated from input_ids and output sequences |
Note: Token usage metrics are only tracked for generate_from_context
requests. generate_from_raw calls do not record token metrics.
Token recording timing
Token metrics are recorded after the full response is received, not
incrementally during streaming:
- Non-streaming: Metrics recorded immediately after await mot.avalue() completes.
- Streaming: Metrics recorded after the stream is fully consumed (all chunks received).
This ensures accurate token counts from the backend’s usage metadata, which
is only available after the complete response.
mot, _ = await backend.generate_from_context(msg, ctx)
# Metrics NOT recorded yet (stream still in progress)
await mot.astream()
# Metrics recorded here (after stream completion)
await mot.avalue()
Latency histograms
Mellea tracks request duration and time-to-first-token (TTFB) automatically
after each LLM call. No code changes are required.
Latency instruments
| Metric Name | Type | Unit | Description |
|---|---|---|---|
| mellea.llm.request.duration | Histogram | s | Total request duration, from call to full response |
| mellea.llm.ttfb | Histogram | s | Time to first token (streaming requests only) |
Latency attributes
| Attribute | Description | Example Values |
|---|---|---|
| gen_ai.provider.name | Backend provider name | openai, ollama, watsonx, litellm, huggingface |
| gen_ai.request.model | Model identifier | gpt-4, llama3.2:7b, granite-3.1-8b-instruct |
| streaming | Whether streaming mode was used (duration only) | True, False |
Histogram buckets
Custom bucket boundaries are configured for LLM-sized latencies:
- mellea.llm.request.duration: 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30, 60, 120 seconds
- mellea.llm.ttfb: 0.05, 0.1, 0.25, 0.5, 1, 2, 5, 10 seconds
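Mellea applies these boundaries internally. If you run your own MeterProvider and want matching buckets for side-by-side comparison, the same boundaries can be expressed with the upstream opentelemetry-sdk View API. A minimal sketch (this is standard OpenTelemetry, not a Mellea API):
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.view import (
    ExplicitBucketHistogramAggregation,
    View,
)

# Mirror Mellea's request-duration boundaries in your own provider
duration_view = View(
    instrument_name="mellea.llm.request.duration",
    aggregation=ExplicitBucketHistogramAggregation(
        boundaries=[0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30, 60, 120]
    ),
)
provider = MeterProvider(views=[duration_view])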
Latency recording timing
- mellea.llm.request.duration: Recorded for every generate_from_context call, both streaming and non-streaming.
- mellea.llm.ttfb: Recorded only for streaming requests, measuring elapsed time from the generate_from_context call until the first chunk arrives.
Access latency data directly from a ModelOutputThunk:
from mellea import start_session
from mellea.backends import ModelOption

with start_session() as m:
    result = m.instruct(
        "Explain quantum entanglement briefly",
        model_options={ModelOption.STREAM: True},
    )
    if result.generation.streaming and result.generation.ttfb_ms is not None:
        print(f"Time to first token: {result.generation.ttfb_ms:.1f} ms")
Error metrics
Mellea records LLM errors automatically after each failed backend call. No
code changes are required. Errors are classified into semantic categories for
consistent filtering across providers.
Error counter
| Metric Name | Type | Unit | Description |
|---|---|---|---|
| mellea.llm.errors | Counter | {error} | Total LLM errors categorized by type |
Error attributes
All error metrics include these attributes:
| Attribute | Description | Example Values |
|---|---|---|
| error_type | Semantic error category (mellea-specific) | rate_limit, timeout, auth, content_policy, invalid_request, transport_error, server_error, unknown |
| gen_ai.provider.name | Backend provider name | openai, ollama, watsonx, litellm, huggingface |
| gen_ai.request.model | Model identifier | gpt-4, llama3.2:7b, granite-3.1-8b-instruct |
| error.type | Python exception class name (standard OTel) | RateLimitError, TimeoutError, AuthenticationError |
Error type categories
The error_type attribute maps exceptions to human-friendly semantic labels:
| Category | Description | Matched exceptions |
|---|---|---|
| rate_limit | Request throttled by provider | openai.RateLimitError, class names containing ratelimit |
| timeout | Request or connection timed out | TimeoutError, openai.APITimeoutError, class names containing timeout |
| auth | Authentication or authorization failure | openai.AuthenticationError, openai.PermissionDeniedError, class names containing auth |
| content_policy | Request rejected by content moderation | openai.BadRequestError with code="content_policy_violation", class names containing content_policy |
| invalid_request | Malformed or unsupported request | openai.BadRequestError (non-content-policy) |
| transport_error | Network or connection failure | ConnectionError, openai.APIConnectionError, class names containing connection/transport |
| server_error | Provider-side internal error | openai.InternalServerError, class names containing server |
| unknown | Unrecognized exception type | Any exception not matched above |
When errors are recorded
Error metrics are recorded when a backend raises an exception during generation,
after the request has been dispatched to the provider. Construction-time errors
(e.g. missing API key) are not captured by the error counter.
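In practice, a provider failure surfaces as a normal Python exception while the counter is incremented behind the scenes. A sketch (the prompt and exception handling are illustrative):
from mellea import start_session

with start_session() as m:
    try:
        result = m.instruct("Summarize this document")
    except Exception as exc:
        # By this point mellea.llm.errors has been incremented, with
        # error_type derived from the exception (e.g. timeout, rate_limit)
        print(f"Generation failed: {type(exc).__name__}")
        raise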
Cost metrics
Mellea estimates request cost automatically after each LLM call when pricing data
is available. No code changes are required.
Cost instrument
| Metric Name | Type | Unit | Description |
|---|---|---|---|
| mellea.llm.cost.usd | Counter | USD | Estimated request cost in US dollars |
Cost attributes
| Attribute | Description | Example Values |
|---|---|---|
| gen_ai.provider.name | Backend provider name | openai, ollama, watsonx, litellm, huggingface |
| gen_ai.request.model | Model identifier | gpt-5.4, claude-sonnet-4-6 |
Pricing data
Cost metrics use litellm (mellea[litellm]) as a pricing library. This is
independent of the LiteLLM backend — pricing works with any Mellea backend, but
cost is only recorded for models that litellm has pricing data for. Local and
private model IDs (Ollama, HuggingFace, custom deployments) will log a one-time
warning per model and produce no cost metric.
Pricing is auto-enabled when litellm is installed. Use MELLEA_PRICING_ENABLED
to override:
| MELLEA_PRICING_ENABLED | litellm installed | Result |
|---|---|---|
| false | either | Disabled (no warning) |
| true | yes | Enabled |
| true | no | Disabled + warning |
| unset | yes | Enabled automatically |
| unset | no | Disabled (no warning) |
Custom pricing
Override or add pricing for any model using a JSON file with litellm’s native
per-token schema:
export MELLEA_PRICING_FILE=/path/to/my-pricing.json
{
  "my-custom-model": {
    "input_cost_per_token": 0.000001,
    "output_cost_per_token": 0.000002
  },
  "claude-sonnet-4-6": {
    "input_cost_per_token": 0.000003,
    "output_cost_per_token": 0.000015,
    "cache_read_input_token_cost": 0.0000003,
    "cache_creation_input_token_cost": 0.000003750
  }
}
Minimal entries with only cost fields are accepted. Errors loading the file are
logged as warnings and litellm’s built-in pricing is used as a fallback.
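For example, with the claude-sonnet-4-6 entry above, a request that consumes 1,000 input tokens and 500 output tokens would record approximately 1,000 × 0.000003 + 500 × 0.000015 = $0.0105 (cache costs excluded).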
Operational metrics
Mellea records metrics for its internal sampling, validation, and tool execution
loops. These counters give visibility into retry behavior, validation failure
rates, and tool call health — independent of the underlying LLM provider.
Sampling counters
| Metric Name | Type | Unit | Description |
|---|---|---|---|
| mellea.sampling.attempts | Counter | {attempt} | Sampling attempts per loop iteration |
| mellea.sampling.successes | Counter | {sample} | Sampling loops that produced a passing sample |
| mellea.sampling.failures | Counter | {failure} | Sampling loops that exhausted the loop budget without success |
All sampling metrics include:
| Attribute | Description | Example Values |
|---|---|---|
| strategy | Sampling strategy class name | RejectionSamplingStrategy, MultiTurnStrategy, RepairTemplateStrategy |
Requirement counters
| Metric Name | Type | Unit | Description |
|---|---|---|---|
| mellea.requirement.checks | Counter | {check} | Requirement validation checks performed |
| mellea.requirement.failures | Counter | {failure} | Requirement validation checks that failed |
| Attribute | Description | Example Values |
|---|---|---|
| requirement | Requirement class name | LLMaJRequirement, PythonExecutionReq, ALoraRequirement, GuardianCheck |
| reason | Human-readable failure reason (mellea.requirement.failures only) | "Output did not satisfy constraint", "unknown" |
Tool counters
| Metric Name | Type | Unit | Description |
|---|---|---|---|
| mellea.tool.calls | Counter | {call} | Tool invocations by name and status |
| Attribute | Description | Example Values |
|---|---|---|
| tool | Name of the invoked tool | "search", "calculator" |
| status | Execution outcome | success, failure |
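Putting these together: a requirement-checked instruction run under a sampling strategy increments the attempt, check, and success/failure counters automatically. A sketch, assuming RejectionSamplingStrategy is importable from mellea.stdlib.sampling (the import path is an assumption):
from mellea import start_session
from mellea.stdlib.sampling import RejectionSamplingStrategy

with start_session() as m:
    result = m.instruct(
        "Write a one-sentence product tagline",
        requirements=["Must be under 15 words"],
        strategy=RejectionSamplingStrategy(loop_budget=3),
    )
# Each loop iteration bumps mellea.sampling.attempts{strategy="RejectionSamplingStrategy"},
# each validation bumps mellea.requirement.checks, and a passing sample
# records mellea.sampling.successes.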
Metrics export configuration
Mellea supports multiple metrics exporters that can be used independently or
simultaneously.
Warning: If MELLEA_METRICS_ENABLED=true but no exporter is configured,
Mellea logs a warning. Metrics are collected but not exported.
Console exporter (debugging)
Print metrics to console for local debugging without setting up an
observability backend:
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_CONSOLE=true
python your_script.py
Metrics are printed as JSON at the configured export interval (default: 60
seconds).
OTLP exporter (production)
Export metrics to an OTLP collector for production observability platforms
(Jaeger, Grafana, Datadog, etc.):
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_OTLP=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
# Optional: metrics-specific endpoint (overrides general endpoint)
export OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://localhost:4318
# Optional: set service name
export OTEL_SERVICE_NAME=my-mellea-app
# Optional: adjust export interval (milliseconds, default: 60000)
export OTEL_METRIC_EXPORT_INTERVAL=30000
OTLP collector setup example:
cat > otel-collector-config.yaml <<EOF
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  prometheus:
    endpoint: 0.0.0.0:8889
  debug:
    verbosity: detailed
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus, debug]
EOF
docker run -p 4317:4317 -p 8889:8889 \
-v $(pwd)/otel-collector-config.yaml:/etc/otelcol/config.yaml \
otel/opentelemetry-collector:latest
Prometheus exporter
Register metrics with the prometheus_client default registry for
Prometheus scraping:
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_PROMETHEUS=true
When enabled, Mellea registers its OpenTelemetry metrics with the
prometheus_client default registry via PrometheusMetricReader. Your
application is responsible for exposing the registry. Common approaches:
Standalone HTTP server (simplest):
from prometheus_client import start_http_server
start_http_server(9464)
FastAPI middleware:
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest
from fastapi import FastAPI, Response

app = FastAPI()

@app.get("/metrics")
def metrics():
    return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)
Flask route:
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest
from flask import Flask, Response

app = Flask(__name__)

@app.route("/metrics")
def metrics():
    return Response(generate_latest(), content_type=CONTENT_TYPE_LATEST)
Verify with:
curl http://localhost:9464/metrics
Prometheus server configuration:
# prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'mellea'
    static_configs:
      - targets: ['localhost:9464']
docker run -p 9090:9090 \
-v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
prom/prometheus
Access Prometheus UI at http://localhost:9090 and query metrics like
mellea_llm_tokens_input.
Multiple exporters simultaneously
You can enable multiple exporters at once:
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_CONSOLE=true
export MELLEA_METRICS_OTLP=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export MELLEA_METRICS_PROMETHEUS=true
This configuration prints metrics to console for immediate feedback, exports
to an OTLP collector for centralized observability, and registers with the
prometheus_client registry for Prometheus scraping.
Typical combinations:
- Development: Console + Prometheus for local testing
- Production: OTLP + Prometheus for comprehensive monitoring
- Debugging: Console only for quick verification
Custom metrics
The metrics API exposes create_counter, create_histogram, and
create_up_down_counter for instrumenting your own application code. These
return no-ops when metrics are disabled, so you can call them unconditionally.
from mellea.telemetry import create_counter, create_histogram, create_up_down_counter
# Monotonically increasing values
requests = create_counter("myapp.requests", unit="1", description="Total requests")
requests.add(1, {"backend": "ollama", "model": "granite4.1:3b"})
# Value distributions
latency = create_histogram("myapp.latency", unit="ms", description="Request latency")
latency.record(120.5, {"backend": "ollama"})
# Values that increase or decrease
active = create_up_down_counter(
    "myapp.sessions.active", unit="1", description="Active sessions"
)
active.add(1) # session started
active.add(-1) # session ended
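As with any OpenTelemetry instrument, create these once (for example, at module import time) and reuse them across requests; instantiating a new counter or histogram per call defeats aggregation and adds avoidable overhead.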
Programmatic access
Check if metrics are enabled:
from mellea.telemetry import is_metrics_enabled

if is_metrics_enabled():
    print("Metrics are being collected")
Access token usage and latency data from a ModelOutputThunk:
from mellea import start_session
from mellea.backends import ModelOption

with start_session() as m:
    result = m.instruct("Write a haiku about programming")
    if result.generation.usage:
        print(f"Prompt tokens: {result.generation.usage['prompt_tokens']}")
        print(f"Completion tokens: {result.generation.usage['completion_tokens']}")
        print(f"Total tokens: {result.generation.usage['total_tokens']}")

    # Streaming mode also exposes time-to-first-token
    streamed = m.instruct(
        "Describe the solar system",
        model_options={ModelOption.STREAM: True},
    )
    print(f"Streaming: {streamed.generation.streaming}")
    if streamed.generation.ttfb_ms is not None:
        print(f"Time to first token: {streamed.generation.ttfb_ms:.1f} ms")
The generation attribute is a GenerationMetadata dataclass. Its usage field
is a dictionary with three keys: prompt_tokens, completion_tokens, and
total_tokens. All backends populate this consistently. streaming and ttfb_ms
are set automatically based on whether streaming mode was used.
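The custom-metrics API composes naturally with this metadata; for instance, you could forward per-request usage into your own counter (the counter name and attributes below are illustrative):
from mellea import start_session
from mellea.telemetry import create_counter

# Hypothetical application-level counter fed from GenerationMetadata
billed_tokens = create_counter(
    "myapp.billed.tokens", unit="tokens", description="Tokens billed per request"
)

with start_session() as m:
    result = m.instruct("Name three chemical elements")
    if result.generation.usage:
        billed_tokens.add(result.generation.usage["total_tokens"], {"app": "myapp"})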
Performance
- Zero overhead when disabled: When MELLEA_METRICS_ENABLED=false (default), no auto-registered metrics plugins are active and all instrument calls are no-ops.
- Minimal overhead when enabled: Counter increments and histogram recordings are extremely fast (~nanoseconds per operation).
- Async export: Metrics are batched and exported asynchronously (default: every 60 seconds).
- Non-blocking: Metric recording never blocks LLM calls.
- Automatic collection: Metrics are recorded via hooks after generation completes — no manual instrumentation needed.
Troubleshooting
Metrics not appearing:
- Verify MELLEA_METRICS_ENABLED=true is set.
- Check that at least one exporter is configured (Console, OTLP, or Prometheus).
- For OTLP: Verify MELLEA_METRICS_OTLP=true and the endpoint is reachable.
- For Prometheus: Verify MELLEA_METRICS_PROMETHEUS=true and your application exposes the registry (curl http://localhost:PORT/metrics).
- Enable console output (MELLEA_METRICS_CONSOLE=true) to verify metrics are being collected.
Missing OpenTelemetry dependency:
ImportError: No module named 'opentelemetry'
Install telemetry dependencies:
pip install "mellea[telemetry]"
OTLP connection refused:
Failed to export metrics via OTLP
- Verify the OTLP collector is running: docker ps | grep otel
- Check the endpoint URL is correct (default: http://localhost:4317).
- Verify network connectivity: curl http://localhost:4317
- Check collector logs for errors.
- Check collector logs for errors.
Metrics not updating:
- Metrics are exported at intervals (default: 60 seconds). Wait for the export cycle.
- Reduce the export interval for testing: export OTEL_METRIC_EXPORT_INTERVAL=10000 (10 seconds).
- For Prometheus: Metrics update on scrape, not continuously.
- Verify LLM calls are actually being made and completing successfully.
No exporter configured warning:
WARNING: Metrics are enabled but no exporters are configured
Enable at least one exporter:
- Console: export MELLEA_METRICS_CONSOLE=true
- OTLP: export MELLEA_METRICS_OTLP=true + endpoint
- Prometheus: export MELLEA_METRICS_PROMETHEUS=true
Full example: docs/examples/telemetry/metrics_example.py
See also:
- Telemetry — overview of all telemetry features and configuration.
- Tracing — distributed traces with Gen-AI semantic conventions.
- Logging — console logging and OTLP log export.