Prerequisites: Telemetry introduces the environment variables and telemetry architecture. This page covers metrics collection in detail.

Mellea automatically tracks token consumption across all backends using OpenTelemetry metrics counters. Token metrics follow the Gen-AI Semantic Conventions for standardized observability. The metrics API also lets you create your own counters, histograms, and up-down counters for application-level instrumentation.
Note: Metrics are an optional feature. All instrument calls are no-ops
when metrics are disabled or the [telemetry] extra is not installed.
Enable metrics
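A minimal setup sketch, assuming the optional dependencies install as the pip extra named telemetry mentioned in the note above:

```bash
# Install the optional telemetry dependencies.
pip install "mellea[telemetry]"

# Turn on metrics collection (off by default).
export MELLEA_METRICS_ENABLED=true
```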
Token usage metrics
Mellea records token consumption automatically after each LLM call completes; no code changes are required. The TokenMetricsPlugin auto-registers when MELLEA_METRICS_ENABLED=true and records metrics via the plugin hook system.
Built-in metrics
| Metric Name | Type | Unit | Description |
|---|---|---|---|
| mellea.llm.tokens.input | Counter | tokens | Total input/prompt tokens processed |
| mellea.llm.tokens.output | Counter | tokens | Total output/completion tokens generated |
Metric attributes
All token metrics include these attributes, following the Gen-AI semantic conventions:

| Attribute | Description | Example Values |
|---|---|---|
| gen_ai.provider.name | Backend provider name | openai, ollama, watsonx, litellm, huggingface |
| gen_ai.request.model | Model identifier | gpt-4, llama3.2:7b, granite-3.1-8b-instruct |
Backend support
| Backend | Streaming | Non-Streaming | Source |
|---|---|---|---|
| OpenAI | Yes | Yes | usage.prompt_tokens and usage.completion_tokens |
| Ollama | Yes | Yes | prompt_eval_count and eval_count |
| WatsonX | No | Yes | input_token_count and generated_token_count (streaming API limitation) |
| LiteLLM | Yes | Yes | usage.prompt_tokens and usage.completion_tokens |
| HuggingFace | Yes | Yes | Calculated from input_ids and output sequences |
Note: Token usage metrics are only tracked for generate_from_context requests; generate_from_raw calls do not record token metrics.
When metrics are recorded
Token metrics are recorded after the full response is received, not incrementally during streaming:

- Non-streaming: Metrics are recorded immediately after await mot.avalue() completes.
- Streaming: Metrics are recorded after the stream is fully consumed (all chunks received).
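A sketch of the timing, assuming mot is a ModelOutputThunk returned by a generate call:

```python
async def consume(mot) -> str:
    # Non-streaming: token counters are incremented as soon as the
    # complete response resolves here.
    result = await mot.avalue()
    # Streaming: counters are recorded only after every chunk has been
    # consumed, never per-chunk mid-stream.
    return result
```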
Metrics export configuration
Mellea supports multiple metrics exporters that can be used independently or simultaneously.
Warning: If MELLEA_METRICS_ENABLED=true but no exporter is configured,
Mellea logs a warning. Metrics are collected but not exported.
Console exporter (debugging)
Print metrics to the console for local debugging without setting up an observability backend:
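A minimal sketch, using the environment variables documented on this page:

```bash
# Enable metrics collection and the console exporter.
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_CONSOLE=true
```

Metric readings print on each export cycle (every 60 seconds by default).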
OTLP exporter (production)
Export metrics to an OTLP collector for production observability platforms (Jaeger, Grafana, Datadog, etc.):
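A configuration sketch; MELLEA_METRICS_OTLP and the default endpoint come from this page, while OTEL_EXPORTER_OTLP_ENDPOINT is the standard OpenTelemetry variable and is an assumption here:

```bash
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_OTLP=true
# Assumption: the endpoint is set via the standard OTel variable;
# Mellea defaults to http://localhost:4317 if unset.
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
```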
Prometheus exporter
Register metrics with the prometheus_client default registry for Prometheus scraping:
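A sketch using the variables from this page:

```bash
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_PROMETHEUS=true
```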
Metrics are registered with the prometheus_client default registry via PrometheusMetricReader. Your application is responsible for exposing the registry. Common approaches:

Standalone HTTP server (simplest):
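For example, prometheus_client's built-in server can expose the default registry (the port here is an arbitrary choice):

```python
# Expose the default prometheus_client registry over HTTP.
from prometheus_client import start_http_server

start_http_server(8000)  # serves http://localhost:8000/metrics

# ... run your Mellea application; recorded metrics appear on scrape.
```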
Point your Prometheus scrape config at the exposed endpoint, then open the Prometheus UI at http://localhost:9090 and query metrics like mellea_llm_tokens_input.
Multiple exporters simultaneously
You can enable multiple exporters at once; for example, metrics can be exported over OTLP while also being registered with the prometheus_client registry for Prometheus scraping, as sketched below.
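A sketch using the same environment variables as above:

```bash
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_OTLP=true
export MELLEA_METRICS_PROMETHEUS=true
```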
Typical combinations:
- Development: Console + Prometheus for local testing
- Production: OTLP + Prometheus for comprehensive monitoring
- Debugging: Console only for quick verification
Custom metrics
The metrics API exposes create_counter, create_histogram, and create_up_down_counter for instrumenting your own application code. These return no-ops when metrics are disabled, so you can call them unconditionally.
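A sketch under stated assumptions: the import path below is a guess, and the helpers are assumed to mirror the OpenTelemetry instrument API (add for counters, record for histograms):

```python
# Assumption: import path and exact signatures may differ in your Mellea version.
from mellea.telemetry import create_counter, create_histogram

# Safe to create unconditionally: these are no-ops when metrics are disabled.
documents_processed = create_counter(
    "myapp.documents.processed", unit="documents", description="Documents run through the pipeline"
)
chunk_size = create_histogram(
    "myapp.chunk.size", unit="tokens", description="Size of chunks sent to the LLM"
)

def process(doc: str) -> None:
    documents_processed.add(1, {"pipeline": "summarize"})  # OTel-style attributes
    chunk_size.record(len(doc.split()), {"pipeline": "summarize"})
```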
Programmatic access
Check if metrics are enabled:
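One way to check, a sketch that inspects the controlling environment variable directly (Mellea may also expose a helper for this):

```python
import os

# Metrics collection is keyed off MELLEA_METRICS_ENABLED (see above).
if os.environ.get("MELLEA_METRICS_ENABLED", "").lower() == "true":
    print("Mellea metrics are enabled")
```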
Token usage is also available directly from the ModelOutputThunk: the usage field is a dictionary with three keys: prompt_tokens, completion_tokens, and total_tokens. All backends populate this field consistently.
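A sketch of reading it, assuming mot is a ModelOutputThunk returned by a generate call:

```python
# Read per-call token usage once the response is complete.
async def report_usage(mot) -> None:
    await mot.avalue()  # ensure the full response has been received
    usage = mot.usage   # {"prompt_tokens": ..., "completion_tokens": ..., "total_tokens": ...}
    print(f"in={usage['prompt_tokens']} out={usage['completion_tokens']} total={usage['total_tokens']}")
```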
Performance
- Zero overhead when disabled: When MELLEA_METRICS_ENABLED=false (the default), the TokenMetricsPlugin is not registered and all instrument calls are no-ops.
- Minimal overhead when enabled: Counter increments are extremely fast (on the order of nanoseconds per operation).
- Async export: Metrics are batched and exported asynchronously (default: every 60 seconds).
- Non-blocking: Metric recording never blocks LLM calls.
- Automatic collection: Metrics are recorded via hooks after generation completes — no manual instrumentation needed.
Troubleshooting
Metrics not appearing:
- Verify MELLEA_METRICS_ENABLED=true is set.
- Check that at least one exporter is configured (Console, OTLP, or Prometheus).
- For OTLP: verify MELLEA_METRICS_OTLP=true and the endpoint is reachable.
- For Prometheus: verify MELLEA_METRICS_PROMETHEUS=true and your application exposes the registry (curl http://localhost:PORT/metrics).
- Enable console output (MELLEA_METRICS_CONSOLE=true) to verify metrics are being collected.
OTLP export failing:
- Verify the OTLP collector is running: docker ps | grep otel
- Check the endpoint URL is correct (default: http://localhost:4317).
- Verify network connectivity: curl http://localhost:4317
- Check collector logs for errors.

Metrics delayed or stale:
- Metrics are exported at intervals (default: 60 seconds). Wait for the export cycle.
- Reduce the export interval for testing: export OTEL_METRIC_EXPORT_INTERVAL=10000 (10 seconds).
- For Prometheus: metrics update on scrape, not continuously.
- Verify LLM calls are actually being made and completing successfully.

Exporter quick reference:
- Console: export MELLEA_METRICS_CONSOLE=true
- OTLP: export MELLEA_METRICS_OTLP=true plus endpoint configuration
- Prometheus: export MELLEA_METRICS_PROMETHEUS=true
Full example: docs/examples/telemetry/metrics_example.py
See also: Telemetry (environment variables and telemetry architecture).