Prerequisites: Telemetry introduces the environment variables and telemetry architecture. This page covers metrics collection in detail.

Mellea automatically tracks token consumption across all backends using OpenTelemetry metric counters. Token metrics follow the Gen-AI Semantic Conventions for standardized observability. The metrics API also lets you create your own counters, histograms, and up-down counters for application-level instrumentation.
Note: Metrics are an optional feature. All instrument calls are no-ops when metrics are disabled or the [telemetry] extra is not installed.

Enable metrics

export MELLEA_METRICS_ENABLED=true
You also need at least one exporter configured — see Metrics export configuration below.
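If you prefer configuring this from Python rather than the shell, setting the variables via os.environ should have the same effect; this is a minimal sketch that assumes Mellea reads the variables when telemetry initializes, so they must be set before the first Mellea import:
import os

# Assumption: Mellea reads these variables when telemetry initializes,
# so set them before importing Mellea or creating a session.
os.environ["MELLEA_METRICS_ENABLED"] = "true"
os.environ["MELLEA_METRICS_CONSOLE"] = "true"

from mellea import start_session  # imported after the environment is set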

Token usage metrics

Mellea records token consumption automatically after each LLM call completes. No code changes are required. The TokenMetricsPlugin auto-registers when MELLEA_METRICS_ENABLED=true and records metrics via the plugin hook system.
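For example, with MELLEA_METRICS_ENABLED=true set in the environment, an ordinary session call is enough to produce token metrics (this sketch uses the session API shown later on this page):
from mellea import start_session

with start_session() as m:
    # The TokenMetricsPlugin records mellea.llm.tokens.input and
    # mellea.llm.tokens.output automatically after this call completes;
    # no instrumentation code is needed.
    m.instruct("Summarize OpenTelemetry in one sentence.")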

Built-in metrics

| Metric Name | Type | Unit | Description |
|---|---|---|---|
| mellea.llm.tokens.input | Counter | tokens | Total input/prompt tokens processed |
| mellea.llm.tokens.output | Counter | tokens | Total output/completion tokens generated |

Metric attributes

All token metrics include these attributes following Gen-AI semantic conventions:
| Attribute | Description | Example Values |
|---|---|---|
| gen_ai.provider.name | Backend provider name | openai, ollama, watsonx, litellm, huggingface |
| gen_ai.request.model | Model identifier | gpt-4, llama3.2:7b, granite-3.1-8b-instruct |

Backend support

| Backend | Streaming | Non-Streaming | Source |
|---|---|---|---|
| OpenAI | Yes | Yes | usage.prompt_tokens and usage.completion_tokens |
| Ollama | Yes | Yes | prompt_eval_count and eval_count |
| WatsonX | No | Yes | input_token_count and generated_token_count (streaming API limitation) |
| LiteLLM | Yes | Yes | usage.prompt_tokens and usage.completion_tokens |
| HuggingFace | Yes | Yes | Calculated from input_ids and output sequences |
Note: Token usage metrics are only tracked for generate_from_context requests. generate_from_raw calls do not record token metrics.

When metrics are recorded

Token metrics are recorded after the full response is received, not incrementally during streaming:
  • Non-streaming: Metrics recorded immediately after await mot.avalue() completes.
  • Streaming: Metrics recorded after the stream is fully consumed (all chunks received).
This ensures accurate token counts from the backend’s usage metadata, which is only available after the complete response.
# msg and ctx come from an existing conversation context (setup omitted)
mot, _ = await backend.generate_from_context(msg, ctx)

# Metrics NOT recorded yet (stream still in progress)
await mot.astream()

# Metrics recorded here (after stream completion)
await mot.avalue()

Metrics export configuration

Mellea supports multiple metrics exporters that can be used independently or simultaneously.
Warning: If MELLEA_METRICS_ENABLED=true but no exporter is configured, Mellea logs a warning. Metrics are collected but not exported.

Console exporter (debugging)

Print metrics to console for local debugging without setting up an observability backend:
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_CONSOLE=true
python your_script.py
Metrics are printed as JSON at the configured export interval (default: 60 seconds).
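The exact output shape depends on the OpenTelemetry SDK version; an abbreviated export looks roughly like this (illustrative values):
{"resource_metrics": [{"scope_metrics": [{"metrics": [
  {"name": "mellea.llm.tokens.input",
   "data": {"data_points": [
     {"attributes": {"gen_ai.provider.name": "ollama",
                     "gen_ai.request.model": "llama3.2:7b"},
      "value": 42}]}}]}]}]}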

OTLP exporter (production)

Export metrics to an OTLP collector for production observability platforms (Jaeger, Grafana, Datadog, etc.):
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_OTLP=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

# Optional: metrics-specific endpoint (overrides general endpoint)
export OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://localhost:4318

# Optional: set service name
export OTEL_SERVICE_NAME=my-mellea-app

# Optional: adjust export interval (milliseconds, default: 60000)
export OTEL_METRIC_EXPORT_INTERVAL=30000
OTLP collector setup example:
cat > otel-collector-config.yaml <<EOF
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus, debug]
EOF

docker run -p 4317:4317 -p 8889:8889 \
  -v $(pwd)/otel-collector-config.yaml:/etc/otelcol/config.yaml \
  otel/opentelemetry-collector:latest

Prometheus exporter

Register metrics with the prometheus_client default registry for Prometheus scraping:
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_PROMETHEUS=true
When enabled, Mellea registers its OpenTelemetry metrics with the prometheus_client default registry via PrometheusMetricReader. Your application is responsible for exposing the registry. Common approaches:
Standalone HTTP server (simplest):
from prometheus_client import start_http_server

# Serve the prometheus_client default registry on port 9464 (the
# conventional OpenTelemetry Prometheus port). Keep the process
# running; Prometheus reads current values on each scrape.
start_http_server(9464)
FastAPI middleware:
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest
from fastapi import FastAPI, Response

app = FastAPI()

@app.get("/metrics")
def metrics():
    return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)
Flask route:
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest
from flask import Flask, Response

app = Flask(__name__)

@app.route("/metrics")
def metrics():
    return Response(generate_latest(), content_type=CONTENT_TYPE_LATEST)
Verify with:
curl http://localhost:9464/metrics
Prometheus server configuration:
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'mellea'
    static_configs:
      - targets: ['localhost:9464']
docker run -p 9090:9090 \
  -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus
Access Prometheus UI at http://localhost:9090 and query metrics like mellea_llm_tokens_input.
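Metric attribute names become Prometheus labels with dots converted to underscores, so a query aggregating input-token rate per model might look like the line below. Note this is an assumption about the exporter's name mapping; depending on the exporter version, monotonic counters may also be exposed with a _total suffix:
sum by (gen_ai_request_model) (rate(mellea_llm_tokens_input[5m]))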

Multiple exporters simultaneously

You can enable multiple exporters at once:
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_CONSOLE=true
export MELLEA_METRICS_OTLP=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export MELLEA_METRICS_PROMETHEUS=true
This configuration prints metrics to console for immediate feedback, exports to an OTLP collector for centralized observability, and registers with the prometheus_client registry for Prometheus scraping. Typical combinations:
  • Development: Console + Prometheus for local testing
  • Production: OTLP + Prometheus for comprehensive monitoring
  • Debugging: Console only for quick verification

Custom metrics

The metrics API exposes create_counter, create_histogram, and create_up_down_counter for instrumenting your own application code. These return no-ops when metrics are disabled, so you can call them unconditionally.
from mellea.telemetry import create_counter, create_histogram, create_up_down_counter

# Monotonically increasing values
requests = create_counter("myapp.requests", unit="1", description="Total requests")
requests.add(1, {"backend": "ollama", "model": "granite4:micro"})

# Value distributions
latency = create_histogram("myapp.latency", unit="ms", description="Request latency")
latency.record(120.5, {"backend": "ollama"})

# Values that increase or decrease
active = create_up_down_counter(
    "myapp.sessions.active", unit="1", description="Active sessions"
)
active.add(1)   # session started
active.add(-1)  # session ended
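As a usage example, a histogram can be combined with a session call to track end-to-end request latency. This is a sketch assuming the session API shown under Programmatic access below; the metric name and attribute are illustrative:
import time

from mellea import start_session
from mellea.telemetry import create_histogram

latency = create_histogram("myapp.latency", unit="ms", description="Request latency")

with start_session() as m:
    start = time.perf_counter()
    result = m.instruct("Write a haiku about programming")
    # Record wall-clock latency in milliseconds; a no-op if metrics are disabled.
    latency.record((time.perf_counter() - start) * 1000.0, {"operation": "instruct"})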

Programmatic access

Check if metrics are enabled:
from mellea.telemetry import is_metrics_enabled

if is_metrics_enabled():
    print("Token metrics are being collected")
Access token usage data from a ModelOutputThunk:
from mellea import start_session

with start_session() as m:
    result = m.instruct("Write a haiku about programming")

    if result.usage:
        print(f"Prompt tokens: {result.usage['prompt_tokens']}")
        print(f"Completion tokens: {result.usage['completion_tokens']}")
        print(f"Total tokens: {result.usage['total_tokens']}")
The usage field is a dictionary with three keys: prompt_tokens, completion_tokens, and total_tokens. All backends populate this field consistently.
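Because usage is populated consistently, aggregating token spend across many calls is a simple accumulation; a minimal sketch with illustrative prompts:
from mellea import start_session

totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}

with start_session() as m:
    for prompt in ["Write a haiku", "Explain recursion", "Name three birds"]:
        result = m.instruct(prompt)
        # usage may be absent if the call failed or reported no counts
        if result.usage:
            for key in totals:
                totals[key] += result.usage.get(key, 0)

print(totals)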

Performance

  • Zero overhead when disabled: When MELLEA_METRICS_ENABLED=false (default), the TokenMetricsPlugin is not registered and all instrument calls are no-ops.
  • Minimal overhead when enabled: Counter increments are extremely fast (~nanoseconds per operation).
  • Async export: Metrics are batched and exported asynchronously (default: every 60 seconds).
  • Non-blocking: Metric recording never blocks LLM calls.
  • Automatic collection: Metrics are recorded via hooks after generation completes — no manual instrumentation needed.

Troubleshooting

Metrics not appearing:
  1. Verify MELLEA_METRICS_ENABLED=true is set.
  2. Check that at least one exporter is configured (Console, OTLP, or Prometheus).
  3. For OTLP: Verify MELLEA_METRICS_OTLP=true and the endpoint is reachable.
  4. For Prometheus: Verify MELLEA_METRICS_PROMETHEUS=true and your application exposes the registry (curl http://localhost:PORT/metrics).
  5. Enable console output (MELLEA_METRICS_CONSOLE=true) to verify metrics are being collected.
Missing OpenTelemetry dependency:
ImportError: No module named 'opentelemetry'
Install telemetry dependencies:
pip install "mellea[telemetry]"
OTLP connection refused:
Failed to export metrics via OTLP
  1. Verify the OTLP collector is running: docker ps | grep otel
  2. Check the endpoint URL is correct (default: http://localhost:4317).
  3. Verify network connectivity: curl http://localhost:4317
  4. Check collector logs for errors.
Metrics not updating:
  1. Metrics are exported at intervals (default: 60 seconds). Wait for the export cycle.
  2. Reduce the export interval for testing: export OTEL_METRIC_EXPORT_INTERVAL=10000 (10 seconds).
  3. For Prometheus: Metrics update on scrape, not continuously.
  4. Verify LLM calls are actually being made and completing successfully.
No exporter configured warning:
WARNING: Metrics are enabled but no exporters are configured
Enable at least one exporter:
  • Console: export MELLEA_METRICS_CONSOLE=true
  • OTLP: export MELLEA_METRICS_OTLP=true + endpoint
  • Prometheus: export MELLEA_METRICS_PROMETHEUS=true
Full example: docs/examples/telemetry/metrics_example.py

See also:
  • Telemetry — overview of all telemetry features and configuration.
  • Tracing — distributed traces with Gen-AI semantic conventions.
  • Logging — console logging and OTLP log export.