Note: Metrics are an optional feature. All instrument calls are no-ops
when metrics are disabled or the [telemetry] extra is not installed.
Enable metrics
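For example, metrics can be switched on with the environment variables used throughout this document (the console exporter is shown here; any exporter works):

```shell
# Enable metric collection and pick at least one exporter; with no
# exporter configured, metrics are collected but not exported.
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_CONSOLE=true
```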
Token usage metrics
Mellea records token consumption automatically after each LLM call completes; no code changes are required. The TokenMetricsPlugin auto-registers when
MELLEA_METRICS_ENABLED=true and records metrics via the plugin hook system.
Built-in metrics
| Metric Name | Type | Unit | Description |
|---|---|---|---|
| mellea.llm.tokens.input | Counter | tokens | Total input/prompt tokens processed |
| mellea.llm.tokens.output | Counter | tokens | Total output/completion tokens generated |
Metric attributes
All token metrics include these attributes, following the Gen-AI semantic conventions:

| Attribute | Description | Example Values |
|---|---|---|
| gen_ai.provider.name | Backend provider name | openai, ollama, watsonx, litellm, huggingface |
| gen_ai.request.model | Model identifier | gpt-4, llama3.2:7b, granite-3.1-8b-instruct |
Backend support
| Backend | Streaming | Non-Streaming | Source |
|---|---|---|---|
| OpenAI | Yes | Yes | usage.prompt_tokens and usage.completion_tokens |
| Ollama | Yes | Yes | prompt_eval_count and eval_count |
| WatsonX | No | Yes | input_token_count and generated_token_count (streaming API limitation) |
| LiteLLM | Yes | Yes | usage.prompt_tokens and usage.completion_tokens |
| HuggingFace | Yes | Yes | Calculated from input_ids and output sequences |
Note: Token usage metrics are only tracked for generate_from_context requests. generate_from_raw calls do not record token metrics.
When metrics are recorded
Token metrics are recorded after the full response is received, not incrementally during streaming:
- Non-streaming: Metrics are recorded immediately after await mot.avalue() completes.
- Streaming: Metrics are recorded after the stream is fully consumed (all chunks received).
Metrics export configuration
Mellea supports multiple metrics exporters that can be used independently or simultaneously.
Warning: If MELLEA_METRICS_ENABLED=true but no exporter is configured,
Mellea logs a warning. Metrics are collected but not exported.
Console exporter (debugging)
Print metrics to the console for local debugging without setting up an observability backend.

OTLP exporter (production)
Export metrics to an OTLP collector for production observability platforms (Jaeger, Grafana, Datadog, etc.).

Prometheus exporter
Register metrics with the prometheus_client default registry, via PrometheusMetricReader, for Prometheus scraping. Your
application is responsible for exposing the registry. Common approaches:
Standalone HTTP server (simplest):
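A minimal sketch of the standalone-server approach, assuming prometheus_client (from the [telemetry] extra) is installed and the Prometheus exporter is enabled; the port is illustrative:

```python
# Sketch: serve the prometheus_client default registry over HTTP so
# Prometheus can scrape it. Assumes MELLEA_METRICS_PROMETHEUS=true so
# Mellea's metrics land in that registry; the port is illustrative.
from prometheus_client import start_http_server

start_http_server(9090)  # exposes http://localhost:9090/metrics
```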
Then visit http://localhost:9090 and query metrics like
mellea_llm_tokens_input.
Multiple exporters simultaneously
You can enable multiple exporters at once; when the Prometheus exporter is among them, metrics are also registered with the prometheus_client registry for Prometheus scraping.
Typical combinations:
- Development: Console + Prometheus for local testing
- Production: OTLP + Prometheus for comprehensive monitoring
- Debugging: Console only for quick verification
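For instance, the production combination maps onto the environment switches used in this document:

```shell
# Production-style setup: push to an OTLP collector and also expose
# metrics for Prometheus scraping.
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_OTLP=true         # plus your collector endpoint
export MELLEA_METRICS_PROMETHEUS=true
```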
Custom metrics
The metrics API exposes create_counter, create_histogram, and
create_up_down_counter for instrumenting your own application code. These
return no-ops when metrics are disabled, so you can call them unconditionally.
Programmatic access
Check whether metrics are enabled, or read token usage directly from a completed
ModelOutputThunk: its usage field is a dictionary with three keys: prompt_tokens,
completion_tokens, and total_tokens. All backends populate this field
consistently.
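Those usage dictionaries can be aggregated with nothing beyond the standard library. The helper and stand-in data below are illustrative; only the three key names come from this document:

```python
from collections import Counter

def total_usage(usages):
    """Sum several `usage` dicts (keys per this doc:
    prompt_tokens, completion_tokens, total_tokens)."""
    totals = Counter()
    for u in usages:
        totals.update(u or {})  # tolerate a missing usage field
    return dict(totals)

# Stand-in data shaped like ModelOutputThunk.usage:
calls = [
    {"prompt_tokens": 40, "completion_tokens": 8, "total_tokens": 48},
    {"prompt_tokens": 12, "completion_tokens": 5, "total_tokens": 17},
]
print(total_usage(calls))  # {'prompt_tokens': 52, 'completion_tokens': 13, 'total_tokens': 65}
```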
Performance
- Zero overhead when disabled: When MELLEA_METRICS_ENABLED=false (the default), the TokenMetricsPlugin is not registered and all instrument calls are no-ops.
- Minimal overhead when enabled: Counter increments are extremely fast (nanoseconds per operation).
- Async export: Metrics are batched and exported asynchronously (default: every 60 seconds).
- Non-blocking: Metric recording never blocks LLM calls.
- Automatic collection: Metrics are recorded via hooks after generation completes — no manual instrumentation needed.
Troubleshooting
Metrics not appearing:
- Verify MELLEA_METRICS_ENABLED=true is set.
- Check that at least one exporter is configured (Console, OTLP, or Prometheus).
- For OTLP: Verify MELLEA_METRICS_OTLP=true and the endpoint is reachable.
- For Prometheus: Verify MELLEA_METRICS_PROMETHEUS=true and your application exposes the registry (curl http://localhost:PORT/metrics).
- Enable console output (MELLEA_METRICS_CONSOLE=true) to verify metrics are being collected.
- Verify the OTLP collector is running: docker ps | grep otel
- Check the endpoint URL is correct (default: http://localhost:4317).
- Verify network connectivity: curl http://localhost:4317
- Check collector logs for errors.
- Metrics are exported at intervals (default: 60 seconds). Wait for the export cycle.
- Reduce the export interval for testing: export OTEL_METRIC_EXPORT_INTERVAL=10000 (10 seconds).
- For Prometheus: Metrics update on scrape, not continuously.
- Verify LLM calls are actually being made and completing successfully.
- Console: export MELLEA_METRICS_CONSOLE=true
- OTLP: export MELLEA_METRICS_OTLP=true + endpoint
- Prometheus: export MELLEA_METRICS_PROMETHEUS=true
Full example: docs/examples/telemetry/metrics_example.py
See also: