Prerequisites: Telemetry introduces the environment variables and telemetry architecture. This page covers metrics collection in detail.

Mellea automatically tracks token consumption across all backends using OpenTelemetry metrics counters. Token metrics follow the Gen-AI Semantic Conventions for standardized observability. The metrics API also lets you create your own counters, histograms, and up-down counters for application-level instrumentation.
Note: Metrics are an optional feature. All instrument calls are no-ops
when metrics are disabled or the [telemetry] extra is not installed.
Enable metrics
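A minimal setup sketch, assuming the optional dependencies install as the pip extra named telemetry mentioned in the note above:

```bash
# Install the optional telemetry dependencies.
pip install "mellea[telemetry]"

# Turn on metrics collection (off by default).
export MELLEA_METRICS_ENABLED=true
```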
Token usage metrics
Mellea records token consumption automatically after each LLM call completes; no code changes are required. The TokenMetricsPlugin auto-registers when MELLEA_METRICS_ENABLED=true and records metrics via the plugin hook system.
Built-in metrics
| Metric Name | Type | Unit | Description |
|---|---|---|---|
| mellea.llm.tokens.input | Counter | tokens | Total input/prompt tokens processed |
| mellea.llm.tokens.output | Counter | tokens | Total output/completion tokens generated |
Metric attributes
All token metrics include these attributes, following the Gen-AI semantic conventions:

| Attribute | Description | Example Values |
|---|---|---|
| gen_ai.provider.name | Backend provider name | openai, ollama, watsonx, litellm, huggingface |
| gen_ai.request.model | Model identifier | gpt-4, llama3.2:7b, granite-3.1-8b-instruct |
Backend support
| Backend | Streaming | Non-Streaming | Source |
|---|---|---|---|
| OpenAI | Yes | Yes | usage.prompt_tokens and usage.completion_tokens |
| Ollama | Yes | Yes | prompt_eval_count and eval_count |
| WatsonX | No | Yes | input_token_count and generated_token_count (streaming API limitation) |
| LiteLLM | Yes | Yes | usage.prompt_tokens and usage.completion_tokens |
| HuggingFace | Yes | Yes | Calculated from input_ids and output sequences |
Note: Token usage metrics are only tracked for generate_from_context requests; generate_from_raw calls do not record token metrics.
When metrics are recorded
Token metrics are recorded after the full response is received, not incrementally during streaming:

- Non-streaming: Metrics are recorded immediately after await mot.avalue() completes.
- Streaming: Metrics are recorded after the stream is fully consumed (all chunks received).
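A sketch of the timing, assuming mot is a ModelOutputThunk returned by a generate call:

```python
async def consume(mot) -> str:
    # Non-streaming: token counters are incremented as soon as the
    # complete response resolves here.
    result = await mot.avalue()
    # Streaming: counters are recorded only after every chunk has been
    # consumed, never per-chunk mid-stream.
    return result
```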
Metrics export configuration
Mellea supports multiple metrics exporters that can be used independently or simultaneously.
Warning: If MELLEA_METRICS_ENABLED=true but no exporter is configured,
Mellea logs a warning. Metrics are collected but not exported.
Console exporter (debugging)
Print metrics to the console for local debugging without setting up an observability backend:
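A minimal sketch, using the environment variables documented on this page:

```bash
# Enable metrics collection and the console exporter.
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_CONSOLE=true
```

Metric readings print on each export cycle (every 60 seconds by default).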
OTLP exporter (production)
Export metrics to an OTLP collector for production observability platforms (Jaeger, Grafana, Datadog, etc.):
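A configuration sketch; MELLEA_METRICS_OTLP and the default endpoint come from this page, while OTEL_EXPORTER_OTLP_ENDPOINT is the standard OpenTelemetry variable and is an assumption here:

```bash
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_OTLP=true
# Assumption: the endpoint is set via the standard OTel variable;
# Mellea defaults to http://localhost:4317 if unset.
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
```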
Prometheus exporter
Register metrics with the prometheus_client default registry for Prometheus scraping:
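A sketch using the variables from this page:

```bash
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_PROMETHEUS=true
```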
Metrics are registered with the prometheus_client default registry via PrometheusMetricReader. Your application is responsible for exposing the registry. Common approaches:

Standalone HTTP server (simplest):
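For example, prometheus_client's built-in server can expose the default registry (the port here is an arbitrary choice):

```python
# Expose the default prometheus_client registry over HTTP.
from prometheus_client import start_http_server

start_http_server(8000)  # serves http://localhost:8000/metrics

# ... run your Mellea application; recorded metrics appear on scrape.
```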
Point your Prometheus scrape config at the exposed endpoint, then open the Prometheus UI at http://localhost:9090 and query metrics like mellea_llm_tokens_input.
Multiple exporters simultaneously
You can enable multiple exporters at once; for example, metrics can be exported over OTLP while also being registered with the prometheus_client registry for Prometheus scraping, as sketched below.
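A sketch using the same environment variables as above:

```bash
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_OTLP=true
export MELLEA_METRICS_PROMETHEUS=true
```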
Typical combinations:
- Development: Console + Prometheus for local testing
- Production: OTLP + Prometheus for comprehensive monitoring
- Debugging: Console only for quick verification
Custom metrics
The metrics API exposes create_counter, create_histogram, and create_up_down_counter for instrumenting your own application code. These return no-ops when metrics are disabled, so you can call them unconditionally.
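A sketch under stated assumptions: the import path below is a guess, and the helpers are assumed to mirror the OpenTelemetry instrument API (add for counters, record for histograms):

```python
# Assumption: import path and exact signatures may differ in your Mellea version.
from mellea.telemetry import create_counter, create_histogram

# Safe to create unconditionally: these are no-ops when metrics are disabled.
documents_processed = create_counter(
    "myapp.documents.processed", unit="documents", description="Documents run through the pipeline"
)
chunk_size = create_histogram(
    "myapp.chunk.size", unit="tokens", description="Size of chunks sent to the LLM"
)

def process(doc: str) -> None:
    documents_processed.add(1, {"pipeline": "summarize"})  # OTel-style attributes
    chunk_size.record(len(doc.split()), {"pipeline": "summarize"})
```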
Programmatic access
Check if metrics are enabled:
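One way to check, a sketch that inspects the controlling environment variable directly (Mellea may also expose a helper for this):

```python
import os

# Metrics collection is keyed off MELLEA_METRICS_ENABLED (see above).
if os.environ.get("MELLEA_METRICS_ENABLED", "").lower() == "true":
    print("Mellea metrics are enabled")
```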
Token usage is also available directly from the ModelOutputThunk: the usage field is a dictionary with three keys: prompt_tokens, completion_tokens, and total_tokens. All backends populate this field consistently.
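A sketch of reading it, assuming mot is a ModelOutputThunk returned by a generate call:

```python
# Read per-call token usage once the response is complete.
async def report_usage(mot) -> None:
    await mot.avalue()  # ensure the full response has been received
    usage = mot.usage   # {"prompt_tokens": ..., "completion_tokens": ..., "total_tokens": ...}
    print(f"in={usage['prompt_tokens']} out={usage['completion_tokens']} total={usage['total_tokens']}")
```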
Performance
- Zero overhead when disabled: When MELLEA_METRICS_ENABLED=false (the default), the TokenMetricsPlugin is not registered and all instrument calls are no-ops.
- Minimal overhead when enabled: Counter increments are extremely fast (on the order of nanoseconds per operation).
- Async export: Metrics are batched and exported asynchronously (default: every 60 seconds).
- Non-blocking: Metric recording never blocks LLM calls.
- Automatic collection: Metrics are recorded via hooks after generation completes — no manual instrumentation needed.
Troubleshooting
Metrics not appearing:
- Verify MELLEA_METRICS_ENABLED=true is set.
- Check that at least one exporter is configured (Console, OTLP, or Prometheus).
- For OTLP: verify MELLEA_METRICS_OTLP=true and the endpoint is reachable.
- For Prometheus: verify MELLEA_METRICS_PROMETHEUS=true and your application exposes the registry (curl http://localhost:PORT/metrics).
- Enable console output (MELLEA_METRICS_CONSOLE=true) to verify metrics are being collected.
OTLP export failing:
- Verify the OTLP collector is running: docker ps | grep otel
- Check the endpoint URL is correct (default: http://localhost:4317).
- Verify network connectivity: curl http://localhost:4317
- Check collector logs for errors.

Metrics delayed or stale:
- Metrics are exported at intervals (default: 60 seconds). Wait for the export cycle.
- Reduce the export interval for testing: export OTEL_METRIC_EXPORT_INTERVAL=10000 (10 seconds).
- For Prometheus: metrics update on scrape, not continuously.
- Verify LLM calls are actually being made and completing successfully.

Exporter quick reference:
- Console: export MELLEA_METRICS_CONSOLE=true
- OTLP: export MELLEA_METRICS_OTLP=true plus endpoint configuration
- Prometheus: export MELLEA_METRICS_PROMETHEUS=true
Full example: docs/examples/telemetry/metrics_example.py
See also: Telemetry (environment variables and telemetry architecture).