mellea.telemetry.metrics
OpenTelemetry metrics instrumentation for Mellea.
Provides metrics collection using OpenTelemetry Metrics API with support for:
- Counters: Monotonically increasing values (e.g., request counts, token usage)
- Histograms: Value distributions (e.g., latency, token counts)
- UpDownCounters: Values that can increase or decrease (e.g., active sessions)
Metrics Exporters:
- Console: Print metrics to console for debugging
- OTLP: Export to OpenTelemetry Protocol collectors (Jaeger, Grafana, etc.)
- Prometheus: Register metrics with prometheus_client registry for scraping
Configuration via environment variables:
General:
- MELLEA_METRICS_ENABLED: Enable/disable metrics collection (default: false)
- OTEL_SERVICE_NAME: Service name for metrics (default: mellea)
Console Exporter (debugging):
- MELLEA_METRICS_CONSOLE: Print metrics to console (default: false)
OTLP Exporter (production observability):
- MELLEA_METRICS_OTLP: Enable OTLP metrics exporter (default: false)
- OTEL_EXPORTER_OTLP_ENDPOINT: OTLP endpoint for all signals (optional)
- OTEL_EXPORTER_OTLP_METRICS_ENDPOINT: Metrics-specific endpoint (optional, overrides general)
- OTEL_METRIC_EXPORT_INTERVAL: Export interval in milliseconds (default: 60000)
Prometheus Exporter:
- MELLEA_METRICS_PROMETHEUS: Enable Prometheus metric reader (default: false)
Pricing (for cost counter):
- MELLEA_PRICING_FILE: Path to a JSON file with custom model pricing overrides (optional)
Multiple exporters can be enabled simultaneously.
Example - Console debugging:
    export MELLEA_METRICS_ENABLED=true
    export MELLEA_METRICS_CONSOLE=true
Example - OTLP production:
    export MELLEA_METRICS_ENABLED=true
    export MELLEA_METRICS_OTLP=true
    export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
Example - Prometheus monitoring:
    export MELLEA_METRICS_ENABLED=true
    export MELLEA_METRICS_PROMETHEUS=true
Example - Multiple exporters:
    export MELLEA_METRICS_ENABLED=true
    export MELLEA_METRICS_CONSOLE=true
    export MELLEA_METRICS_OTLP=true
    export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
    export MELLEA_METRICS_PROMETHEUS=true
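The endpoint precedence and the export-interval default described in the configuration list can be sketched in a few lines. This is only an illustration of the documented behavior, not Mellea's actual implementation: the helper name `resolve_otlp_settings` and the shape of its return value are hypothetical, and only the environment variable names and the 60000 ms default come from the list above.

```python
import os


def resolve_otlp_settings(env=None):
    """Hypothetical helper: resolve OTLP metrics settings from environment variables."""
    env = os.environ if env is None else env
    # The metrics-specific endpoint overrides the general one when both are set.
    endpoint = env.get("OTEL_EXPORTER_OTLP_METRICS_ENDPOINT") or env.get(
        "OTEL_EXPORTER_OTLP_ENDPOINT"
    )
    # The export interval is given in milliseconds; 60000 is the documented default.
    interval_ms = int(env.get("OTEL_METRIC_EXPORT_INTERVAL", "60000"))
    return {"endpoint": endpoint, "interval_ms": interval_ms}
```

Passing a plain dict instead of `os.environ` makes the precedence rule easy to check without touching the real environment.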
Built-in metrics (auto-recorded via plugins when metrics are enabled):
- Token counters: mellea.llm.tokens.input, mellea.llm.tokens.output (unit: tokens)
- Latency histograms: mellea.llm.request.duration (unit: s), mellea.llm.ttfb (unit: s, streaming only)
- Error counter: mellea.llm.errors (unit: {error}), categorized by semantic error type
- Cost counter: mellea.llm.cost.usd (unit: USD), estimated cost when pricing data is available
- Sampling counters: mellea.sampling.attempts, mellea.sampling.successes, mellea.sampling.failures (unit: {attempt}/{sample}/{failure})
- Requirement counters: mellea.requirement.checks (unit: {check}), mellea.requirement.failures (unit: {failure})
- Tool counter: mellea.tool.calls (unit: {call}), tagged by tool name and status
Programmatic usage:
    from mellea.telemetry.metrics import create_counter, create_histogram

    request_counter = create_counter(
        "mellea.requests",
        description="Total number of LLM requests",
        unit="1",
    )
    request_counter.add(1, {"backend": "ollama", "model": "llama2"})

    latency_histogram = create_histogram(
        "mellea.request.duration",
        description="Request latency distribution",
        unit="s",
    )
    latency_histogram.record(1.5, {"backend": "ollama"})
Functions
FUNC is_metrics_enabled
is_metrics_enabled() -> bool
Check if metrics collection is enabled.
Returns:
- True if MELLEA_METRICS_ENABLED is truthy AND OpenTelemetry is installed.
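A check of this shape could be written as below. This is a sketch under stated assumptions rather than the library's code: the function name `metrics_enabled_sketch` is hypothetical, and the set of strings treated as truthy is a guess, since the reference only says the variable must be "truthy".

```python
import importlib.util
import os

# Assumption: the exact set of truthy strings is not documented; this set is a guess.
_TRUTHY = {"1", "true", "yes", "on"}


def metrics_enabled_sketch(env=None):
    """Return True only if the flag is truthy AND the opentelemetry package is importable."""
    env = os.environ if env is None else env
    flag = env.get("MELLEA_METRICS_ENABLED", "false").strip().lower() in _TRUTHY
    # find_spec returns None when the top-level package cannot be found.
    otel_installed = importlib.util.find_spec("opentelemetry") is not None
    return flag and otel_installed
```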
FUNC create_counter
create_counter(name: str, description: str = '', unit: str = '1') -> Any
Create a counter instrument for monotonically increasing values.
Counters are used for values that only increase, such as:
- Total number of requests
- Total tokens processed
- Total errors encountered
Args:
- name: Metric name (e.g., "mellea.requests.total")
- description: Human-readable description of what this metric measures
- unit: Unit of measurement (e.g., "1" for count, "ms" for milliseconds)
Returns:
- Counter instrument (or no-op if metrics disabled)
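The "no-op if metrics disabled" behavior means call sites can always call `add()` without checking whether metrics are on. The pattern can be sketched as follows; every class and function name here is a hypothetical stand-in, not part of `mellea.telemetry.metrics`.

```python
class NoOpCounter:
    """Hypothetical stand-in accepting the same add() call shape as a real counter."""

    def add(self, amount, attributes=None):
        # Deliberately does nothing, so callers need no "is enabled" checks.
        return None


class InMemoryCounter:
    """Hypothetical in-memory counter, used only to show monotonic behavior."""

    def __init__(self):
        self.total = 0

    def add(self, amount, attributes=None):
        if amount < 0:
            raise ValueError("counters only increase")
        self.total += amount


def make_counter(enabled):
    """Return a working counter when enabled, otherwise a no-op with the same interface."""
    return InMemoryCounter() if enabled else NoOpCounter()
```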
FUNC create_histogram
create_histogram(name: str, description: str = '', unit: str = '1') -> Any
Create a histogram instrument for recording value distributions.
Histograms are used for values that vary and need statistical analysis:
- Request latency
- Token counts per request
- Response sizes
Args:
- name: Metric name (e.g., "mellea.request.duration")
- description: Human-readable description
- unit: Unit of measurement (e.g., "ms", "tokens", "bytes")
Returns:
- Histogram instrument (or no-op if metrics disabled)
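A common way to feed a latency histogram is a small timing context manager. The sketch below assumes only that the instrument exposes `record(value, attributes)`, as in the programmatic usage example; `ListHistogram` and `timed` are hypothetical names introduced for illustration.

```python
import time
from contextlib import contextmanager


class ListHistogram:
    """Hypothetical stand-in exposing record(value, attributes)."""

    def __init__(self):
        self.samples = []

    def record(self, value, attributes=None):
        self.samples.append((value, dict(attributes or {})))


@contextmanager
def timed(histogram, attributes=None):
    """Record the elapsed wall-clock seconds of the enclosed block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raised, so failed requests are timed too.
        histogram.record(time.perf_counter() - start, attributes)
```

In real use, the object passed to `timed` would presumably be whatever `create_histogram` returned, with `unit="s"` to match the seconds recorded here.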
FUNC create_up_down_counter
create_up_down_counter(name: str, description: str = '', unit: str = '1') -> Any
Create an up-down counter for values that can increase or decrease.
UpDownCounters are used for values that go up and down:
- Active sessions
- Items in a queue
- Memory usage
Args:
- name: Metric name (e.g., "mellea.sessions.active")
- description: Human-readable description
- unit: Unit of measurement
Returns:
- UpDownCounter instrument (or no-op if metrics disabled)
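Tracking something like active sessions usually pairs an increment on entry with a decrement on exit. The following sketch shows that pairing; `InMemoryUpDownCounter` and `track_active` are hypothetical names, and the only assumption about the real instrument is an `add(amount, attributes)` method that accepts negative amounts.

```python
from contextlib import contextmanager


class InMemoryUpDownCounter:
    """Hypothetical stand-in: add() accepts positive and negative amounts."""

    def __init__(self):
        self.value = 0

    def add(self, amount, attributes=None):
        self.value += amount


@contextmanager
def track_active(counter, attributes=None):
    """Count the enclosed block as active for as long as it runs."""
    counter.add(1, attributes)
    try:
        yield
    finally:
        # Always decrement, so an exception cannot leave the count inflated.
        counter.add(-1, attributes)
```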
FUNC record_token_usage_metrics
record_token_usage_metrics(input_tokens: int | None, output_tokens: int | None, model: str, provider: str) -> None
Record token usage metrics following OpenTelemetry Gen-AI semantic conventions.
This is a no-op when metrics are disabled, ensuring zero overhead.
Args:
- input_tokens: Number of input tokens (prompt tokens), or None if unavailable
- output_tokens: Number of output tokens (completion tokens), or None if unavailable
- model: Model identifier (e.g., "gpt-4", "llama2:7b")
- provider: Provider name (e.g., "openai", "ollama", "watsonx")
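Because either token count may be None, a recorder has to decide what to do with missing values. One plausible approach, shown here as an assumption rather than documented behavior, is to skip the missing side and record only what is known. The helper name `token_increments` is hypothetical; the metric names are the built-in token counters listed earlier.

```python
def token_increments(input_tokens, output_tokens):
    """Hypothetical helper: return (metric_name, amount) pairs, skipping unavailable counts."""
    pairs = [
        ("mellea.llm.tokens.input", input_tokens),
        ("mellea.llm.tokens.output", output_tokens),
    ]
    # A count of 0 is a real measurement and is kept; only None is dropped.
    return [(name, amount) for name, amount in pairs if amount is not None]
```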
FUNC record_request_duration
record_request_duration(duration_s: float, model: str, provider: str, streaming: bool = False) -> None
Record total LLM request duration.
This is a no-op when metrics are disabled, ensuring zero overhead.
Args:
- duration_s: Request duration in seconds
- model: Model identifier (e.g., "gpt-4", "llama2:7b")
- provider: Provider name (e.g., "openai", "ollama", "watsonx")
- streaming: Whether the request used streaming mode
FUNC record_ttfb
record_ttfb(ttfb_s: float, model: str, provider: str) -> None
Record time-to-first-token for streaming LLM requests.
This is a no-op when metrics are disabled, ensuring zero overhead. Should only be called for streaming requests.
Args:
- ttfb_s: Time to first token in seconds
- model: Model identifier (e.g., "gpt-4", "llama2:7b")
- provider: Provider name (e.g., "openai", "ollama", "watsonx")
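For a streaming request, both the time to the first chunk and the total duration can be measured by wrapping the chunk iterator. The wrapper below is a sketch: `stream_with_timing` and its callback parameters are hypothetical, and nothing here reflects how Mellea's plugins actually take these measurements.

```python
import time


def stream_with_timing(chunks, on_ttfb, on_duration):
    """Yield chunks unchanged while timing the first chunk and the whole stream."""
    start = time.perf_counter()
    first_seen = False
    try:
        for chunk in chunks:
            if not first_seen:
                first_seen = True
                # Fires once, when the first chunk arrives.
                on_ttfb(time.perf_counter() - start)
            yield chunk
    finally:
        # Fires once, when the stream ends or the consumer stops early.
        on_duration(time.perf_counter() - start)
```

The callbacks could be thin lambdas around `record_ttfb(ttfb_s, model, provider)` and `record_request_duration(duration_s, model, provider, streaming=True)`, using the signatures documented in this section.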
FUNC classify_error
classify_error(exc: BaseException) -> str
Map an exception to a semantic error type string.
Checks OpenAI SDK exception types first (when openai is installed), then falls back to stdlib exceptions and name-based heuristics.
Args:
- exc: The exception to classify.
Returns:
- One of the ERROR_TYPE_* constants.
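The "type checks first, then name-based heuristics" order can be illustrated with stdlib exceptions alone. The category strings below are hypothetical placeholders, since the actual ERROR_TYPE_* values are not listed in this reference, and the OpenAI SDK branch is omitted so the sketch has no third-party dependency.

```python
# Hypothetical category strings; the real ERROR_TYPE_* constant values are not shown here.
TIMEOUT = "timeout"
CONNECTION = "connection"
RATE_LIMIT = "rate_limit"
UNKNOWN = "unknown"


def classify_sketch(exc):
    """Map an exception to a category: isinstance checks first, then class-name heuristics."""
    if isinstance(exc, TimeoutError):
        return TIMEOUT
    if isinstance(exc, ConnectionError):
        return CONNECTION
    # Fallback for SDK-specific exceptions that cannot be imported here.
    name = type(exc).__name__.lower()
    if "ratelimit" in name:
        return RATE_LIMIT
    if "timeout" in name:
        return TIMEOUT
    return UNKNOWN
```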
FUNC record_error
record_error(error_type: str, model: str, provider: str, exception_class: str) -> None
Record an LLM error metric.
This is a no-op when metrics are disabled, ensuring zero overhead.
Args:
- error_type: Semantic error category (use ERROR_TYPE_* constants).
- model: Model identifier (e.g. "gpt-4", "llama2:7b").
- provider: Provider name (e.g. "openai", "ollama").
- exception_class: Python exception class name (e.g. "RateLimitError").
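A typical call site classifies the exception, records it, and then re-raises so the caller still sees the failure. The wrapper below sketches that flow with injected callables; `call_and_record_errors` is a hypothetical name, and `record` is expected to take the same four positional arguments as `record_error`.

```python
def call_and_record_errors(fn, classify, record, model, provider):
    """Run fn(); on failure, report the error through `record`, then re-raise."""
    try:
        return fn()
    except Exception as exc:
        # Argument order mirrors record_error(error_type, model, provider, exception_class).
        record(classify(exc), model, provider, type(exc).__name__)
        raise
```

Passing `classify_error` and `record_error` as `classify` and `record` would presumably connect this to the real metric, though that wiring is not shown in the source.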
FUNC record_cost
record_cost(cost: float, model: str, provider: str) -> None
Record estimated LLM request cost in USD.
This is a no-op when metrics are disabled, ensuring zero overhead.
Only call this when pricing data is available (i.e., compute_cost returned a non-None value).
Args:
- cost: Estimated request cost in US dollars.
- model: Model identifier (e.g., "gpt-4o", "claude-sonnet-4-6").
- provider: Provider name (e.g., "openai", "ollama").
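The "only record when pricing is available" rule can be illustrated with a small estimator that returns None for unknown models. Everything about the pricing table here is an assumption: the per-million-token schema, the key names, and the function name `estimate_cost` are hypothetical and are not the format of MELLEA_PRICING_FILE or the signature of compute_cost.

```python
def estimate_cost(pricing, model, input_tokens, output_tokens):
    """Hypothetical estimator: return USD cost, or None when the model has no pricing entry."""
    entry = pricing.get(model)
    if entry is None:
        return None
    # Assumed schema: USD per one million tokens, priced separately by direction.
    return (
        input_tokens * entry["input_per_million"]
        + output_tokens * entry["output_per_million"]
    ) / 1_000_000
```

A caller would check the result for None and call `record_cost(cost, model, provider)` only when a value came back.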
FUNC record_sampling_attempt
record_sampling_attempt(strategy: str) -> None
Record one sampling attempt for the given strategy.
This is a no-op when metrics are disabled, ensuring zero overhead.
Args:
- strategy: Sampling strategy class name (e.g., "RejectionSamplingStrategy").
FUNC record_sampling_outcome
record_sampling_outcome(strategy: str, success: bool) -> None
Record the final outcome (success or failure) of a sampling loop.
This is a no-op when metrics are disabled, ensuring zero overhead.
Args:
- strategy: Sampling strategy class name (e.g., "RejectionSamplingStrategy").
- success: True if at least one attempt passed all requirements.
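The split between per-attempt and per-loop recording is easiest to see in a toy sampling loop: one attempt event per iteration, and exactly one outcome event at the end. The function `run_sampling` and its callback parameters are hypothetical and stand in for calls to `record_sampling_attempt` and `record_sampling_outcome`; this is not how any Mellea strategy is implemented.

```python
def run_sampling(generate, validate, max_attempts, on_attempt, on_outcome):
    """Try up to max_attempts candidates; report each attempt and one final outcome."""
    for _ in range(max_attempts):
        on_attempt()
        candidate = generate()
        if validate(candidate):
            # Success is reported once, as soon as a candidate passes.
            on_outcome(True)
            return candidate
    # No candidate passed within the budget.
    on_outcome(False)
    return None
```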
FUNC record_requirement_check
record_requirement_check(requirement: str) -> None
Record one requirement validation check.
This is a no-op when metrics are disabled, ensuring zero overhead.
Args:
- requirement: Requirement class name (e.g., "LLMaJRequirement").
FUNC record_requirement_failure
record_requirement_failure(requirement: str, reason: str) -> None
Record one requirement validation failure.
This is a no-op when metrics are disabled, ensuring zero overhead.
Args:
- requirement: Requirement class name (e.g., "LLMaJRequirement").
- reason: Human-readable failure reason from ValidationResult.reason.
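Checks and failures are separate counters, so a validation pass would emit a check event for every requirement and a failure event only for those that did not pass. The sketch below shows that relationship with plain callables; `check_requirements` and the `(passed, reason)` validator return shape are hypothetical simplifications, not Mellea's validation API.

```python
def check_requirements(output, requirements, on_check, on_failure):
    """requirements: list of (name, validator); validator returns (passed, reason)."""
    all_passed = True
    for name, validator in requirements:
        # Every requirement counts as one check, pass or fail.
        on_check(name)
        passed, reason = validator(output)
        if not passed:
            all_passed = False
            on_failure(name, reason)
    return all_passed
```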
FUNC record_tool_call
record_tool_call(tool: str, status: str) -> None
Record one tool invocation.
This is a no-op when metrics are disabled, ensuring zero overhead.
Args:
- tool: Name of the tool that was invoked.
- status: "success" if the tool executed without error, "failure" otherwise.
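The success/failure status maps naturally onto a try/except around the tool invocation. The wrapper below is a sketch with a hypothetical name, `invoke_tool`; `on_call` takes the same `(tool, status)` arguments as `record_tool_call`.

```python
def invoke_tool(name, fn, on_call, *args, **kwargs):
    """Call fn and report "success" or "failure" for the named tool."""
    try:
        result = fn(*args, **kwargs)
    except Exception:
        # Report the failure, then let the original exception propagate.
        on_call(name, "failure")
        raise
    on_call(name, "success")
    return result
```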