mellea.stdlib.sampling.majority_voting

Sampling Strategies for Minimum Bayes Risk Decoding (MBRD).

Classes

CLASS `BaseMBRDSampling`

Abstract Minimum Bayes Risk Decoding (MBRD) Sampling Strategy.

Args:

number_of_samples: Number of samples to generate and use for majority voting. Defaults to 8.
weighted: Not yet implemented. If True, weights scores before majority vote.
loop_budget: Inner rejection-sampling loop count. Must be > 0.
requirements: Requirements to validate against. If None, uses per-call requirements.

Attributes:

symmetric: Whether the similarity metric is symmetric, allowing the upper-triangle score matrix to be mirrored; always True for this base class.
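With a symmetric metric, the pairwise score matrix needs fewer comparisons. Only the upper triangle is computed, and the lower triangle is mirrored from it. The sketch below is standalone and assumes a plain Python similarity function. `build_score_matrix` and `exact_match` are hypothetical names, not mellea APIs, and the order in which arguments are passed to `compare` is also an assumption.

```python
from typing import Callable

import numpy as np


def build_score_matrix(
    samples: list[str],
    compare: Callable[[str, str], float],
    symmetric: bool = True,
) -> tuple[np.ndarray, int]:
    """Return the pairwise score matrix and the number of compare() calls made."""
    n = len(samples)
    scores = np.eye(n)  # each sample is treated as a perfect match with itself
    calls = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if symmetric and j < i:
                # Row j was filled earlier, so mirror its upper-triangle entry.
                scores[i, j] = scores[j, i]
                continue
            scores[i, j] = compare(samples[i], samples[j])  # argument order assumed
            calls += 1
    return scores, calls


def exact_match(ref: str, pred: str) -> float:
    # Hypothetical toy metric: 1.0 for identical strings after stripping whitespace.
    return 1.0 if ref.strip() == pred.strip() else 0.0


matrix, calls = build_score_matrix(["42", "41", "42"], exact_match)
print(matrix)
print(calls)  # 3 comparisons instead of 6
```

For n samples, mirroring reduces the work from n(n-1) to n(n-1)/2 calls to the similarity metric.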

Methods:

FUNC `compare_strings`

compare_strings(self, ref: str, pred: str) -> float

Compute a similarity score between a reference and a predicted string.

Subclasses must implement this to define the MBRD similarity metric.

Args:

ref: The reference string to compare against.
pred: The predicted string to evaluate.

Returns:

A similarity score, typically in [0.0, 1.0], where 1.0 indicates a perfect match.
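A subclass only has to supply `compare_strings`. The sketch below mirrors that contract in standalone form and makes no assumptions about mellea's internals. `SimilarityMetric` and `TokenJaccard` are hypothetical names, and Jaccard token overlap is an illustrative metric that mellea does not ship. A real strategy would subclass BaseMBRDSampling instead.

```python
from abc import ABC, abstractmethod


class SimilarityMetric(ABC):
    """Hypothetical stand-in for the compare_strings contract of BaseMBRDSampling."""

    symmetric: bool = True

    @abstractmethod
    def compare_strings(self, ref: str, pred: str) -> float:
        """Return a similarity score, typically in [0.0, 1.0]."""


class TokenJaccard(SimilarityMetric):
    """Hypothetical metric: Jaccard overlap of lowercase whitespace tokens."""

    def compare_strings(self, ref: str, pred: str) -> float:
        a, b = set(ref.lower().split()), set(pred.lower().split())
        if not a and not b:
            return 1.0  # two empty strings count as a perfect match
        return len(a & b) / len(a | b)


metric = TokenJaccard()
print(metric.compare_strings("the answer is 4", "The answer is four"))  # 0.6
```

Jaccard overlap does not depend on argument order, so leaving `symmetric` set to True is consistent for this metric.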

FUNC `maybe_apply_weighted`

maybe_apply_weighted(self, scr: np.ndarray) -> np.ndarray

Apply per-sample weights to the score vector if self.weighted is True.

Weighting is not yet implemented, so the input array is currently returned unchanged whether or not self.weighted is True.

Args:

scr: 1-D array of aggregated similarity scores, one entry per candidate sample.

Returns:

np.ndarray: The (possibly weighted) score array.
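The method currently leaves the score vector unchanged. The sketch below shows that behaviour and marks where real weights would be applied. It assumes uniform (all-ones) weights as the placeholder. The free function is hypothetical and only mirrors the documented contract.

```python
import numpy as np


def maybe_apply_weighted(scr: np.ndarray, weighted: bool) -> np.ndarray:
    """Hypothetical mirror of BaseMBRDSampling.maybe_apply_weighted."""
    if weighted:
        # Placeholder: uniform weights leave every score unchanged.
        weights = np.ones_like(scr, dtype=float)
        scr = scr * weights
    return scr


scores = np.array([2.0, 1.0, 2.0])
print(maybe_apply_weighted(scores, weighted=True))   # [2. 1. 2.]
print(maybe_apply_weighted(scores, weighted=False))  # [2. 1. 2.]
```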

FUNC `sample`

sample(self, action: Component[S], context: Context, backend: Backend, requirements: list[Requirement] | None) -> SamplingResult[S]

Generates number_of_samples candidate samples and selects one by Minimum Bayes Risk majority voting.

Args:

action: The action object to be sampled.
context: The context to be passed to the sampling strategy.
backend: The backend used for generating samples.
requirements: List of requirements to test against (merged with global requirements).
validation_ctx: Optional context to use for validation. If None, context is used.
format: Output format for structured outputs; ignored by this sampling strategy.
model_options: Model options to pass to the backend during generation and validation.
tool_calls: True if tool calls should be used during sampling.
show_progress: If True, a tqdm progress bar is shown; otherwise, progress messages are still sent to flog.

Returns:

SamplingResult[S]: A result object indicating the success or failure of the sampling process.
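MBRD selects the sample that agrees most with all the other samples. The following sketch shows only that selection step and assumes the candidates are already available as strings. `mbrd_select` and `exact_match` are hypothetical helpers. The real method also runs the inner rejection-sampling loop, and it returns a SamplingResult rather than a string.

```python
from typing import Callable

import numpy as np


def mbrd_select(
    candidates: list[str], compare: Callable[[str, str], float]
) -> tuple[int, str]:
    """Pick the candidate with the highest total similarity to all candidates."""
    n = len(candidates)
    scores = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            scores[i, j] = 1.0 if i == j else compare(candidates[i], candidates[j])
    totals = scores.sum(axis=0)  # one aggregated score per candidate
    best = int(np.argmax(totals))  # ties resolve to the earliest index
    return best, candidates[best]


def exact_match(ref: str, pred: str) -> float:
    return 1.0 if ref == pred else 0.0


samples = ["12", "15", "12", "13", "12"]
print(mbrd_select(samples, exact_match))  # (0, '12')
```

With an exact-match metric, this reduces to plain majority voting: the most frequent answer wins. A graded metric such as RougeL also gives partial credit to near-duplicates.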

CLASS `MajorityVotingStrategyForMath`

Majority Voting Sampling Strategy for Math Expressions.

For free-text outputs, use MBRDRougeLStrategy instead.

Args:

number_of_samples: Number of samples to generate. Defaults to 8.
float_rounding: Decimal places for float comparison. Defaults to 6.
strict: Enforce strict comparison mode. Defaults to True.
allow_set_relation_comp: Allow set-relation comparisons. Defaults to False.
weighted: Not yet implemented. Defaults to False.
loop_budget: Rejection-sampling loop count. Defaults to 1.
requirements: Requirements to validate against.

Attributes:

match_types: Extraction target types used for parsing math expressions; always ["latex", "expr"], computed at init.
symmetric: Inherited from BaseMBRDSampling; always True for this strategy (set explicitly at init).

Methods:

FUNC `compare_strings`

compare_strings(self, ref: str, pred: str) -> float

Compare two strings using math-aware extraction and verification.

Parses both strings into mathematical expressions using the configured match_types (latex and/or expr), then verifies equivalence via math_verify.verify.

Args:

ref: The reference (gold) string containing a math expression.
pred: The predicted string to compare against the reference.

Returns:

1.0 if the expressions are considered equivalent, 0.0 otherwise.
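The real comparison uses math_verify's parser, which handles LaTeX and symbolic expressions. The sketch below is a deliberately simplified stand-in that shows how a 0/1 equivalence check turns into votes. It understands only plain numbers and fractions (via Python's `fractions` module) and applies a float_rounding-style tolerance. `compare_numeric` is hypothetical, and mellea does not parse expressions this way.

```python
from fractions import Fraction


def compare_numeric(ref: str, pred: str, float_rounding: int = 6) -> float:
    """Hypothetical, much simpler stand-in for math_verify-based comparison.

    Parses plain numbers and fractions only (no LaTeX), then compares them
    after rounding to `float_rounding` decimal places.
    """
    try:
        a = Fraction(ref.strip())
        b = Fraction(pred.strip())
    except (ValueError, ZeroDivisionError):
        return 0.0  # unparseable input never counts as a match
    same = round(float(a), float_rounding) == round(float(b), float_rounding)
    return 1.0 if same else 0.0


answers = ["1/2", "0.5", "0.50", "2/3", "x"]
votes = [sum(compare_numeric(a, b) for b in answers) for a in answers]
print(votes)  # [3.0, 3.0, 3.0, 1.0, 0.0]
```

Written differently, "1/2", "0.5", and "0.50" still pool their votes, which is the point of comparing by mathematical equivalence rather than by string equality.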

CLASS `MBRDRougeLStrategy`

Sampling Strategy that uses the RougeL F-measure (longest-common-subsequence overlap between token sequences) as the pairwise similarity for majority voting.

This is the general-purpose majority voting strategy for text outputs.

Args:

number_of_samples: Number of samples to generate. Defaults to 8.
weighted: Not yet implemented. Defaults to False.
loop_budget: Rejection-sampling loop count. Defaults to 1.
requirements: Requirements to validate against.

Attributes:

match_types: Rouge metric names used for scoring (["rougeL"]).
scorer: Pre-configured RougeScorer instance used for pairwise string comparison.
symmetric: Inherited from BaseMBRDSampling; always True for RougeL (the score is symmetric by construction).

Methods:

FUNC `compare_strings`

compare_strings(self, ref: str, pred: str) -> float

Compare two strings using the RougeL F-measure.

Args:

ref: The reference string to score against.
pred: The predicted string to evaluate.

Returns:

RougeL F-measure score in the range [0.0, 1.0].
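RougeL is the F-measure of the longest common subsequence (LCS) between two token sequences. The sketch below computes it directly and assumes simple lowercase whitespace tokenization. The rouge_score package's RougeScorer uses its own tokenizer (and optional stemming), so its scores can differ. `lcs_length` and `rouge_l_f` are hypothetical helpers.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence (classic dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f(ref: str, pred: str) -> float:
    """RougeL F-measure over lowercase whitespace tokens (simplified tokenizer)."""
    r, p = ref.lower().split(), pred.lower().split()
    lcs = lcs_length(r, p)
    if lcs == 0:
        return 0.0  # also covers empty inputs, avoiding division by zero
    precision, recall = lcs / len(p), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


print(round(rouge_l_f("the cat sat on the mat", "the cat lay on the mat"), 4))  # 0.8333
```

Swapping ref and pred only swaps precision and recall, so the F-measure is unchanged. This is why symmetric is True for this strategy.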
