mellea.stdlib.sampling.majority_voting
Sampling Strategies for Minimum Bayes Risk Decoding (MBRD).
Classes
CLASS BaseMBRDSampling
Abstract Minimum Bayes Risk Decoding (MBRD) Sampling Strategy.
Args:
- number_of_samples: Number of samples to generate and use for majority voting. Defaults to 8.
- weighted: Not yet implemented. If True, weights scores before majority vote.
- loop_budget: Inner rejection-sampling loop count. Must be > 0.
- requirements: Requirements to validate against. If None, uses per-call requirements.
Attributes:
- symmetric: Whether the similarity metric is symmetric, allowing the upper-triangle score matrix to be mirrored; always True for this base class.
Methods:
FUNC compare_strings
compare_strings(self, ref: str, pred: str) -> float
Compute a similarity score between a reference and a predicted string.
Subclasses must implement this to define the MBRD similarity metric.
Args:
- ref: The reference string to compare against.
- pred: The predicted string to evaluate.
Returns:
- A similarity score, typically in [0.0, 1.0], where 1.0 indicates a perfect match.
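To illustrate the contract described above, here is a minimal sketch of a metric that a subclass could wrap in its own compare_strings(self, ref, pred). The function name jaccard_similarity and the choice of token-set Jaccard overlap are assumptions made for illustration; they are not part of mellea.

```python
def jaccard_similarity(ref: str, pred: str) -> float:
    """Token-set Jaccard overlap: symmetric and bounded in [0.0, 1.0].

    Hypothetical metric for illustration only; not shipped with mellea.
    """
    ref_tokens = set(ref.lower().split())
    pred_tokens = set(pred.lower().split())
    if not ref_tokens and not pred_tokens:
        return 1.0  # two empty strings are treated as a perfect match
    # Shared tokens divided by all distinct tokens across both strings.
    return len(ref_tokens & pred_tokens) / len(ref_tokens | pred_tokens)
```

Because the score does not depend on argument order, a metric like this would be compatible with symmetric being True.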
FUNC maybe_apply_weighted
maybe_apply_weighted(self, scr: np.ndarray) -> np.ndarray
Apply per-sample weights to the score vector if self.weighted is True.
Currently not implemented; the input array is returned unchanged even when
self.weighted is True.
Args:
scr: 1-D array of aggregated similarity scores, one entry per candidate sample.
Returns:
- np.ndarray: The (possibly weighted) score array.
FUNC sample
sample(self, action: Component[S], context: Context, backend: Backend, requirements: list[Requirement] | None) -> SamplingResult[S]
Samples using majority voting.
Args:
- action: The action object to be sampled.
- context: The context to be passed to the sampling strategy.
- backend: The backend used for generating samples.
- requirements: List of requirements to test against (merged with global requirements).
- validation_ctx: Optional context to use for validation. If None, validation_ctx = ctx.
- format: Output format for structured outputs; ignored for this sampling strategy.
- model_options: Model options to pass to the backend during generation / validation.
- tool_calls: True if tool calls should be used during this sampling strategy.
- show_progress: If true, a tqdm progress bar is used. Otherwise, messages will still be sent to flog.
Returns:
- SamplingResult[S]: A result object indicating the success or failure of the sampling process.
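The selection step behind majority voting can be sketched in plain Python. This is a simplified illustration, not the library's implementation: the helper names pick_consensus and exact_match are hypothetical, and summing each row of the pairwise score matrix is an assumed aggregation. It shows how a symmetric metric lets the upper triangle be computed once and mirrored, and how one aggregated score per candidate determines the winner.

```python
from typing import Callable


def pick_consensus(
    samples: list[str],
    compare: Callable[[str, str], float],
    symmetric: bool = True,
) -> int:
    """Return the index of the sample with the highest total similarity."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # With a symmetric metric only the upper triangle is computed.
        start = i if symmetric else 0
        for j in range(start, n):
            scores[i][j] = compare(samples[i], samples[j])
            if symmetric:
                scores[j][i] = scores[i][j]  # mirror into the lower triangle
    # Assumption: aggregate by summing each row into one score per candidate.
    totals = [sum(row) for row in scores]
    # Ties resolve to the earliest candidate.
    return max(range(n), key=totals.__getitem__)


def exact_match(ref: str, pred: str) -> float:
    """Hypothetical binary metric: 1.0 on identical stripped strings."""
    return 1.0 if ref.strip() == pred.strip() else 0.0


answers = ["42", "41", "42", "7", "42"]
winner = answers[pick_consensus(answers, exact_match)]
```

With a binary metric such as exact_match, the highest row sum belongs to the most frequent answer, which is why this reduces to ordinary majority voting.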
CLASS MajorityVotingStrategyForMath
MajorityVoting Sampling Strategy for Math Expressions.
For free-text outputs, use MBRDRougeLStrategy instead.
Args:
- number_of_samples: Number of samples to generate. Defaults to 8.
- float_rounding: Decimal places for float comparison. Defaults to 6.
- strict: Enforce strict comparison mode. Defaults to True.
- allow_set_relation_comp: Allow set-relation comparisons. Defaults to False.
- weighted: Not yet implemented. Defaults to False.
- loop_budget: Rejection-sampling loop count. Defaults to 1.
- requirements: Requirements to validate against.
Attributes:
- match_types: Extraction target types used for parsing math expressions; always ["latex", "expr"], computed at init.
- symmetric: Inherited from BaseMBRDSampling; always True for this strategy (set explicitly at init).
Methods:
FUNC compare_strings
compare_strings(self, ref: str, pred: str) -> float
Compare two strings using math-aware extraction and verification.
Parses both strings into mathematical expressions using the configured
match_types (latex and/or expr), then verifies equivalence via
math_verify.verify.
Args:
- ref: The reference (gold) string containing a math expression.
- pred: The predicted string to compare against the reference.
Returns:
- 1.0 if the expressions are considered equivalent, 0.0 otherwise.
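The binary return value and the role of float_rounding can be illustrated with a deliberately simplified stand-in. This sketch does not call math_verify and handles only plain numeric strings; the function name numeric_equivalent and the use of fractions.Fraction for parsing are assumptions made for illustration.

```python
from fractions import Fraction


def numeric_equivalent(ref: str, pred: str, float_rounding: int = 6) -> float:
    """Return 1.0 if both strings denote the same number after rounding, else 0.0.

    Simplified stand-in for math-aware verification; it parses only
    integers, decimals, and fractions such as "1/3".
    """
    try:
        ref_value = Fraction(ref.strip())
        pred_value = Fraction(pred.strip())
    except (ValueError, ZeroDivisionError):
        return 0.0  # unparseable input never counts as a match
    # Compare both values at the requested number of decimal places.
    same = round(ref_value, float_rounding) == round(pred_value, float_rounding)
    return 1.0 if same else 0.0
```

For example, numeric_equivalent("1/2", "0.5") yields 1.0, while numeric_equivalent("1/3", "0.3333") yields 0.0 at the default six decimal places.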
CLASS MBRDRougeLStrategy
Sampling Strategy that uses RougeL to compute symbol-level distances for majority voting.
This is the general-purpose majority voting strategy for text outputs.
Args:
- number_of_samples: Number of samples to generate. Defaults to 8.
- weighted: Not yet implemented. Defaults to False.
- loop_budget: Rejection-sampling loop count. Defaults to 1.
- requirements: Requirements to validate against.
Attributes:
- match_types: Rouge metric names used for scoring (["rougeL"]).
- scorer: Pre-configured RougeScorer instance used for pairwise string comparison.
- symmetric: Inherited from BaseMBRDSampling; always True for RougeL (the score is symmetric by construction).
Methods:
FUNC compare_strings
compare_strings(self, ref: str, pred: str) -> float
Compare two strings using the RougeL F-measure.
Args:
- ref: The reference string to score against.
- pred: The predicted string to evaluate.
Returns:
- RougeL F-measure score in the range [0.0, 1.0].
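For intuition about what the F-measure captures, the following is a minimal pure-Python sketch of a RougeL-style score based on the longest common subsequence (LCS) of tokens. It is an approximation under stated assumptions: it lowercases and splits on whitespace, whereas the RougeScorer used by this strategy applies its own tokenization, so exact values may differ. The names lcs_length and rouge_l_f are hypothetical.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, using one rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, else carry the best neighbour.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f(ref: str, pred: str) -> float:
    """RougeL-style F-measure over whitespace tokens (simplified sketch)."""
    ref_tokens = ref.lower().split()
    pred_tokens = pred.lower().split()
    if not ref_tokens or not pred_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, pred_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)  # share of pred covered by the LCS
    recall = lcs / len(ref_tokens)  # share of ref covered by the LCS
    return 2 * precision * recall / (precision + recall)
```

Swapping ref and pred exchanges precision and recall, which leaves their harmonic mean unchanged; this is the sense in which the score is symmetric.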