Sampling Strategies for Minimum Bayes Risk Decoding (MBRD).

Classes

CLASS BaseMBRDSampling

Abstract Minimum Bayes Risk Decoding (MBRD) Sampling Strategy.
Args:
  • number_of_samples: Number of samples to generate and use for majority voting. Defaults to 8.
  • weighted: Not yet implemented. If True, weights scores before majority vote.
  • loop_budget: Inner rejection-sampling loop count. Must be > 0.
  • requirements: Requirements to validate against. If None, uses per-call requirements.
Attributes:
  • symmetric: Whether the similarity metric is symmetric, allowing the upper-triangle score matrix to be mirrored; always True for this base class.
Methods:

FUNC compare_strings

compare_strings(self, ref: str, pred: str) -> float
Compute a similarity score between a reference and a predicted string. Subclasses must implement this to define the MBRD similarity metric.
Args:
  • ref: The reference string to compare against.
  • pred: The predicted string to evaluate.
Returns:
  • A similarity score, typically in [0.0, 1.0], where 1.0 indicates a perfect match.

FUNC maybe_apply_weighted

maybe_apply_weighted(self, scr: np.ndarray) -> np.ndarray
Apply per-sample weights to the score vector if self.weighted is True. Currently not implemented; the input array is returned unchanged when self.weighted is True.
Args:
  • scr: 1-D array of aggregated similarity scores, one entry per candidate sample.
Returns:
  • np.ndarray: The (possibly weighted) score array.

FUNC sample

sample(self, action: Component[S], context: Context, backend: Backend, requirements: list[Requirement] | None) -> SamplingResult[S]
Samples using majority voting.
Args:
  • action: The action object to be sampled.
  • context: The context to be passed to the sampling strategy.
  • backend: The backend used for generating samples.
  • requirements: List of requirements to test against (merged with global requirements).
  • validation_ctx: Optional context to use for validation. If None, context is used for validation as well.
  • format: Output format for structured outputs; ignored for this sampling strategy.
  • model_options: Model options to pass to the backend during generation and validation.
  • tool_calls: True if tool calls should be used during this sampling strategy.
  • show_progress: If True, a tqdm progress bar is used. Otherwise, messages will still be sent to flog.
Returns:
  • A result object indicating the success or failure of the sampling process.
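The selection step behind "majority voting" can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes every candidate is scored against every other candidate (including itself), that the upper triangle is mirrored when the metric is symmetric, and that the candidate with the highest total score wins. The names pick_mbr_candidate and exact_match are hypothetical.

```python
from typing import Callable


def pick_mbr_candidate(
    candidates: list[str],
    compare: Callable[[str, str], float],
    symmetric: bool = True,
) -> tuple[int, list[float]]:
    """Return the index of the highest-scoring candidate and all totals."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    n = len(candidates)
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            scores[i][j] = compare(candidates[i], candidates[j])
            if i != j:
                # Mirror the upper triangle for a symmetric metric;
                # otherwise score the reverse direction separately.
                scores[j][i] = (
                    scores[i][j]
                    if symmetric
                    else compare(candidates[j], candidates[i])
                )
    # Aggregate each row into one score per candidate.
    totals = [sum(row) for row in scores]
    # Ties resolve to the earliest candidate.
    best = max(range(n), key=totals.__getitem__)
    return best, totals


def exact_match(ref: str, pred: str) -> float:
    """Hypothetical metric: 1.0 on a whitespace-insensitive exact match."""
    return 1.0 if ref.strip() == pred.strip() else 0.0


samples = ["42", "41", "42", "42 ", "7"]
best, totals = pick_mbr_candidate(samples, exact_match)
print(samples[best], totals)
```

With a 0/1 metric such as exact_match, the totals reduce to vote counts, which is why the procedure behaves like majority voting; a graded metric turns it into a soft consensus instead.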

CLASS MajorityVotingStrategyForMath

MajorityVoting Sampling Strategy for Math Expressions.
Args:
  • number_of_samples: Number of samples to generate. Defaults to 8.
  • float_rounding: Decimal places for float comparison. Defaults to 6.
  • strict: Enforce strict comparison mode. Defaults to True.
  • allow_set_relation_comp: Allow set-relation comparisons. Defaults to False.
  • weighted: Not yet implemented. Defaults to False.
  • loop_budget: Rejection-sampling loop count. Defaults to 1.
  • requirements: Requirements to validate against.
Attributes:
  • match_types: Extraction target types used for parsing math expressions; always ["latex", "expr"], computed at init.
  • symmetric: Inherited from BaseMBRDSampling; always True for this strategy (set explicitly at init).
Methods:

FUNC compare_strings

compare_strings(self, ref: str, pred: str) -> float
Compare two strings using math-aware extraction and verification. Parses both strings into mathematical expressions using the configured match_types (latex and/or expr), then verifies equivalence via math_verify.verify.
Args:
  • ref: The reference (gold) string containing a math expression.
  • pred: The predicted string to compare against the reference.
Returns:
  • 1.0 if the expressions are considered equivalent, 0.0 otherwise.

CLASS MBRDRougeLStrategy

Sampling Strategy that uses RougeL to compute symbol-level distances for majority voting.
Args:
  • number_of_samples: Number of samples to generate. Defaults to 8.
  • weighted: Not yet implemented. Defaults to False.
  • loop_budget: Rejection-sampling loop count. Defaults to 1.
  • requirements: Requirements to validate against.
Attributes:
  • match_types: Rouge metric names used for scoring (["rougeL"]).
  • scorer: Pre-configured RougeScorer instance used for pairwise string comparison.
  • symmetric: Inherited from BaseMBRDSampling; always True for RougeL (the score is symmetric by construction).
Methods:

FUNC compare_strings

compare_strings(self, ref: str, pred: str) -> float
Compare two strings using the RougeL F-measure.
Args:
  • ref: The reference string to score against.
  • pred: The predicted string to evaluate.
Returns:
  • RougeL F-measure score in the range [0.0, 1.0].
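To illustrate what a RougeL F-measure computes, and why swapping ref and pred leaves it unchanged, here is a dependency-free sketch. It is not the RougeScorer used by this class: it assumes lowercase whitespace tokenization and an unweighted F1, and the helper names lcs_length and rouge_l_f are hypothetical.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f(ref: str, pred: str) -> float:
    """F1 of LCS-based precision and recall (assumed tokenization: split)."""
    ref_tokens = ref.lower().split()
    pred_tokens = pred.lower().split()
    if not ref_tokens or not pred_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, pred_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f("the cat sat on the mat", "the cat is on the mat")
forward = rouge_l_f("a b c", "a c")
backward = rouge_l_f("a c", "a b c")
print(score, forward, backward)
```

Swapping the arguments exchanges precision and recall, and their harmonic mean does not depend on the order, so forward and backward agree. That property is what allows a symmetric strategy to fill only half of the pairwise score matrix.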