Classes
CLASS BaseMBRDSampling
Abstract Minimum Bayes Risk Decoding (MBRD) Sampling Strategy.
Args:
number_of_samples: Number of samples to generate and use for majority voting. Defaults to 8.
weighted: Not yet implemented. If True, weights scores before majority vote.
loop_budget: Inner rejection-sampling loop count. Must be > 0.
requirements: Requirements to validate against. If None, uses per-call requirements.
Attributes:
symmetric: Whether the similarity metric is symmetric, allowing the upper-triangle score matrix to be mirrored; always True for this base class.
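The selection step described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `mbr_select` and `exact_match` are hypothetical names, and the choice to sum each row of the score matrix is an assumption. It shows how a symmetric metric lets the upper triangle of the pairwise score matrix be mirrored instead of recomputed.

```python
from typing import Callable, List


def mbr_select(samples: List[str],
               similarity: Callable[[str, str], float],
               symmetric: bool = True) -> int:
    """Return the index of the sample with the highest total similarity."""
    n = len(samples)
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            scores[i][j] = similarity(samples[i], samples[j])
            if i != j:
                # Mirror the upper triangle when the metric is symmetric;
                # otherwise score the reverse direction explicitly.
                scores[j][i] = (scores[i][j] if symmetric
                                else similarity(samples[j], samples[i]))
    # Aggregate each candidate's agreement with all candidates (assumed: row sum).
    totals = [sum(row) for row in scores]
    # Ties resolve to the earliest candidate.
    return max(range(n), key=totals.__getitem__)


def exact_match(ref: str, pred: str) -> float:
    """Hypothetical comparator: 1.0 on an exact match after stripping whitespace."""
    return 1.0 if ref.strip() == pred.strip() else 0.0


candidates = ["42", "41", "42", "42 ", "7"]
best = mbr_select(candidates, exact_match)
print(candidates[best])  # prints 42
```

With a symmetric metric, only n(n+1)/2 comparisons are needed rather than n², which matters when each comparison is expensive.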
FUNC compare_strings
Args:
ref: The reference string to compare against.
pred: The predicted string to evaluate.
Returns:
- A similarity score, typically in [0.0, 1.0], where 1.0 indicates a perfect match.
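As an illustration of the contract a `compare_strings` override is expected to satisfy, here is a hypothetical comparator (`token_jaccard` is not part of the library). It is a sketch under the assumption that token-set overlap is an acceptable similarity. It returns a value in [0.0, 1.0], returns 1.0 for identical inputs, and is symmetric in its arguments.

```python
def token_jaccard(ref: str, pred: str) -> float:
    """Hypothetical comparator: Jaccard overlap of lowercase whitespace tokens."""
    ref_tokens = set(ref.lower().split())
    pred_tokens = set(pred.lower().split())
    if not ref_tokens and not pred_tokens:
        return 1.0  # two empty strings are treated as a perfect match
    # Shared tokens divided by all distinct tokens across both strings.
    return len(ref_tokens & pred_tokens) / len(ref_tokens | pred_tokens)
```

Note that this particular metric ignores token order, so two strings with the same words in a different order also score 1.0.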
FUNC maybe_apply_weighted
Currently not implemented; the input array is returned unchanged when self.weighted is True.
Args:
scr: 1-D array of aggregated similarity scores, one entry per candidate sample.
Returns:
- np.ndarray: The (possibly weighted) score array.
FUNC sample
Args:
action: The action object to be sampled.
context: The context to be passed to the sampling strategy.
backend: The backend used for generating samples.
requirements: List of requirements to test against (merged with global requirements).
validation_ctx: Optional context to use for validation. If None, validation_ctx = ctx.
format: Output format for structured outputs; ignored for this sampling strategy.
model_options: Model options to pass to the backend during generation / validation.
tool_calls: True if tool calls should be used during this sampling strategy.
show_progress: If True, a tqdm progress bar is used. Otherwise, messages will still be sent to flog.
Returns:
- A result object indicating the success or failure of the sampling process.
CLASS MajorityVotingStrategyForMath
MajorityVoting Sampling Strategy for Math Expressions.
Args:
number_of_samples: Number of samples to generate. Defaults to 8.
float_rounding: Decimal places for float comparison. Defaults to 6.
strict: Enforce strict comparison mode. Defaults to True.
allow_set_relation_comp: Allow set-relation comparisons. Defaults to False.
weighted: Not yet implemented. Defaults to False.
loop_budget: Rejection-sampling loop count. Defaults to 1.
requirements: Requirements to validate against.
Attributes:
match_types: Extraction target types used for parsing math expressions; always ["latex", "axpr"], computed at init.
symmetric: Inherited from BaseMBRDSampling; always True for this strategy (set explicitly at init).
FUNC compare_strings
Parses both strings using match_types (latex and/or expr), then verifies equivalence via math_verify.verify.
Args:
ref: The reference (gold) string containing a math expression.
pred: The predicted string to compare against the reference.
Returns:
- 1.0 if the expressions are considered equivalent, 0.0 otherwise.
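The binary 1.0 / 0.0 contract and the role of float_rounding can be illustrated with a standard-library stand-in. This sketch does not use math_verify and handles only plain numbers and simple fractions, not LaTeX or symbolic expressions; `compare_numeric` is a hypothetical name, and rounding both values before comparing is an assumption about how a rounding parameter might be applied.

```python
from fractions import Fraction


def compare_numeric(ref: str, pred: str, float_rounding: int = 6) -> float:
    """Return 1.0 if both strings parse to the same number after rounding, else 0.0."""
    try:
        # Fraction accepts inputs such as "0.5", "1/2", and "1e-3".
        ref_val = round(Fraction(ref.strip()), float_rounding)
        pred_val = round(Fraction(pred.strip()), float_rounding)
    except (ValueError, ZeroDivisionError):
        return 0.0  # unparseable input counts as a mismatch
    return 1.0 if ref_val == pred_val else 0.0


print(compare_numeric("1/2", "0.5"))        # prints 1.0
print(compare_numeric("0.3333333", "1/3"))  # prints 1.0
print(compare_numeric("2", "3"))            # prints 0.0
```

Because the score is either 0.0 or 1.0, summing pairwise scores over all samples reduces to counting how many samples agree with each candidate, which is what makes this a majority vote.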
CLASS MBRDRougeLStrategy
Sampling Strategy that uses RougeL to compute symbol-level distances for majority voting.
Args:
number_of_samples: Number of samples to generate. Defaults to 8.
weighted: Not yet implemented. Defaults to False.
loop_budget: Rejection-sampling loop count. Defaults to 1.
requirements: Requirements to validate against.
Attributes:
match_types: Rouge metric names used for scoring (["rougeL"]).
scorer: Pre-configured RougeScorer instance used for pairwise string comparison.
symmetric: Inherited from BaseMBRDSampling; always True for RougeL (the score is symmetric by construction).
FUNC compare_strings
Args:
ref: The reference string to score against.
pred: The predicted string to evaluate.
Returns:
- RougeL F-measure score in the range [0.0, 1.0].
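For intuition about what the RougeL F-measure computes, here is a self-contained sketch based on the longest common subsequence (LCS) of tokens. It is not the RougeScorer implementation: the lowercase whitespace tokenization is a simplifying assumption, and `lcs_length` and `rouge_l_f` are hypothetical helper names.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f(ref: str, pred: str) -> float:
    """ROUGE-L F-measure over lowercase whitespace tokens (simplified tokenizer)."""
    ref_tokens = ref.lower().split()
    pred_tokens = pred.lower().split()
    if not ref_tokens or not pred_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, pred_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)  # share of the prediction covered by the LCS
    recall = lcs / len(ref_tokens)      # share of the reference covered by the LCS
    return 2 * precision * recall / (precision + recall)


print(rouge_l_f("the cat sat on the mat", "the cat sat on the mat"))  # prints 1.0
```

Swapping ref and pred swaps precision and recall, and the balanced F-measure is unchanged by that swap, which is why the score is symmetric.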