Use the `eval` command for LLM-as-a-judge evaluation, given one or more test files consisting of prompts, instructions, and, optionally, targets. Instantiate a generator model to produce candidate responses, and a judge model to determine whether the instructions have been followed.
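For illustration, a JSONL test file might look like the sketch below. The field names (`prompt`, `instructions`, `target`) are assumptions, since this page does not specify the test-case schema; check the documentation index below for the authoritative format.

```jsonl
{"prompt": "Summarize the following press release in plain language.", "instructions": "Respond in exactly three sentences.", "target": "A three-sentence plain-language summary."}
{"prompt": "Translate to French: Good morning, everyone.", "instructions": "Return only the translation, with no extra commentary."}
```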
Documentation Index
Fetch the complete documentation index at: https://docs.mellea.ai/llms.txt
Use this file to discover all available pages before exploring further.
Functions
FUNC eval_run
- `test_files`: Paths to JSON/JSONL files containing test cases.
- `backend`: Generation backend name.
- `model`: Generation model name, or `None` for the default.
- `max_gen_tokens`: Maximum tokens to generate for each response.
- `judge_backend`: Judge backend name, or `None` to reuse the generation backend.
- `judge_model`: Judge model name, or `None` for the default.
- `max_judge_tokens`: Maximum tokens for the judge model's output.
- `output_path`: File path prefix for the results file.
- `output_format`: Output format, either `"json"` or `"jsonl"`.
- `continue_on_error`: If `True`, skip failed tests instead of raising.
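As a minimal sketch, a call to `eval_run` might look like the following. The import path and the backend name are assumptions, not confirmed by this page; only the parameter names are taken from the list above.

```python
# Minimal sketch of an eval_run call. The import path below is an
# assumption; locate the real module via the documentation index above.
from mellea.eval import eval_run  # hypothetical import path

eval_run(
    test_files=["tests/cases.jsonl"],  # prompts, instructions, optional targets
    backend="ollama",                  # hypothetical backend name
    model=None,                        # None selects the default generation model
    max_gen_tokens=1024,
    judge_backend=None,                # None reuses the generation backend
    judge_model=None,                  # None selects the default judge model
    max_judge_tokens=512,
    output_path="results/eval",        # prefix for the results file
    output_format="jsonl",             # "json" or "jsonl"
    continue_on_error=True,            # skip failed tests instead of raising
)
```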