mellea.stdlib.requirements.python_tools
Generic Python tool requirements for code generation validation.
This module provides a set of composable requirements for validating Python code generated by language models. Requirements can be used individually or bundled via the python_code_generation_requirements() factory function.
The requirement pipeline validates code in this order:
- PythonCodeExtraction — code blocks are present and extractable
- PythonSyntaxValid — code parses without syntax errors
- PythonExecutionReq — code runs without exceptions and output stays within bounds
- ImportRestrictions — only whitelisted modules are imported (optional)
OutputSizeLimit is available as a standalone requirement for special cases where output size validation needs to be separate from execution (e.g., validating code with non-deterministic output or checking previously-executed code).
Functions
FUNC python_code_generation_requirements
python_code_generation_requirements(allowed_imports: list[str] | None = None, output_limit_chars: int = 10000, timeout_seconds: int = 5, use_sandbox: bool = False) -> list[Requirement]
Bundle generic Python tool requirements with configurable parameters.
Factory function that creates a complete set of requirements for validating Python code generation, from extraction through execution and output checks.
⚠️ Performance Improvement: As of this version, output size checking is integrated into PythonExecutionReq via the max_output_chars parameter. This eliminates the double-execution cost incurred when OutputSizeLimit was a separate requirement in the bundle, which lowers execution cost for Docker-isolated code and reduces the attack surface for untrusted inputs.
Args:
- allowed_imports: Whitelist of importable top-level modules. If None, all imports are allowed. Default None.
- output_limit_chars: Maximum allowed characters of captured stdout. Default 10,000.
- timeout_seconds: Maximum execution time in seconds. Default 5.
- use_sandbox: Use llm-sandbox for Docker-isolated execution. Default False.
Returns:
- list[Requirement]: Requirement instances in validation order (always 4):
- PythonCodeExtraction
- PythonSyntaxValid
- PythonExecutionReq (configured with timeout, sandbox, and output limit)
- ImportRestrictions or NoImportRestrictions
The fourth requirement is ImportRestrictions if allowed_imports is provided (enforcing the whitelist), or NoImportRestrictions if allowed_imports is None (documenting that no import checks are configured).
Raises:
- ValueError: If timeout_seconds is not positive.
- ValueError: If output_limit_chars is not positive.
Examples:
>>> # Unrestricted execution with defaults
>>> reqs = python_code_generation_requirements()
>>> len(reqs)
4
>>> isinstance(reqs[3], NoImportRestrictions)
True
>>> # Restricted to an explicit allowlist of modules
>>> reqs = python_code_generation_requirements(
... allowed_imports=["os", "sys", "json"],
... output_limit_chars=5_000,
... )
>>> len(reqs) # always 4, fourth is ImportRestrictions
4
>>> isinstance(reqs[3], ImportRestrictions)
True
>>> # Sandbox mode for untrusted code
>>> reqs = python_code_generation_requirements(
... use_sandbox=True,
... timeout_seconds=10,
... )
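The single-pass idea from the performance note can be sketched outside of Mellea. The example below is a minimal illustration, not the library's implementation: `run_once` is a hypothetical helper, and a plain subprocess is assumed in place of any real sandbox. It runs the code once and applies the timeout, exception, and output-size checks to that single run.

```python
import subprocess
import sys


def run_once(code: str, timeout_seconds: int = 5, max_output_chars: int = 10_000) -> tuple[bool, str]:
    """Execute ``code`` a single time and apply all three checks to that one run.

    Hypothetical helper for illustration; a subprocess is NOT a security sandbox.
    """
    if timeout_seconds <= 0 or max_output_chars <= 0:
        raise ValueError("timeout_seconds and max_output_chars must be positive")
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    if proc.returncode != 0:
        return False, "raised an exception or exited non-zero"
    if len(proc.stdout) > max_output_chars:
        return False, "output too large"
    return True, "ok"


passed = run_once("print('hi')")
failed = run_once("raise ValueError('boom')")
too_big = run_once("print('x' * 100)", max_output_chars=10)
```

Because the size check reads the stdout captured by the same run, there is no second execution to pay for, which is the cost the note describes for the older two-requirement layout.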
Classes
CLASS PythonCodeExtraction
Code blocks are present and extractable from model output.
This requirement checks whether the model's response contains Python code blocks that can be extracted for further validation or execution.
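As a rough illustration of what "extractable" can mean, the sketch below pulls Python-tagged fenced blocks out of a response string with a regular expression. The pattern and the `extract_python_blocks` name are assumptions made for this example; the extraction rules PythonCodeExtraction actually applies are not documented here.

```python
import re

FENCE = "`" * 3  # built programmatically so the example nests cleanly in Markdown

# Assumed pattern: an opening fence tagged "python" or "py", the body, a closing fence.
_CODE_BLOCK = re.compile(
    FENCE + r"(?:python|py)[ \t]*\n(.*?)" + FENCE,
    re.DOTALL,
)


def extract_python_blocks(text: str) -> list[str]:
    """Return the body of every Python-tagged fenced block found in ``text``."""
    return [match.group(1).strip() for match in _CODE_BLOCK.finditer(text)]


response = (
    "Here is the solution:\n"
    + FENCE + "python\n"
    + "print('hello')\n"
    + FENCE + "\n"
    + "Let me know if you need changes."
)
blocks = extract_python_blocks(response)
no_blocks = extract_python_blocks("There is no code in this reply.")
```

An empty result is the failure case: there is nothing for the later syntax and execution checks to work on.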
CLASS PythonSyntaxValid
Python code is syntactically valid (parses without AST errors).
Uses Python's ast.parse() to validate syntax without executing code. Useful for catching malformed code early in the validation pipeline.
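The parse-without-executing check can be reproduced with the standard library alone. The following is a minimal sketch; `check_syntax` is a hypothetical helper, not part of this module.

```python
import ast


def check_syntax(code: str) -> tuple[bool, str]:
    """Parse ``code`` without running it and return (ok, message)."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return False, f"line {exc.lineno}: {exc.msg}"
    return True, "ok"


ok_result = check_syntax("x = 1\nprint(x)")
bad_result = check_syntax("def broken(:\n    pass")
```

Since nothing is executed, this check is safe to run on untrusted text and cheap enough to place ahead of any execution step.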
CLASS OutputSizeLimit
Captured output does not exceed size limit (in characters).
Executes code and verifies that the captured stdout does not exceed the configured character limit. Useful for preventing excessive logging or infinite output loops.
⚠️ Prefer PythonExecutionReq(max_output_chars=...): Starting in this version, output size checking is integrated into PythonExecutionReq, eliminating the double-execution cost. Use this class only for special cases where you need output size validation as a separate execution pass:
- Validating code with non-deterministic output (where separate execution passes are intentional).
- Re-checking output size of code executed elsewhere, without repeating the earlier execution.
- Niche cases where static tier validation is desired (though static tier does not execute code, so this is rarely useful).
For typical use in python_code_generation_requirements(), the bundle now uses PythonExecutionReq(max_output_chars=...) internally, so this class is no longer added to the bundle.
⚠️ Static Tier Behavior: When execution_tier is "static" (the default), no code execution occurs and output size is not validated. The requirement returns result=True, but this does not indicate the requirement was satisfied — only that validation was skipped. This is intentional: static tier is the no-execution baseline. If you need actual output size validation, set execution_tier to "local", "local_unsafe", or "docker".
Args:
- limit_chars: Maximum allowed output size in characters. Defaults to 10,000.
- execution_tier: Execution environment tier. Defaults to "static" (no execution; output limit not enforced).
- policy: Optional CapabilityPolicy to override tier defaults.
- allowed_imports: Whitelist of importable top-level modules. None allows all.
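To make the character-limit check concrete, here is a small sketch that captures stdout in-process and compares its length to a limit. It assumes trusted code and uses `exec()` purely for brevity; `output_within_limit` is a hypothetical name, and the execution tiers described above are not modelled.

```python
import contextlib
import io


def output_within_limit(code: str, limit_chars: int = 10_000) -> tuple[bool, int]:
    """Run ``code`` in-process, capture stdout, and compare its length to the limit.

    WARNING: exec() offers no isolation; only use this sketch on trusted code.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {"__name__": "__sketch__"})
    size = len(buffer.getvalue())
    return size <= limit_chars, size


small = output_within_limit("print('hi')", limit_chars=10)
large = output_within_limit("print('x' * 50)", limit_chars=10)
```

The limit is counted in characters of captured text, so the trailing newline added by `print` is included in the size.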
CLASS ImportRestrictions
Only whitelisted modules are imported in the code.
Uses AST analysis to find all imports (Import and ImportFrom nodes) and validates them against an optional allowlist. If an empty list is provided, all imports are blocked. If None is provided, all imports are accepted.
⚠️ Not a Security Boundary: This is a static AST-based guidance check, not a security control. It only detects static import statements (import x and from x import y) and does NOT detect dynamic imports such as:
- __import__("subprocess")
- importlib.import_module("socket")
- exec("import urllib")
- eval("__import__('os')")
For real isolation from untrusted code, combine this requirement with execution_tier="docker" and a restrictive CapabilityPolicy. The execution environment sandbox provides the actual security boundary, not the import allowlist.
Args:
allowed_imports: List of module names that are allowed to be imported. If None, all imports are accepted. If an empty list, all imports are blocked.
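The static check described above can be approximated in a few lines with the `ast` module. This is a sketch under stated assumptions, not the class's implementation: `find_disallowed_imports` is a hypothetical helper, it compares top-level module names only, and it ignores relative imports.

```python
import ast
from typing import Optional


def find_disallowed_imports(code: str, allowed: Optional[list[str]]) -> list[str]:
    """Return top-level modules imported by ``code`` that are not in ``allowed``."""
    if allowed is None:  # None means "accept every import"
        return []
    found: set[str] = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Relative imports such as "from . import x" have no module name here.
            found.add(node.module.split(".")[0])
    return sorted(found - set(allowed))


source = "import os\nfrom json import loads\nimport subprocess"
violations = find_disallowed_imports(source, ["os", "json"])
blocked_by_empty_list = find_disallowed_imports("import os", [])
accepted_by_none = find_disallowed_imports("import os", None)
missed_dynamic = find_disallowed_imports('__import__("subprocess")', [])
```

The last call returns an empty list even though the allowlist is empty, because `__import__(...)` is an ordinary function call rather than an import statement. That gap is why the allowlist cannot serve as a security boundary.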
CLASS NoImportRestrictions
Explicit no-op requirement indicating no import checks are configured.
This requirement always passes and documents that import restrictions are not being enforced. Used by python_code_generation_requirements() when allowed_imports is None, providing semantic clarity in the requirement bundle that import validation was intentionally skipped (rather than silently omitting the requirement from the list).