mellea.stdlib.tools.interpreter
Code interpreter tool and execution environments for agentic workflows.
Provides ExecutionResult (capturing stdout, stderr, exit code, artifacts, and
optional static analysis output) and three concrete ExecutionEnvironment
implementations:
- StaticAnalysisEnvironment: parse and import-check only, no execution.
- UnsafeEnvironment: subprocess execution in the current Python environment.
- LLMSandboxEnvironment: Docker-isolated execution via llm-sandbox, with copy_in/copy_out support via docker cp.
Use make_execution_environment to select an environment by tier name
("static", "local_unsafe", "local", "docker_unsafe", "docker") rather than
constructing classes directly. The top-level code_interpreter and
local_code_interpreter functions are ready to be wrapped as MelleaTool
instances for ReACT or other agentic loops.
Functions
FUNC make_execution_environment
make_execution_environment(tier: ExecutionTier, policy: CapabilityPolicy | None = None, allowed_imports: list[str] | None = None, working_directory: str | None = None, _install_cache: set[str] | None = None, _failed_cache: set[str] | None = None) -> ExecutionEnvironment
Create an ExecutionEnvironment for the given tier.
The policy argument overrides the tier's default policy. For unsafe
tiers ("local_unsafe", "docker_unsafe") the policy defaults to
None; pass an explicit policy to declare capabilities without changing the
tier.
Args:
tier: One of "static", "local_unsafe", "local", "docker_unsafe", or "docker".
policy: Override the tier's default policy. None uses the tier default (LOCAL_POLICY / DOCKER_POLICY for policy tiers; None for unsafe tiers).
allowed_imports: Allowlist of importable top-level modules. None allows any import.
working_directory: Directory to use as cwd during execution. Only honoured by UnsafeEnvironment (local tiers); ignored for Docker and static tiers.
_install_cache: Shared set of already-installed package names. When provided, the environment will not reinstall packages already present in the set, and will add newly installed packages to it. Pass the same set across multiple make_execution_environment calls to avoid redundant installs within one tool lifetime.
_failed_cache: Shared set of package names whose installation has already failed. Packages in this set are skipped on subsequent calls; clear the set to allow a retry. Pass the same set as _install_cache to persist failure state across calls.
Returns:
- Configured environment instance.
Raises:
ValueError: If tier is not one of the recognised execution tier strings.
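The effect of sharing _install_cache and _failed_cache across calls can be pictured with a small standalone sketch. It does not call mellea: `ensure_installed` and the `installer` callback are hypothetical names used only for illustration, and the real environments may track installs differently.

```python
from __future__ import annotations

from typing import Callable


def ensure_installed(
    packages: list[str],
    install_cache: set[str],
    failed_cache: set[str],
    installer: Callable[[str], bool],
) -> list[str]:
    """Install packages not seen before; return the names attempted this call.

    Hypothetical helper: `installer` returns True on success, False on failure.
    """
    attempted = []
    for name in packages:
        # Skip anything already installed or already known to fail.
        if name in install_cache or name in failed_cache:
            continue
        attempted.append(name)
        if installer(name):
            install_cache.add(name)
        else:
            failed_cache.add(name)
    return attempted


def fake_installer(name: str) -> bool:
    # Stand-in for a real install step; "broken-pkg" always fails.
    return name != "broken-pkg"


installed: set[str] = set()
failed: set[str] = set()

first = ensure_installed(["numpy", "broken-pkg"], installed, failed, fake_installer)
second = ensure_installed(["numpy", "broken-pkg", "pandas"], installed, failed, fake_installer)

# Clearing the failed set allows a retry, as the _failed_cache entry describes.
failed.clear()
third = ensure_installed(["broken-pkg"], installed, failed, fake_installer)
```

Because both sets are passed in rather than created inside the helper, the second call only attempts "pandas"; the third call retries "broken-pkg" only because the failed set was cleared first.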
FUNC python_tool
python_tool(tier: ExecutionTier = 'local_unsafe', packages: list[str] | None = None, artifact_dir: Path | None = None, policy: CapabilityPolicy | None = None, allowed_imports: list[str] | None = None, name: str = 'python', suppress_agg: bool = False) -> MelleaTool
Create a configurable Python execution tool that returns structured artifacts.
The returned MelleaTool wraps a callable with signature
run_python(code: str) -> ExecutionResult. It can be passed directly to
ModelOption.TOOLS in agentic ReACT loops.
For local tiers ("local_unsafe", "local"), files written to
artifact_dir (or to the per-call tempdir when artifact_dir is None)
are surfaced as Artifact objects on the returned ExecutionResult. Only
files produced by a successful execution (exit code 0) are included.
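The "artifacts only on exit code 0" rule can be sketched without mellea as follows. This is an illustration under stated assumptions, not the library's implementation: `run_and_collect` is a hypothetical name, and the sketch assumes the executed code writes files relative to its working directory, which is set to the artifact directory.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_collect(code: str, artifact_dir: Path) -> tuple[int, list[Path]]:
    """Run code with cwd=artifact_dir; list its files only if the run succeeded."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        cwd=artifact_dir,
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # Files left behind by a failed run are not reported.
        return proc.returncode, []
    files = sorted(p for p in artifact_dir.rglob("*") if p.is_file())
    return proc.returncode, files


write_file = "from pathlib import Path; Path('out.txt').write_text('hi')"

with tempfile.TemporaryDirectory() as tmp:
    ok_code, ok_files = run_and_collect(write_file, Path(tmp))
    ok_names = [p.name for p in ok_files]

with tempfile.TemporaryDirectory() as tmp:
    bad_code, bad_files = run_and_collect(write_file + "; raise SystemExit(1)", Path(tmp))
```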
When the executed code imports matplotlib, matplotlib.use('Agg') is
injected automatically as a preamble so plots are written to files rather
than attempting interactive display. Pass suppress_agg=True to disable
this injection (e.g. when the code sets its own backend explicitly).
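One plausible way to implement such a preamble injection is sketched below with the standard ast module. The names `imports_matplotlib`, `with_agg_preamble`, and `AGG_PREAMBLE` are hypothetical, and the exact preamble text (including the extra `import matplotlib` line) is an assumption; only the `matplotlib.use('Agg')` call and the `suppress_agg` switch come from the description above.

```python
import ast

# Assumed preamble text; the real tool may emit something different.
AGG_PREAMBLE = "import matplotlib\nmatplotlib.use('Agg')\n"


def imports_matplotlib(code: str) -> bool:
    """Return True if the code imports matplotlib or one of its submodules."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] == "matplotlib" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # level == 0 means an absolute import such as `from matplotlib import pyplot`.
            if node.level == 0 and (node.module or "").split(".")[0] == "matplotlib":
                return True
    return False


def with_agg_preamble(code: str, suppress_agg: bool = False) -> str:
    """Prepend the Agg preamble unless suppressed or matplotlib is not imported."""
    if suppress_agg or not imports_matplotlib(code):
        return code
    return AGG_PREAMBLE + code


plot_code = "import matplotlib.pyplot as plt\nplt.plot([1, 2])\nplt.savefig('p.png')\n"
patched = with_agg_preamble(plot_code)
untouched = with_agg_preamble(plot_code, suppress_agg=True)
```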
Args:
tier: Execution tier; one of "static", "local_unsafe", "local", "docker_unsafe", or "docker". Defaults to "local_unsafe".
packages: Python packages to pre-install via pip install before the first execution. Ignored for the "static" tier. None or [] means no installs. Each specifier must be a non-empty string and must not begin with - (flag-style arguments are rejected); PEP 508 specifiers such as pkg @ git+https://... are accepted. Strings are passed directly to pip/uv, so callers are responsible for trusting their content as if invoking pip install themselves. Not thread-safe: the shared install/failed caches are mutated without a lock, so concurrent run_python calls on the same tool instance may race on first install.
artifact_dir: Directory where the executed code should write output files. A per-call tempdir is used when None; that tempdir is kept alive as long as the returned ExecutionResult holds artifacts, and cleaned up immediately when no artifacts are produced. Ignored for docker tiers.
policy: Override the tier's default CapabilityPolicy. When packages is also provided, those packages are merged into this policy.
allowed_imports: Allowlist of importable top-level modules. None disables the import check.
name: Tool name exposed to the model. Defaults to "python".
suppress_agg: When True, skip the automatic matplotlib.use('Agg') preamble injection. Use this when the executed code sets its own matplotlib backend. Defaults to False.
Returns:
- A configured tool ready for use in ModelOption.TOOLS.
Raises:
ImportError: If MelleaTool cannot be imported (should not happen in a normal mellea installation).
ValueError: If any entry in packages is empty or begins with -.
Example::
    from mellea.stdlib.tools import python_tool

    tool = python_tool(packages=["matplotlib", "numpy"])
    result = tool.run(code="import numpy as np; print(np.sqrt(4))")
    print(result.stdout)     # "2.0"
    print(result.artifacts)  # files written during execution
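The two rules stated for packages (non-empty, not starting with -) can be checked with a few lines of plain Python. The sketch below is an illustration only: `validate_packages` is a hypothetical name, and the error messages and the choice to strip whitespace are assumptions, not mellea's actual behaviour.

```python
from __future__ import annotations


def validate_packages(packages: list[str] | None) -> list[str]:
    """Return a copy of the specifiers, raising ValueError on invalid entries."""
    if not packages:
        # None and [] both mean "no installs".
        return []
    for spec in packages:
        if not isinstance(spec, str) or not spec.strip():
            raise ValueError("package specifier must be a non-empty string")
        if spec.lstrip().startswith("-"):
            # Rejects flag-style arguments such as "--index-url".
            raise ValueError(f"flag-style specifier rejected: {spec!r}")
    return list(packages)


accepted = validate_packages(["numpy>=1.26", "matplotlib"])
```

Rejecting a leading - keeps a specifier from being read as a pip option, but it does not make untrusted specifiers safe; as noted above, the strings still reach pip/uv unchanged.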
FUNC code_interpreter
code_interpreter(code: str) -> ExecutionResult
Execute Python code in a Docker sandbox (docker_unsafe tier).
.. deprecated::
Use python_tool instead::
    from mellea.stdlib.tools import python_tool
    result = python_tool(tier="docker_unsafe").run(code=code)
Args:
code: The Python code to execute.
Returns:
- An ExecutionResult with stdout, stderr, and a success flag.
FUNC local_code_interpreter
local_code_interpreter(code: str) -> ExecutionResult
Execute Python code in the current process environment (local_unsafe tier).
.. deprecated::
Use python_tool instead::
    from mellea.stdlib.tools import python_tool
    result = python_tool(tier="local_unsafe").run(code=code)
Args:
code: The Python code to execute.
Returns:
- An ExecutionResult with stdout, stderr, and a success flag.
Classes
CLASS ExecutionResult
Result of code execution.
Code execution can be aborted prior to spinning up an interpreter (e.g., if
prohibited imports are used). In these cases, success is False and
skipped is True.
If code is executed, success is True iff the exit code is 0, and
stdout / stderr are non-None.
Args:
success: True if execution succeeded (exit code 0 or static-analysis passed); False otherwise.
stdout: Captured standard output, or None if execution was skipped.
stderr: Captured standard error, or None if execution was skipped.
skipped: True when execution was not attempted.
skip_message: Explanation of why execution was skipped.
analysis_result: Optional payload from static-analysis environments.
exit_code: Raw process exit code, or None if not available (skipped or static analysis).
timed_out: True when execution was killed due to timeout.
artifacts: Files exported from the execution environment after execution.
execution_mode: Tier name used for this execution ("local_unsafe", "local", "docker_unsafe", "docker", "static", or "unknown").
working_directory: The working directory used for execution, or None if the default was used or not applicable.
Methods:
FUNC to_validationresult_reason
to_validationresult_reason(self) -> str
Map an ExecutionResult to a ValidationResult reason string.
Returns:
- The skip message if the execution was skipped, stdout on success, or stderr on failure.
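The three-way mapping described for to_validationresult_reason can be sketched with a stand-in dataclass. `ResultSketch` is hypothetical and carries only the fields the mapping needs; falling back to an empty string when a field is None is an assumption, not documented behaviour.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ResultSketch:
    """Hypothetical stand-in for ExecutionResult with a subset of its fields."""

    success: bool
    stdout: str | None = None
    stderr: str | None = None
    skipped: bool = False
    skip_message: str | None = None

    def reason(self) -> str:
        # Skipped runs report why they were skipped, regardless of success.
        if self.skipped:
            return self.skip_message or ""
        # Executed runs report stdout on success and stderr on failure.
        if self.success:
            return self.stdout or ""
        return self.stderr or ""


skipped = ResultSketch(success=False, skipped=True, skip_message="import not allowed: socket")
passed = ResultSketch(success=True, stdout="42\n", stderr="")
failed_run = ResultSketch(success=False, stdout="", stderr="ZeroDivisionError")
```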
CLASS ExecutionEnvironment
Abstract environment for executing Python code.
Args:
allowed_imports: Allowlist of top-level module names that generated code may import. None disables the import check.
policy: Capability policy for this environment. None means no policy is applied (unsafe tiers).
working_directory: Directory to use as cwd during execution. None means use the process default. Only honoured by environments that spawn subprocesses (UnsafeEnvironment); ignored otherwise.
Methods:
FUNC execute
execute(self, code: str, timeout: int | None = None) -> ExecutionResult
Execute the given code and return the result.
Args:
code: The Python source code to execute.
timeout: Maximum seconds to allow the code to run. When None, the environment's policy timeout is used, or a built-in default if no policy is set.
Returns:
- Execution outcome including stdout, stderr, and success flag.
FUNC copy_in
copy_in(self, host_path: Path, container_path: str) -> None
Copy a file from the host into the execution environment.
Args:
host_path: Absolute path on the host filesystem.
container_path: Destination path inside the environment.
Raises:
NotImplementedError: If this environment does not support file I/O.
FUNC copy_out
copy_out(self, container_path: str, host_path: Path) -> None
Copy a file from the execution environment to the host.
Args:
container_path: Source path inside the environment.
host_path: Destination path on the host filesystem.
Raises:
NotImplementedError: If this environment does not support file I/O.
CLASS StaticAnalysisEnvironment
Safe environment that validates but does not execute code.
Methods:
FUNC execute
execute(self, code: str, timeout: int | None = None) -> ExecutionResult
Validate code syntax and imports without executing.
Args:
code: The Python source code to validate.
timeout: Ignored for static analysis; present for interface compatibility.
Returns:
- Result with skipped=True and the parsed AST in analysis_result on success, or a syntax-error description on failure.
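A parse-and-import-check pass of the kind described here can be approximated with the standard ast module. The sketch below is not mellea's implementation: `check_imports` is a hypothetical name, its return shape and messages are assumptions, and it only inspects `import` and absolute `from ... import` statements.

```python
from __future__ import annotations

import ast


def check_imports(code: str, allowed_imports: list[str] | None) -> tuple[bool, str]:
    """Parse code and compare its top-level imported modules to an allowlist."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return False, f"syntax error: {exc.msg} (line {exc.lineno})"
    if allowed_imports is None:
        # None disables the import check entirely.
        return True, "ok"
    allowed = set(allowed_imports)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            top = name.split(".")[0]  # "os.path" counts as "os"
            if top not in allowed:
                return False, f"import not allowed: {top}"
    return True, "ok"


ok = check_imports("import os.path\nprint(os.path.sep)\n", ["os"])
blocked = check_imports("import socket\n", ["os"])
broken = check_imports("def f(:\n", ["os"])
```

Nothing is executed at any point; dynamic imports such as `__import__("socket")` would pass this check, which is one reason an allowlist alone is not an isolation boundary.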
CLASS UnsafeEnvironment
Environment that executes code directly via subprocess.
No container isolation. Use policy to declare (but not enforce)
capabilities; timeout and stdout/stderr truncation from policy
are actively enforced.
Args:
allowed_imports: Allowlist of top-level module names that generated code may import. None disables the import check.
policy: Capability policy for this environment. None means no policy is applied.
working_directory: Directory to use as cwd during execution. None means use the process default.
installed_packages: Shared set to persist the install cache across multiple execute() calls. None creates a fresh set.
failed_packages: Shared set of package names whose installation has already failed. Packages in this set are skipped on subsequent calls; clear the set to allow a retry. None creates a fresh set.
tier: Tier name reported in ExecutionResult.execution_mode. None infers the tier from policy presence ("local" when a policy is set, "local_unsafe" otherwise). Prefer passing an explicit value rather than relying on inference; make_execution_environment always supplies one.
Methods:
FUNC execute
execute(self, code: str, timeout: int | None = None) -> ExecutionResult
Execute code with subprocess after checking imports.
Args:
code: The Python source code to execute.
timeout: Maximum seconds before the subprocess is killed. Falls back to policy.timeout if set, then to 30 s.
Returns:
- Execution outcome with captured stdout/stderr and success flag, or a skipped result if imports are unauthorized.
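The timeout fallback order (explicit argument, then policy.timeout, then 30 s) and the subprocess run itself can be sketched with the standard library. `resolve_timeout` and `run_code` are hypothetical names and the returned dict is a simplification; the real method returns an ExecutionResult and also performs the import check and output truncation.

```python
from __future__ import annotations

import subprocess
import sys

DEFAULT_TIMEOUT_S = 30  # documented final fallback for UnsafeEnvironment


def resolve_timeout(timeout: int | None, policy_timeout: int | None) -> int:
    """Explicit timeout wins, then the policy's timeout, then the default."""
    if timeout is not None:
        return timeout
    if policy_timeout is not None:
        return policy_timeout
    return DEFAULT_TIMEOUT_S


def run_code(code: str, timeout: int | None = None, policy_timeout: int | None = None) -> dict:
    """Run code in a child interpreter and summarise the outcome."""
    limit = resolve_timeout(timeout, policy_timeout)
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=limit,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return {"success": False, "exit_code": None, "timed_out": True, "stdout": "", "stderr": ""}
    return {
        "success": proc.returncode == 0,
        "exit_code": proc.returncode,
        "timed_out": False,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


good = run_code("print(6 * 7)")
bad = run_code("import sys; sys.exit(3)")
```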
CLASS LLMSandboxEnvironment
Docker-isolated execution environment via llm-sandbox.
Supports copy_in and copy_out via docker cp. Both methods require
the environment to be used as a context manager so that a single container
session persists across calls.
When used without a context manager, execute opens and closes a fresh
container per call (one-shot mode), which is sufficient when file I/O is
not needed.
Args:
allowed_imports: Allowlist of importable top-level modules. None allows any import.
policy: Capability policy. None means no policy is applied (docker_unsafe tier).
working_directory: Ignored for Docker tiers; present for interface compatibility with ExecutionEnvironment.
installed_packages: Shared set to persist the install cache across multiple execute() calls. None creates a fresh set.
failed_packages: Shared set of package names whose installation has already failed. Packages in this set are skipped on subsequent calls; clear the set to allow a retry. None creates a fresh set.
tier: Tier name reported in ExecutionResult.execution_mode. None infers the tier from policy presence ("docker" when a policy is set, "docker_unsafe" otherwise). Prefer passing an explicit value; make_execution_environment always supplies one.
Methods:
FUNC copy_in
copy_in(self, host_path: Path, container_path: str) -> None
Copy a file from the host into the running Docker container via docker cp.
Args:
host_path: Absolute path on the host filesystem.
container_path: Destination path inside the container.
Raises:
RuntimeError: If the environment is not open as a context manager.
RuntimeError: If the container ID cannot be determined.
subprocess.CalledProcessError: If docker cp fails.
FUNC copy_out
copy_out(self, container_path: str, host_path: Path) -> None
Copy a file from the running Docker container to the host via docker cp.
Args:
container_path: Source path inside the container.
host_path: Destination path on the host filesystem.
Raises:
RuntimeError: If the environment is not open as a context manager.
RuntimeError: If the container ID cannot be determined.
subprocess.CalledProcessError: If docker cp fails.
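For orientation, the docker cp invocations behind copy_in and copy_out have the general shape sketched below. The helper names and the container ID are hypothetical, and the sketch only builds the argument lists; it does not start a container or reflect how the class obtains the container ID.

```python
from pathlib import Path


def docker_cp_in(container_id: str, host_path: Path, container_path: str) -> list[str]:
    """Build the argv for copying a host file into a container."""
    return ["docker", "cp", str(host_path), f"{container_id}:{container_path}"]


def docker_cp_out(container_id: str, container_path: str, host_path: Path) -> list[str]:
    """Build the argv for copying a container file back to the host."""
    return ["docker", "cp", f"{container_id}:{container_path}", str(host_path)]


# "abc123" is a placeholder container ID used only for illustration.
cmd_in = docker_cp_in("abc123", Path("/tmp/data.csv"), "/sandbox/data.csv")
cmd_out = docker_cp_out("abc123", "/sandbox/plot.png", Path("/tmp/plot.png"))
```

Passing such a list to `subprocess.run(cmd, check=True)` raises subprocess.CalledProcessError on a non-zero exit, which matches the exception listed above.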
FUNC execute
execute(self, code: str, timeout: int | None = None) -> ExecutionResult
Execute code in a Docker container.
When used as a context manager, reuses the open session. Otherwise opens a fresh container, runs the code, and closes it immediately.
Args:
code: The Python source code to execute.
timeout: Maximum seconds to allow the sandboxed process to run. Falls back to policy.timeout if set, then to 60 s.
Returns:
- Execution outcome with stdout/stderr and success flag, or a skipped result on import violation or sandbox error.