Build a Code Execution Agent with Llama, E2B Sandbox and MAX Serve
This recipe demonstrates how to build a secure code execution assistant that combines a locally served Llama model (via MAX Serve), OpenAI-compatible structured output for reliable code generation, and E2B sandboxes for isolated code execution.
The assistant takes natural language queries, generates complete Python code, runs it in a sandbox, and explains the results.
Please make sure your system meets our system requirements.
To proceed, ensure you have the magic CLI installed and that magic --version reports 0.7.2 or newer:
curl -ssL https://magic.modular.com/ | bash
or update it via:
magic self-update
Then install max-pipelines via:
magic global install -u max-pipelines
This recipe requires a GPU with CUDA 12.5 support. Recommended GPUs:
E2B API Key (required): needed for sandbox access. Add it to your .env file: E2B_API_KEY=your_key_here
Hugging Face Token (optional): enables faster model downloads. Add it to your .env file: HF_TOKEN=your_token_here
Download the code using the magic CLI:
magic init code-execution-sandbox-agent-with-e2b --from modular/max-recipes/code-execution-sandbox-agent-with-e2b
cd code-execution-sandbox-agent-with-e2b
Copy the environment template:
cp .env.example .env
Add your API keys to .env
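Both the E2B SDK and the agent read these values through python-dotenv. If you want to verify the keys are picked up, a minimal sketch (run from the project root, where .env lives):

from dotenv import load_dotenv
import os

load_dotenv()  # loads variables from .env into the environment
assert os.getenv("E2B_API_KEY"), "E2B_API_KEY is missing from .env"
print("HF_TOKEN set:", bool(os.getenv("HF_TOKEN")))  # optional token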
Test the sandbox:
magic run hello
This command runs a simple test to verify your E2B sandbox setup. You'll see a "hello world" output and a list of available files in the sandbox environment, confirming that code execution is working properly.
Start the LLM server:
Make sure port 8010 is available. You can adjust the port settings in pyproject.toml.
magic run server
This launches the Llama model with MAX Serve, enabling structured output parsing for reliable code generation. The server runs locally on port 8010 and uses the --enable-structured-output flag for OpenAI-compatible function calling.
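Once the server is up, you can confirm the OpenAI-compatible endpoint responds before starting the agent. A minimal sketch using the openai client, mirroring the URL, API key, and model name the agent uses by default (your model id may differ if you changed the settings):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8010/v1", api_key="local")
resp = client.chat.completions.create(
    model="modularai/Llama-3.1-8B-Instruct-GGUF",
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
)
print(resp.choices[0].message.content)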
Run the interactive agent:
magic run agent
This starts the interactive Python assistant. You can now type natural language queries, for example asking it to compute the factorial of 5, and the agent will generate, execute, and explain the code.
The demo below shows the agent in action: taking a query, generating code, running it in the sandbox, and explaining the result.
The system follows a streamlined flow for code generation and execution:
graph TB
    subgraph User Interface
        CLI[Rich CLI Interface]
    end
    subgraph Backend
        LLM[Llama Model]
        Parser[Structured Output Parser]
        Sandbox[E2B Sandbox]
        Executor[Code Executor]
    end
    CLI --> LLM
    LLM --> Parser
    Parser --> Executor
    Executor --> Sandbox
    Sandbox --> CLI
Here's how the components work together:
Rich CLI Interface: collects your natural language queries and renders the generated code, execution results, and explanations in styled panels.
Llama Model: served locally by MAX Serve, it turns each query into complete, executable Python code.
Structured Output Parser: validates the model's response against the CodeExecution schema so the code blocks are always well-formed.
Code Executor: joins the code blocks, submits them to the sandbox, and captures the output for display.
E2B Sandbox: runs the code in an isolated environment, so nothing executes directly on your machine.
The flow ensures secure and reliable code execution while providing a seamless user experience with clear feedback at each step.
The hello.py script demonstrates basic E2B sandbox functionality:
from e2b_code_interpreter import Sandbox
from dotenv import load_dotenv

load_dotenv()

sbx = Sandbox()  # Creates a sandbox environment
execution = sbx.run_code("print('hello world')")  # Executes Python code

# Access execution results
for line in execution.logs.stdout:
    print(line.strip())

# List sandbox files
files = sbx.files.list("/")
print(files)
Key features: creating a sandbox is a single Sandbox() call once E2B_API_KEY is loaded, run_code executes arbitrary Python and returns its captured stdout in execution.logs, and the sandbox filesystem is accessible through sbx.files.
The agent builds on this foundation to implement a complete code execution assistant. It starts with the LLM server configuration and the function-calling schema:
LLM_SERVER_URL = os.getenv("LLM_SERVER_URL", "http://localhost:8010/v1")
LLM_API_KEY = os.getenv("LLM_API_KEY", "local")
MODEL = os.getenv("MODEL", "modularai/Llama-3.1-8B-Instruct-GGUF")
tools = [{
    "type": "function",
    "function": {
        "name": "execute_python",
        "description": "Execute python code blocks in sequence",
        "parameters": CodeExecution.model_json_schema()
    }
}]
def execute_python(blocks: List[CodeBlock]) -> str:
    with Sandbox() as sandbox:
        full_code = "\n\n".join(block.code for block in blocks)

        # Step 1: Show the code to be executed
        console.print(Panel(
            Syntax(full_code, "python", theme="monokai"),
            title="[bold blue]Step 1: Code[/bold blue]",
            border_style="blue"
        ))

        execution = sandbox.run_code(full_code)
        output = execution.logs.stdout if execution.logs and execution.logs.stdout else execution.text
        output = ''.join(output) if isinstance(output, list) else output

        # Step 2: Show the execution result
        console.print(Panel(
            output or "No output",
            title="[bold green]Step 2: Result[/bold green]",
            border_style="green"
        ))

        return output
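For reference, this is how execute_python can be exercised on its own. A sketch with hypothetical code blocks; it assumes the same CodeBlock model and Rich console set up as in the agent:

blocks = [
    CodeBlock(type="python", code="import math\nvalue = math.sqrt(16)"),
    CodeBlock(type="python", code="print(f'sqrt(16) = {value}')"),
]
output = execute_python(blocks)  # renders the Step 1 and Step 2 panels
print(repr(output))              # captured stdout, e.g. 'sqrt(16) = 4.0\n'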
Three-Step Output Process: the agent shows the generated code (Step 1), the execution result (Step 2), and then a natural language explanation of both (Step 3), each in its own panel.
Interactive Session Management:
def main():
    console.print(Panel("Interactive Python Assistant (type 'exit' to quit)",
                        border_style="cyan"))

    while True:
        query = console.input("[bold yellow]Your query:[/bold yellow] ")

        if query.lower() in ['exit', 'quit']:
            console.print("[cyan]Goodbye![/cyan]")
            break

        # ... process query ...

        explanation_messages = [
            {
                "role": "system",
                "content": "You are a helpful assistant. Explain what the code did and its result clearly and concisely."
            },
            {
                "role": "user",
                "content": f"Explain this code and its result:\n\nCode:\n{code}\n\nResult:\n{result}"
            }
        ]
The agent uses OpenAI's structured output format to ensure reliable code generation and execution. Here's how it works:
from pydantic import BaseModel
from typing import List

# Define the expected response structure
class CodeBlock(BaseModel):
    type: str
    code: str

class CodeExecution(BaseModel):
    code_blocks: List[CodeBlock]

# Define the function calling schema
tools = [{
    "type": "function",
    "function": {
        "name": "execute_python",
        "description": "Execute python code blocks in sequence",
        "parameters": CodeExecution.model_json_schema()
    }
}]
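Because the schema comes straight from the Pydantic models, you can inspect and validate it without the LLM in the loop. A sketch with a hypothetical payload:

import json

# The JSON Schema the model is asked to conform to
print(json.dumps(CodeExecution.model_json_schema(), indent=2))

# Validate a hand-written payload the same way a parsed response is validated
payload = {"code_blocks": [{"type": "python", "code": "print('hi')"}]}
parsed = CodeExecution.model_validate(payload)
print(parsed.code_blocks[0].code)  # -> print('hi')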
from openai import OpenAI

# Configure the client with local LLM server
client = OpenAI(
    base_url=LLM_SERVER_URL,  # "http://localhost:8010/v1"
    api_key=LLM_API_KEY       # "local"
)
messages = [
    {
        "role": "system",
        "content": """You are a Python code execution assistant. Generate complete, executable code based on user queries.
        Important rules:
        1. Always include necessary imports at the top
        2. Always include print statements to show results
        3. Make sure the code is complete and can run independently
        4. Test all variables are defined before use
        """
    },
    {
        "role": "user",
        "content": query
    }
]
try:
    # Parse the response into structured format
    response = client.beta.chat.completions.parse(
        model=MODEL,
        messages=messages,
        response_format=CodeExecution
    )

    # Extract code blocks from the response
    code_blocks = response.choices[0].message.parsed.code_blocks

    # Execute the code
    result = execute_python(code_blocks)
except Exception as e:
    console.print(Panel(f"Error: {str(e)}", border_style="red"))
For example, a query asking for the factorial of 5 yields a parsed response like this:

{
    "code_blocks": [
        {
            "type": "python",
            "code": "def factorial(n):\n    if n == 0:\n        return 1\n    return n * factorial(n-1)\n\nresult = factorial(5)\nprint(f'Factorial of 5 is: {result}')"
        }
    ]
}
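On the Python side, the parsed object mirrors this JSON. A quick sketch to confirm what the model returned, assuming response is the parsed completion from above:

parsed = response.choices[0].message.parsed
print(parsed.model_dump_json(indent=2))  # prints JSON shaped like the example above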
# Generate explanation using vanilla completion
explanation_messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant. Explain what the code did and its result clearly and concisely."
    },
    {
        "role": "user",
        "content": f"Explain this code and its result:\n\nCode:\n{code_blocks[0].code}\n\nResult:\n{result}"
    }
]

final_response = client.chat.completions.create(
    model=MODEL,
    messages=explanation_messages
)

explanation = final_response.choices[0].message.content
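In the agent, this explanation becomes the third panel of the output. A minimal sketch; the panel title and border style here are illustrative rather than taken from the source:

console.print(Panel(
    explanation,
    title="[bold magenta]Step 3: Explanation[/bold magenta]",  # illustrative title
    border_style="magenta"
))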
This structured approach ensures that the model's output always matches the CodeExecution schema, so the agent can extract and execute code blocks reliably instead of scraping code out of free-form text.
You can interact with the agent using natural language queries; it responds by generating the corresponding Python (for example, print("Hello")) and executing it.
System Prompt:
Code Execution Flow:
Error Handling:
MODEL = os.getenv("MODEL", "modularai/Llama-3.1-8B-Instruct-GGUF")
Sandbox(timeout=300) # Configure timeout
# Customize Rich themes and styles
console.print(Panel(..., theme="custom"))
Sandbox Issues
LLM Issues
Code Execution Issues
Enhance the System
Deploy to Production
Join the Community
Share what you build with #ModularAI on social media. We're excited to see what you'll build with this foundation!
DETAILS
AUTHOR
Ehsan M. Kermani
AVAILABLE TASKS
magic run hello
magic run server
magic run agent