Code Mode: How We Achieved 60% Token Reduction
Deep dive into Code Mode architecture - the breakthrough that makes complex AI operations affordable at scale.
Traditional agentic AI exposes 22+ tools to the LLM, consuming ~3,500 tokens per request in tool definitions. Code Mode reduces this to ~600 tokens while maintaining full functionality.
The Problem with Tool Exposure
Exposing every tool (deploy_to_aws, create_s3_bucket, setup_cloudfront, etc.) to the LLM creates significant token overhead: the model must parse 22+ function signatures on every single request, costing ~3,500 tokens in tool definitions before any actual work begins.
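The overhead above is simple arithmetic. A minimal sketch, using assumed per-definition figures (the averages are illustrative, not measured):

```python
# Why exposing every tool is expensive: each tool definition ships its
# name, description, and JSON schema in the prompt on every request.
NUM_TOOLS = 22
AVG_TOKENS_PER_DEF = 160  # assumed average tokens per tool definition

overhead = NUM_TOOLS * AVG_TOKENS_PER_DEF
print(overhead)  # ~3,520 tokens spent before the model does any work
```

At 22 tools, even modest per-definition schemas add up to thousands of tokens of fixed cost per request.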
Code Mode Solution
Instead of exposing 22 tools, Code Mode exposes just four: file_read, file_write, shell, and execute_multos_code. The LLM writes Python code against a simple SDK. The result is an 83% reduction in tool-definition tokens and a verified 60% cost reduction. With prompt caching on Claude and GPT models, cached system-prompt tokens are billed at a 90% discount, pushing total savings up to 72%.
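The idea can be sketched in a few lines. The four tool names come from the article; the SDK module and function names in the generated snippet are hypothetical illustrations, not the actual multos API:

```python
# The reduced tool surface the LLM sees (names from the article).
TOOLS = ["file_read", "file_write", "shell", "execute_multos_code"]

# Instead of calling deploy_to_aws / create_s3_bucket / ... as separate
# tools, the LLM emits Python that execute_multos_code runs for it, e.g.:
generated_code = """
from multos import cloud                 # hypothetical SDK module

bucket = cloud.create_s3_bucket("assets")
cloud.setup_cloudfront(origin=bucket)
cloud.deploy_to_aws(region="us-east-1")
"""

old, new = 3500, 600  # tool-definition tokens before and after
print(f"{len(TOOLS)} tools, {(old - new) / old:.0%} fewer definition tokens")
```

Collapsing many single-purpose tools into one code-execution tool keeps the full operation set reachable while paying the definition cost only for the four primitives.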
Security & Sandboxing
Code runs in isolated Modal sandboxes with restricted permissions. The sandboxed code never sees user credentials directly; all cloud operations go through secure proxy functions.
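A minimal sketch of the credential-proxy pattern, assuming a whitelist of permitted operations (the function names and allow-list here are hypothetical, not the production implementation):

```python
# Credential-proxy sketch: sandboxed code requests a named operation;
# the proxy, running outside the sandbox, holds the actual cloud keys.
ALLOWED_OPS = {"create_s3_bucket", "setup_cloudfront", "deploy_to_aws"}

def proxy_call(op: str, **kwargs):
    """Run a cloud operation on the sandbox's behalf.

    The AWS credentials never enter the sandbox; only this proxy can
    sign requests, and only for allow-listed operations.
    """
    if op not in ALLOWED_OPS:
        raise PermissionError(f"operation not allowed: {op}")
    # A real implementation would sign and execute the request with
    # server-side credentials here.
    return {"op": op, "status": "ok", **kwargs}

print(proxy_call("create_s3_bucket", name="assets"))
```

The design choice matters: even if generated code misbehaves inside the sandbox, it can only invoke the narrow, audited surface the proxy exposes.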
Real-World Impact
A 60% cost reduction is verified in production (up to 72% with prompt caching), and tool definitions drop from ~3,500 to ~600 tokens per request. At scale, that translates into substantial token-cost savings for enterprise customers.
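A back-of-envelope sketch of how the 90% cache discount lifts the verified 60% saving toward the 72% figure. The cost split is an assumption chosen to make the arithmetic concrete, not a measured breakdown:

```python
# Illustrative arithmetic only: relative prompt cost per request.
traditional = 100.0   # baseline cost in arbitrary units
code_mode = 40.0      # after the verified 60% reduction
cacheable = 13.3      # assumed cached system-prompt share of code_mode
CACHE_DISCOUNT = 0.90 # cached tokens billed at 10% of list price

with_cache = code_mode - CACHE_DISCOUNT * cacheable
print(f"total saving: {1 - with_cache / traditional:.0%}")  # ~72%
```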