Magy Platform
A production-grade multi-agent AI orchestration platform written in Rust. Role-based agents collaborate through NATS JetStream messaging, with cost-aware LLM routing, 4-tier memory, a persistent knowledge graph, and real-time 3D visualization.
Getting Started
Clone the repo
git clone https://github.com/anthropics/magy.git && cd magy

Start infrastructure
docker compose up -d   # Postgres, Redis, NATS

Configure
cp magy.toml.example magy.toml
# Edit magy.toml — set provider API keys, agent config

Set API keys
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."   # optional, for embeddings

Run migrations
cargo run -- migrate

Start backend
cargo run -- up --config magy.toml   # API on :3011

Start frontend
cd magyverse && bun install && bun run dev   # :3000

System Architecture
Text version (accessibility)
External Events → Connectors → NATS JetStream → Task Scheduler
→ Agent Runtime → LLM Router → Tools → Memory → Knowledge Graph → MagyVerse UI

Payload Separation
Critical design rule: Payload::Data(json) never touches the LLM. Payload::Text carries natural language. Payload::Stream carries token-by-token output. Payload::Signal carries control messages (heartbeat, ping, cancel, shutdown).
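The separation rule can be made concrete as a Rust enum. A minimal sketch: the variant names come from the text above, while the field shapes (plain String where the real type likely wraps a JSON value) and the Signal layout are assumptions.

```rust
/// Sketch of the Payload enum; variant names from the docs, field
/// shapes are assumptions of this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Payload {
    /// Structured JSON routed between agents; never included in a prompt.
    Data(String),
    /// Natural language, safe to hand to the LLM.
    Text(String),
    /// One token chunk of streaming output.
    Stream { chunk: String, seq: u64, done: bool },
    /// Control messages; no LLM involvement.
    Signal(Signal),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Signal { Heartbeat, Ping, Cancel, Shutdown }

/// The design rule as a function: only Text is ever prompt-visible.
pub fn prompt_text(p: &Payload) -> Option<&str> {
    match p {
        Payload::Text(s) => Some(s),
        _ => None,
    }
}
```

Encoding the rule in the type system means a `Data` payload cannot reach a prompt by accident: there is simply no code path that renders it as text.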
Trait-Driven Design
Core abstractions via Rust traits: Agent (handle, health, shutdown), LlmProvider (complete, stream), Memory (remember, recall, search_semantic), Tool (execute with sandbox), Plugin (init, shutdown), Connector (authenticate, on_event). All async via async_trait.
Agent System
System design, architecture decisions, task decomposition
Code implementation, file operations, git workflows
Testing, validation, quality assurance
Project coordination, status tracking, prioritization
Infrastructure, deployment, monitoring
Specialized coding agent with diff-aware file editing
Engineering leadership, strategy, cross-agent coordination
Agent Lifecycle
// Each agent is a lightweight Tokio task (8 KB overhead, sub-μs spawn)
#[async_trait]
pub trait Agent: Send + Sync + 'static {
    fn id(&self) -> &AgentId;
    fn capabilities(&self) -> &[Capability];
    async fn handle(&self, msg: MagyMessage, ctx: &AgentContext) -> Result<Vec<MagyMessage>>;
    async fn health(&self) -> HealthStatus; // No LLM involved
    async fn shutdown(&self) -> Result<()>;
}
// AgentContext provides shared resources:
// - LLM router (cost-aware model selection)
// - Tool registry (sandboxed execution)
// - Memory system (4-tier persistence)
// - Knowledge graph (cross-agent learning)
// - Tool result cache (deduplication)

Identity System
PersonalityProfile + ReputationScore. Agents have distinct identities with peer-rated capabilities and configurable personalities.
Speculative Execution
Predicts which tools an agent will call next based on history patterns. Pre-fetches results to eliminate wait time in subsequent rounds.
Loop Detection
Detects and breaks infinite tool-call cycles. Tracks call patterns with configurable thresholds to prevent agents from getting stuck.
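A minimal sketch of how such detection might work, assuming a sliding window of (tool, args) pairs and a repeat threshold; the type and method names are illustrative, not the shipped API.

```rust
use std::collections::VecDeque;

/// Track the last N tool calls and trip when the same (tool, args)
/// pair repeats past a configurable threshold. Names are illustrative.
pub struct LoopDetector {
    window: VecDeque<(String, String)>, // (tool name, serialized args)
    capacity: usize,
    threshold: usize,
}

impl LoopDetector {
    pub fn new(capacity: usize, threshold: usize) -> Self {
        Self { window: VecDeque::new(), capacity, threshold }
    }

    /// Record a call; returns true when the cycle should be broken.
    pub fn record(&mut self, tool: &str, args: &str) -> bool {
        if self.window.len() == self.capacity {
            self.window.pop_front(); // evict the oldest call
        }
        self.window.push_back((tool.to_string(), args.to_string()));
        let repeats = self.window.iter()
            .filter(|(t, a)| t == tool && a == args)
            .count();
        repeats >= self.threshold
    }
}
```

A bounded window keeps the check O(window) per call and lets old, unrelated calls age out rather than tripping the detector later.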
Personality & Expertise
Each agent can be configured with personality traits (risk_tolerance, collaboration_style, verbosity) and expertise areas (primary, secondary). These shape how agents approach tasks, communicate, and collaborate.
[agent.personality]
traits = ["methodical", "detail-oriented"]
risk_tolerance = 0.3 # 0.0 conservative → 1.0 aggressive
collaboration_style = "lead" # lead | contributor | reviewer
verbosity = "concise"
[agent.expertise]
primary = ["rust", "architecture"]
secondary = ["typescript", "devops"]

Self-Improving Agents
Every agent learns from task completions through a fire-and-forget knowledge loop. After handle() completes, learn_from_interaction() stores an episodic memory summary and — for non-trivial interactions — spawns a background Haiku LLM call to extract knowledge nodes and edges into the shared Knowledge Graph.
Knowledge Injection
Before each task, build_knowledge_context() queries the graph for relevant knowledge and injects it into the system prompt.
Confidence Decay
Stale knowledge fades naturally. Frequently reinforced knowledge persists.
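The decay and reinforcement rules reduce to two one-line functions. A std-only Rust sketch (function names illustrative):

```rust
/// Exponential confidence decay: score *= e^(-λt).
pub fn decayed(score: f64, lambda: f64, elapsed_secs: f64) -> f64 {
    score * (-lambda * elapsed_secs).exp()
}

/// Reinforcement, clamped to 1.0: score = min(1.0, score + boost).
pub fn reinforce(score: f64, boost: f64) -> f64 {
    (score + boost).min(1.0)
}
```

With λ = ln 2 / half_life, a score halves once per half-life unless reinforced, so knowledge that keeps getting confirmed stays near 1.0 while untouched entries drift toward 0.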
score *= e^(-λt)                         // exponential decay
reinforcement: score = min(1.0, score + boost)

Worktree Isolation
When multiple agents work concurrently, they can conflict on the shared filesystem. Worktree isolation gives each task its own git branch + worktree. Tools are unaware — sandbox.workspace transparently points at the worktree path.
Configuration
[[agent]]
id = "nova"
[agent.worktree]
enabled = true
on_success = "create_pr" # auto_merge | create_pr | branch_only
on_failure = "keep" # keep | cleanup
base_branch = "main"
worktree_dir = ".worktrees"

On Success
AutoMerge: merge into base with --no-ff, clean up worktree + branch
CreatePR: push the branch, create a pull request, remove the worktree
BranchOnly: commit changes, remove the worktree, keep the branch

On Failure
Keep: leave the worktree on disk for debugging (default)
Cleanup: remove the worktree and delete the branch

LLM Router
Cost-Aware Adaptive Routing
The router classifies tasks by complexity (simple → moderate → complex → critical) and selects the cheapest capable model. Per-agent budgets prevent runaway costs. Circuit breakers isolate provider failures. EWMA-based latency tracking scores candidates by composite cost + latency.
Tier 1 (Simple): Haiku / GPT-4.1-mini / Flash → $0.25/M tokens
Tier 2 (Moderate): Sonnet / GPT-4.1 / Gemini Pro → $3/M tokens
Tier 3 (Complex): Opus / o3 / Gemini Ultra → $15/M tokens
Tier 4 (Critical): Opus with extended thinking → $15+/M tokens
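The router's composite score weighs cost at 0.7 and latency at 0.3. A sketch of the candidate-ranking step; the normalization scheme here (dividing by the maximum over the candidate set) is an assumption, as the docs don't specify it.

```rust
#[derive(Debug, Clone)]
pub struct Candidate {
    pub model: String,
    pub cost_per_mtok: f64,   // $/M tokens, as in the tier table
    pub ewma_latency_ms: f64, // rolling latency estimate
}

/// Rank candidates by composite = cost_norm × 0.7 + latency_norm × 0.3.
/// Max-normalization over the set is this sketch's assumption.
pub fn rank(mut c: Vec<Candidate>) -> Vec<Candidate> {
    let max_cost = c.iter().map(|x| x.cost_per_mtok).fold(f64::EPSILON, f64::max);
    let max_lat = c.iter().map(|x| x.ewma_latency_ms).fold(f64::EPSILON, f64::max);
    let score = |x: &Candidate| {
        (x.cost_per_mtok / max_cost) * 0.7 + (x.ewma_latency_ms / max_lat) * 0.3
    };
    c.sort_by(|a, b| score(a).partial_cmp(&score(b)).unwrap());
    c
}
```

With the 0.7 cost weight, a model that is 12× cheaper wins even against one that is 2× faster, which matches the router's cost-first bias.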
Routing: classify(task) → candidates_for_tier() → sort by composite_score
composite = (cost_norm × 0.7) + (latency_norm × 0.3)
→ circuit_breaker.check() → acquire_permit() → provider.complete()
→ record_latency() → track_cost()

Prompt Cache Optimization
3 cache breakpoints on Anthropic: system prompt, tool definitions, and conversation prefix. In a 10-round tool loop, all prior messages are cached at 0.1x input cost — ~78% savings.
Concurrency Limiter
Semaphore-based backpressure: 10 concurrent requests per provider, 20 globally. RAII permits auto-release. Prevents rate limiting when 10+ agents fire simultaneously.
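A simplified, std-only stand-in for the limiter: a fixed permit pool with an RAII guard that returns its permit on drop. The real implementation uses tokio::sync::Semaphore; this sketch only illustrates the acquire/release shape.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Fixed pool of permits; stand-in for tokio::sync::Semaphore.
pub struct Limiter {
    inner: Arc<(Mutex<usize>, Condvar)>,
}

/// RAII guard: dropping it returns the permit to the pool.
pub struct Permit {
    inner: Arc<(Mutex<usize>, Condvar)>,
}

impl Limiter {
    pub fn new(permits: usize) -> Self {
        Self { inner: Arc::new((Mutex::new(permits), Condvar::new())) }
    }

    /// Block until a permit is free (backpressure point).
    pub fn acquire(&self) -> Permit {
        let (lock, cvar) = &*self.inner;
        let mut avail = lock.lock().unwrap();
        while *avail == 0 {
            avail = cvar.wait(avail).unwrap();
        }
        *avail -= 1;
        Permit { inner: Arc::clone(&self.inner) }
    }

    pub fn available(&self) -> usize {
        *self.inner.0.lock().unwrap()
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        let (lock, cvar) = &*self.inner;
        *lock.lock().unwrap() += 1;
        cvar.notify_one(); // wake one waiting acquirer
    }
}
```

The RAII guard is what makes "auto-release" reliable: even if a request path returns early or panics, the permit goes back when the guard drops.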
Extended Thinking
Tier 4 (Critical) tasks use extended thinking on Anthropic and Bedrock providers. The router enables a dedicated thinking budget for complex reasoning. Signature echo-back ensures thinking block integrity across streaming responses.
Memory & Knowledge
Working: short-term, per-task context. Fast in-memory store cleared after task completion.
Episodic: historical records of past interactions and task outcomes. Postgres-backed with timestamps.
Semantic: vector embeddings for similarity search. pgvector with HNSW indexing for sub-50ms retrieval.
Procedural: learned patterns and procedures. How-to knowledge extracted from successful task completions.
Knowledge Graph
A persistent, cross-agent knowledge graph enables shared learning across the entire swarm. Nodes represent concepts, files, decisions, and patterns. Edges encode relationships with confidence scores that decay over time — stale knowledge naturally fades while frequently reinforced knowledge persists.
KnowledgeGraph
├─ extract(text) → Vec<KnowledgeNode> // NLP extraction
├─ store(node, confidence, tags) // With decay schedule
├─ query(context) → Vec<RelevantKnowledge> // Contextual retrieval
└─ build_context(task) → String // Inject into prompts
Confidence Decay: score *= e^(-λt) // Exponential decay
λ configurable per-knowledge-type
Reinforcement: score = min(1.0, score + boost)

Tools & Sandbox
| Tool | Operations | Sandbox |
|---|---|---|
| filesystem | read_file, write_file, edit_file, list_dir, search | Path-scoped |
| shell | execute commands with streaming output | Level-gated |
| git | status, diff, commit, branch, log, checkout | Workspace-scoped |
| github | create_pr, list_issues, review, comment | Token-authed |
| http | GET, POST, PUT, DELETE with headers | Host allowlist |
| browser | navigate, screenshot, extract, click | Isolated |
| code_action | execute Python/shell scripts (CodeAct pattern) | Level-gated |
| knowledge | learn, query, delete from knowledge graph | Workspace-scoped |
| manage_cron | create, list, enable, disable, delete cron jobs | Standard |
| set_reminder | create, list, cancel one-off reminders | Standard |
| switch_workspace | change working directory mid-session | Path-scoped |

Standard
File access scoped to workspace, full network, shell allowed with confirmation.
Strict
Workspace-scoped files, network restricted to allowlist, no shell without explicit approval.
Isolated
Temp-dir only, no network, no shell. For untrusted marketplace skills and WASM plugins.
Tool Result Cache
Cross-agent deduplication via DashMap. Cacheable tools: filesystem reads, git status/diff/log. TTL: 30s filesystem, 10s git. Auto-invalidation on write/edit/delete mutations.
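A std-only sketch of the cache's shape (the real one uses DashMap for lock-free concurrent access); the key format, TTL handling, and invalidation hook are illustrative.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Tool-result cache keyed by (tool, args); entries expire by TTL
/// and mutations invalidate. Stand-in for the DashMap-backed version.
pub struct ToolCache {
    entries: HashMap<(String, String), (String, Instant)>,
    ttl: Duration,
}

impl ToolCache {
    pub fn new(ttl: Duration) -> Self {
        Self { entries: HashMap::new(), ttl }
    }

    /// Return a cached result if present and not yet expired.
    pub fn get(&self, tool: &str, args: &str) -> Option<&str> {
        self.entries
            .get(&(tool.to_string(), args.to_string()))
            .filter(|(_, at)| at.elapsed() < self.ttl)
            .map(|(v, _)| v.as_str())
    }

    pub fn put(&mut self, tool: &str, args: &str, result: String) {
        self.entries.insert((tool.into(), args.into()), (result, Instant::now()));
    }

    /// Drop all cached reads for a tool after a write/edit/delete.
    pub fn invalidate_tool(&mut self, tool: &str) {
        self.entries.retain(|(t, _), _| t.as_str() != tool);
    }
}
```

A per-tool invalidation pass is coarse but safe: after any write, every cached read for that tool is treated as stale rather than trying to reason about which paths the write touched.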
Declarative Guardrails
Config-driven safety rules in magy.toml — no code required. Block dangerous shell commands via regex, flag git push for approval, enforce max token limits, disable specific tools. Patterns are compiled to regex once at startup, so enforcement adds no per-call compilation cost.
Scheduling
Magy supports both recurring cron jobs and one-off reminders. Jobs can be defined statically in magy.toml or created dynamically via the REST API and agent tools. The scheduler polls at a configurable interval, executing due jobs up to a concurrency limit.
[[cron]]
id = "daily-standup"
schedule = "0 0 9 * * *" # sec min hour day month weekday
agent = "aria"
task = "Run daily standup summary"
enabled = true
[scheduling]
poll_interval_secs = 30 # check frequency
max_concurrent = 10      # max parallel executions

Cron Jobs
6-field cron expressions (second granularity). Sources: Config, API, or Agent tool. Jobs can be enabled/disabled without deletion. Execution log tracks history with status and duration.
Reminders
One-off delayed tasks with configurable delivery channels: Agent (message back), Telegram (group/topic), WebSocket (UI push), or Broadcast (all agents). Status tracking: Pending → Delivered/Failed/Cancelled.
Channels
Channels bridge external messaging platforms to Magy agents. Telegram is the primary supported channel, with per-agent topic routing, live streaming via message edits, and inline keyboard interactions.
[channels.telegram]
enabled = true
bot_token_env = "TELEGRAM_BOT_TOKEN"
group_id = -100123456789
default_agent = "aria" # agent for DMs
[channels.telegram.topics]
aria = 123 # topic ID per agent
nova = 456
[channels.telegram.streaming]
enabled = true
edit_interval_ms = 500
show_thinking = false
cross_post_agent_messages = true

Bot Commands
/status — agent statuses, /cost — cost summary, /stop — cancel task, /model — switch LLM.
Topic Routing
Messages in Telegram group topics are routed to the mapped agent. DMs go to the default_agent. Cross-agent visibility via cross-posting.
Live Streaming
LLM responses stream in real-time by editing the Telegram message at configurable intervals. Thinking blocks can be shown or hidden.
Skills & Plugins
Skill System
Skills are composable, agent-invocable capabilities defined as YAML with embedded instructions. They can be bundled with the runtime or loaded from the marketplace. WASM skills execute in sandboxed Wasmtime environments.
# skills/code-review.yaml
name: code-review
description: "Review code changes"
tools: [git, filesystem]
steps:
  - action: git_diff
  - action: analyze
    prompt: "Review these changes..."
  - action: respond

Plugin System
Plugins extend the runtime with custom tools, connectors, and event handlers. They implement the Plugin trait and are registered at startup. Hot-reload supported for development.
#[async_trait]
pub trait Plugin: Send + Sync + 'static {
    fn name(&self) -> &str;
    async fn init(&self, ctx: &PluginContext) -> Result<()>;
    fn tools(&self) -> Vec<Box<dyn Tool>>;
    async fn shutdown(&self) -> Result<()>;
}

WASM Sandbox
Marketplace skills run in Wasmtime with configurable resource limits. Memory, CPU time, and filesystem access are all sandboxed.
Hot Reload
Skills and plugins can be reloaded without restarting the runtime. File watchers detect changes and trigger re-registration automatically.
Marketplace
Community-contributed skills distributed as WASM packages. Version-pinned, signature-verified, and sandbox-enforced for security.
Connectors
Send/receive emails, search, label management
Post messages, read channels, manage threads
Create/update issues, sprint management, transitions
Read/write pages, database queries, block manipulation
File upload/download, folder management, sharing
Generic receiver for external event sources
Connector Trait
#[async_trait]
pub trait Connector: Send + Sync + 'static {
    fn name(&self) -> &str;
    async fn authenticate(&self, credentials: &Credentials) -> Result<AuthToken>;
    fn tools(&self) -> Vec<Box<dyn Tool>>;  // Tools exposed to agents
    fn subscriptions(&self) -> Vec<String>; // Event subjects
    async fn on_event(&self, event: ConnectorEvent) -> Result<Vec<MagyMessage>>;
}

Transport Layer
All inter-agent communication flows through NATS JetStream. Each agent has an inbox subject (agent.{id}.inbox), an outbox subject, and a stream subject for token-level output. Messages are durable — if an agent crashes, pending messages survive in the stream and are re-delivered on restart.
MagyMessage {
    id: Ulid,                     // Globally unique, time-ordered
    from: AgentId,
    to: AgentId,
    payload: Payload,             // Data | Text | Stream | Signal
    correlation_id: Option<Ulid>, // Link request → response
    timestamp: DateTime<Utc>,
    metadata: HashMap<String, Value>,
}
Subject patterns:
agent.{id}.inbox — incoming messages
agent.{id}.outbox — outgoing messages
agent.{id}.stream — token-by-token streaming
swarm.broadcast — all-agent announcements
task.{id}.status — task progress updates

Task Topologies
Direct (1:1), Pipeline (sequential chain), FanOut (parallel broadcast with result aggregation), Swarm (dynamic role-based collaboration). The scheduler picks topology based on task complexity.
Supervisor
AgentSupervisor monitors health, restarts crashed agents, manages lifecycle. Health checks are LLM-free — just "am I alive and functional?" with queue depth, error rate, and memory tracking.
API Reference
| Method | Path | Description |
|---|---|---|
| GET | /api/health | Health check with uptime |
| GET | /api/agents | List all running agents |
| POST | /api/agents | Create a new agent |
| GET | /api/agents/{id} | Agent details |
| PATCH | /api/agents/{id} | Update agent config |
| DELETE | /api/agents/{id} | Remove agent |
| POST | /api/agents/{id}/message | Send message to agent |
| POST | /api/agents/{id}/stop | Gracefully stop agent |
| POST | /api/agents/{id}/cancel | Cancel current task |
| POST | /api/agents/{id}/clear-context | Reset conversation context |
| POST | /api/agents/{id}/restart | Restart agent |
| GET | /api/agents/{id}/messages | Agent message history |
| GET | /api/agents/{id}/sessions | Agent session list |
| GET | /api/agent-templates | List available agent templates |
| GET | /api/tasks | List tasks |
| POST | /api/tasks | Create task |
| GET | /api/tasks/{id} | Get task details |
| GET | /api/tasks/{id}/tool-executions | Tool execution log |
| GET | /api/tasks/{id}/diffs | File diffs from task |
| GET | /api/cost | Cost summary across all agents |
| GET | /api/latency | Per-provider latency stats (EWMA) |
| GET | /api/skills | List installed skills |
| GET | /api/dirs | Browse workspace directories |
| GET | /api/memory/stats | Memory usage stats |
| GET | /api/memory/{agent_id}/episodic | Episodic memories |
| GET | /api/memory/{agent_id}/semantic | Semantic memory entries |
| GET | /api/memory/{agent_id}/procedural | Procedural knowledge |
| GET | /api/knowledge/nodes | List knowledge nodes |
| GET | /api/knowledge/nodes/{id} | Get specific node |
| GET | /api/knowledge/edges | List knowledge edges |
| GET | /api/knowledge/stats | Graph statistics |
| GET | /api/crons | List cron jobs |
| POST | /api/crons | Create cron job |
| GET | /api/crons/{id} | Get cron job |
| PATCH | /api/crons/{id} | Update cron job |
| DELETE | /api/crons/{id} | Delete cron job |
| GET | /api/reminders | List reminders |
| POST | /api/reminders | Create reminder |
| GET | /api/reminders/{id} | Get reminder |
| DELETE | /api/reminders/{id} | Cancel reminder |
| GET | /api/schedule/log | Execution log |
| GET | /api/events | SSE event stream |
| GET | /api/ws | WebSocket upgrade |

WebSocket Protocol
// Client → Server
{ "type": "subscribe", "agent_id": "aria" }
{ "type": "send_message", "agent_id": "aria", "content": "..." }
{ "type": "ping" }
// Server → Client
{ "type": "agent_message", "agent_id": "aria", "content": "..." }
{ "type": "agent_stream", "chunk": "...", "seq": 1, "done": false }
{ "type": "status_update", "agents": [...] }

SSE Events
// Event types:
event: connected
data: {"session_id": "..."}
event: message:new
data: {"agent_id": "aria", "content": "..."}
event: agent:stream
data: {"agent_id": "aria", "chunk": "...", "seq": 1}
event: agent:status
data: {"id": "aria", "status": "running"}

Configuration
All configuration lives in magy.toml. Copy from magy.toml.example to get started. API keys are referenced by environment variable name, never stored in the config file.
# Project metadata
[magy]
name = "my-project"
description = "Multi-agent orchestration"

# LLM provider credentials and settings
[[provider]]
name = "anthropic"
api_key_env = "ANTHROPIC_API_KEY"
models = ["claude-sonnet-4-20250514", "claude-haiku-4-5-20251001"]

# Service URLs, ports, and limits
[infrastructure]
nats_url = "nats://localhost:4222"
postgres_url = "postgresql://localhost/magy"
redis_url = "redis://localhost:6379"
api_port = 3011

# Agent definitions with model, tools, and personality
[[agent]]
id = "aria"
name = "Aria"
model = "claude-sonnet-4-20250514"
system_prompt = "prompts/architect.md"
tools = ["filesystem", "shell", "git", "github"]
budget_cents_per_hour = 500

# Telegram bot and other chat integrations
[channels]
telegram_enabled = true
telegram_token_env = "TELEGRAM_BOT_TOKEN"
telegram_default_agent = "aria"

# Declarative safety rules
[[guardrail]]
tool = "shell"
action = "block"
pattern = "rm -rf /"
message = "Dangerous command blocked"

# Agent personality traits and style
[agent.personality]
traits = ["methodical", "detail-oriented"]
risk_tolerance = 0.3
collaboration_style = "lead"
verbosity = "concise"

# Cron and reminder scheduling
[scheduling]
poll_interval_secs = 30
max_concurrent = 10

[[cron]]
id = "daily-standup"
schedule = "0 0 9 * * *"
agent = "aria"
task = "Run daily standup"

# Health check and error thresholds
[infrastructure.health]
check_interval_secs = 30
error_threshold = 5
restart_on_failure = true

Performance Optimizations
Tool Result Cache
Cross-agent deduplication with DashMap. Whitelisted read-only tools (filesystem reads, git status/diff/log) are cached with configurable TTLs. Auto-invalidation on mutations. Prevents N agents from running identical reads.
Latency-Aware Routing
EWMA-based per-provider latency tracking. Composite score = (cost × 0.7) + (latency × 0.3). Needs minimum 3 samples before influencing routing. Degrades gracefully to cost-only sorting.
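A sketch of the tracker, assuming the standard EWMA update new = α·sample + (1 − α)·old and the three-sample warm-up described above; names and the α parameter are illustrative.

```rust
/// Per-provider EWMA latency tracker.
pub struct EwmaLatency {
    value_ms: f64,
    samples: u32,
    alpha: f64, // smoothing factor, e.g. 0.2
}

impl EwmaLatency {
    pub fn new(alpha: f64) -> Self {
        Self { value_ms: 0.0, samples: 0, alpha }
    }

    pub fn record(&mut self, sample_ms: f64) {
        self.value_ms = if self.samples == 0 {
            sample_ms // seed with the first observation
        } else {
            self.alpha * sample_ms + (1.0 - self.alpha) * self.value_ms
        };
        self.samples += 1;
    }

    /// None until 3 samples are seen, so routing degrades to cost-only.
    pub fn estimate(&self) -> Option<f64> {
        (self.samples >= 3).then_some(self.value_ms)
    }
}
```

Returning `None` during warm-up is what makes the "degrades gracefully to cost-only sorting" behavior explicit at the call site.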
Prompt Cache Optimization
3 Anthropic cache breakpoints: system prompt, tool definitions, conversation prefix. All prior messages cached at 0.1x input price in multi-round tool loops. Applied to both Anthropic and Bedrock providers.
Streaming Shell Output
Shell commands stream stdout/stderr line-by-line via mpsc channels instead of buffering until completion. Progress published in real-time through NATS to the UI.
Request Concurrency Limiter
Tokio semaphore-based backpressure: 10 per provider, 20 globally. RAII permits auto-release on drop. Prevents rate-limiting when multiple agents fire simultaneously.
Streaming Tool Dispatch
Tool calls dispatched as soon as their JSON arguments become parseable during LLM streaming — not after the full response. Overlaps tool execution with generation, saving 5-10s per round.
Key Innovations
Adaptive Task Classification
Automatically classifies task complexity (simple → critical) using heuristics on message length, tool count, and conversation depth. Routes to the cheapest capable model — no manual model selection.
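A sketch of the heuristic, combining the three signals named above into a single score; the weights and thresholds here are illustrative, not the shipped values.

```rust
/// Complexity tiers matching the router's tier table.
#[derive(Debug, PartialEq)]
pub enum Tier { Simple, Moderate, Complex, Critical }

/// Heuristic classifier: message length, tool count, and conversation
/// depth fold into one score. Weights/thresholds are assumptions.
pub fn classify(msg_len: usize, tool_count: usize, depth: usize) -> Tier {
    let score = (msg_len / 500) + tool_count * 2 + depth;
    match score {
        0..=2 => Tier::Simple,
        3..=6 => Tier::Moderate,
        7..=12 => Tier::Complex,
        _ => Tier::Critical,
    }
}
```

Because the output is only a tier, a rough heuristic is enough: misclassifying by one tier costs a little money or a little quality, never correctness.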
Cross-Agent Knowledge Graph
Shared persistent knowledge with confidence-decay. Agents learn from each other. Knowledge reinforced by multiple agents persists; stale knowledge fades via exponential decay.
Speculative Tool Execution
Predicts next tool calls from historical patterns and pre-executes them. When the LLM response arrives, results are already cached — eliminating tool execution latency entirely.
WASM Skill Sandbox
Third-party skills execute in Wasmtime sandboxes with configurable resource limits. Marketplace skills can't escape their sandbox. Hot-reload without restarting the runtime.
Real-Time 3D Agent World
MagyVerse: a voxel-based 3D world (Three.js + R3F) where agents are visible characters. Watch agents collaborate, see their thought processes, interact via chat — all in real-time via WebSocket.
Streaming Tool Dispatch
Parse tool calls from LLM stream incrementally. Dispatch as soon as JSON arguments are complete — don't wait for the full response. Tool execution overlaps with generation.
Declarative Guardrails
Policy-as-config safety rules defined in magy.toml. Block dangerous commands, flag sensitive operations for approval, enforce token limits — all without writing code. Regex patterns compiled at startup for zero runtime overhead.
Competitor Matrix
| Feature | Magy | OpenClaw | CrewAI | LangGraph | AutoGen | OpenAI SDK |
|---|---|---|---|---|---|---|
| Language | Rust | TypeScript (Node.js) | Python | Python | Python | Python |
| Concurrency Model | Tokio async (sub-μs tasks) | Multi-process (Node.js) | Threading | Async Python | Threading | Async Python |
| Agent Topology | Direct/Pipeline/FanOut/Swarm | Lobster YAML pipelines | Sequential/Hierarchical | Graph (DAG + cycles) | Conversational | Handoffs |
| LLM Providers | 8 (Anthropic, OpenAI, Gemini, Bedrock, Ollama, DeepSeek, Groq, Kimi) | 10+ (OpenAI, Anthropic, Gemini, Groq, xAI, Mistral, Ollama, Kimi, ...) | Via LiteLLM | Via LangChain | OpenAI + Anthropic | OpenAI only |
| Cost-Aware Routing | Adaptive tiering + EWMA latency | Key rotation only | No | No | No | No |
| Memory System | 4-tier (Working/Episodic/Semantic/Procedural) | Flat files (MEMORY.md + JSONL) | Short + Long term | Checkpointed state | Conversation history | Sessions |
| Knowledge Graph | Built-in with confidence decay | Plugin only (LanceDB) | No | No | No | No |
| Tool Sandbox | 3-level (Standard/Strict/Isolated) | Binary (on/off) | No sandbox | No sandbox | Docker optional | No sandbox |
| Streaming Tool Dispatch | Yes (parse-as-you-stream) | No | No | No | No | No |
| Prompt Caching | 3 breakpoints, ~78% savings | No | No | No | No | Automatic |
| Plugin System | WASM sandbox + Registry | Skills + ClawHub marketplace | Tools only | Tools only | Tools only | Tools + MCP |
| 3D Visualization | MagyVerse (Three.js) | Canvas/A2UI (2D) | No | LangSmith (2D) | AutoGen Studio | No |
| Transport | NATS JetStream (durable) | WebSocket loopback (single-node) | In-process | In-process | In-process | In-process |
| Built-in Scheduling | Cron + Reminders | Cron + webhooks | No | No | No | No |
| Declarative Guardrails | Config-driven (magy.toml) | Tool policy config | No | No | No | SDK guardrails |
| Channel Integrations | Telegram | 13+ (WhatsApp, Slack, Discord, Signal, ...) | No | No | No | No |
| Open Source | Yes | Yes (MIT, 239k stars) | Yes (paid cloud) | Yes (paid cloud) | Yes | Yes |
Where Magy Excels
Memory architecture (4-tier with semantic search), knowledge graph with confidence decay, cost-aware LLM routing, 3-level tool sandbox, streaming tool dispatch, multi-agent topologies (Direct/Pipeline/FanOut/Swarm), and production-grade transport via NATS JetStream.
Where OpenClaw Excels
Channel integrations (13+ including WhatsApp, Slack, Discord, Signal), community and ecosystem (239k stars), personal assistant UX, platform coverage, ease of setup (zero infrastructure), and plugin ecosystem via ClawHub marketplace.
Different design goals — Magy targets production multi-agent orchestration; OpenClaw targets personal AI assistants. Data compiled from framework documentation, GitHub repositories, and independent benchmarks (Feb 2026).
Tech Stack
Rust Backend
tokio: async runtime
async-nats: NATS JetStream client
axum: HTTP/WebSocket/SSE server
sqlx + pgvector: Postgres with vector search
redis: cache layer
reqwest: HTTP client for LLM providers
serde: serialization framework
dashmap: concurrent hash maps
wasmtime: WASM skill execution
tracing: structured logging
teloxide: Telegram bot framework
handlebars: prompt templating

TypeScript Frontend
Next.js 16: App Router framework
React 19: UI rendering
Three.js + R3F: 3D voxel world
@react-three/drei: 3D helpers & primitives
@react-three/rapier: physics engine
Zustand 5: state management
Motion: animations
Tailwind CSS 4: styling