001. Platform vs Runtime Boundary

Status: Accepted

Date: 2025-03-15

Context

When building a multi-agent orchestration system, the central design question is: what does the platform own, and what do agent runtimes own? Agent runtimes like OpenClaw, Mastra, and direct Anthropic SDK usage each bring their own strengths -- skills, MCP servers, conversation memory, model switching, thinking levels. A platform that duplicated these capabilities would waste effort and would always lag behind the runtimes themselves.

At the same time, runtimes are fundamentally single-agent systems. They have no concept of multi-agent coordination, external channel connectivity, deployment infrastructure, governance, or billing. These are platform-level concerns that sit outside the scope of any single-agent runtime.

Decision

ArchAgent is an orchestration platform, not an agent runtime. It sits above agent runtimes and provides the infrastructure they cannot.

Runtimes own execution -- inference, tools/skills, reasoning, model switching, conversation state.

ArchAgent owns everything around execution:

ArchAgent (Platform)	Runtimes (OpenClaw/Mastra/Custom)
Multi-agent topology and coordination	Single-agent inference
Channel adapters (Slack, Discord, Telegram, WhatsApp)	--
Integration proxy (policy-enforced API access)	Skills/MCP for tool access
Deployment and scaling (Cloud Run)	--
Task management and work queues	--
Persistent agent memory (5 types)	Conversation memory (Mastra only)
Approval workflow	Exec approvals (OpenClaw only)
Activity logging and audit trails	--
Copilot management (AI CLI)	--
Billing, auth, team management	--
Shared context channels	--

The guiding principle: do not build features that runtimes already provide. OpenClaw has cron/scheduling -- do not build scheduling. Mastra has MCP -- do not build a tool registry. Instead, expose runtime capabilities through the platform (copilot tools, API, UI) and focus on what only the platform can do: multi-agent coordination, external connectivity, and governance.
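The principle can be read as a lookup: before building a feature, check whether any runtime already provides it. The following is a minimal TypeScript sketch under stated assumptions. The capability map `runtimeProvides` and the functions `providersOf` and `platformAction` are hypothetical names chosen for illustration, and the map is seeded only with the two examples named above.

```typescript
// Hypothetical capability map; only these two entries come from the text above.
type RuntimeName = "openclaw" | "mastra";

const runtimeProvides: Record<RuntimeName, string[]> = {
  openclaw: ["scheduling"],
  mastra: ["mcp"],
};

// Returns the runtimes that already provide the given feature.
function providersOf(feature: string): RuntimeName[] {
  const names = Object.keys(runtimeProvides) as RuntimeName[];
  return names.filter((name) => runtimeProvides[name].indexOf(feature) !== -1);
}

// "expose": a runtime has it, so surface it through the platform instead of rebuilding it.
// "build": no runtime provides it, so it is a platform concern.
function platformAction(feature: string): "expose" | "build" {
  return providersOf(feature).length > 0 ? "expose" : "build";
}

console.log(platformAction("scheduling"));       // expose
console.log(platformAction("channel-adapters")); // build
```

The feature strings are illustrative labels, not identifiers from either runtime.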

How Each Layer Communicates

The bridge container runs inside Cloud Run. It starts agent loops per runtime type. Each loop communicates with its runtime differently:

OpenClaw: HTTP to a singleton gateway child process (POST /v1/responses)
Mastra: HTTP to a scaffolded Mastra server child process
Custom: Direct Anthropic SDK calls with streaming
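The per-runtime differences listed above could be modeled as a small dispatch table. This is a hedged sketch, not the bridge's actual code: the `RuntimeKind` and `Transport` types and the `transportFor` function are hypothetical, and the `childProcess` flag is inferred from the list (only OpenClaw and Mastra are described as child processes).

```typescript
// Hypothetical types. The three runtime kinds and the POST /v1/responses
// path come from the list above; everything else is an assumption.
type RuntimeKind = "openclaw" | "mastra" | "custom";

interface Transport {
  protocol: "http" | "sdk";
  childProcess: boolean; // true when the loop talks to a child process
  detail: string;
}

// Maps a runtime kind to the way its agent loop reaches the runtime.
function transportFor(kind: RuntimeKind): Transport {
  switch (kind) {
    case "openclaw":
      return { protocol: "http", childProcess: true, detail: "POST /v1/responses on the singleton gateway" };
    case "mastra":
      return { protocol: "http", childProcess: true, detail: "scaffolded Mastra server" };
    case "custom":
      return { protocol: "sdk", childProcess: false, detail: "Anthropic SDK calls with streaming" };
  }
}

console.log(transportFor("custom").protocol); // sdk
```

Because the switch is exhaustive over `RuntimeKind`, adding a fourth runtime kind to the union makes the compiler flag `transportFor` until a matching case exists.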

All three runtimes read and write through Firestore -- messages arrive as agents/{id}/messages docs, and responses are written back to the same docs. Channel adapters are runtime-agnostic: they write to Firestore, and whichever runtime loop is running picks up the message.
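To make the runtime-agnostic message flow concrete, here is a sketch that uses an in-memory stand-in for Firestore. All names (`FakeMessageStore`, `runOnce`, `RuntimeLoop`) are hypothetical, the full doc path `agents/{id}/messages/{messageId}` is an assumption about how individual docs are keyed, and the single polling pass is a simplification; it does not describe how the real loops receive updates from Firestore.

```typescript
// In-memory stand-in for Firestore. Every name here is hypothetical.
interface MessageDoc {
  text: string;
  response?: string;
}

class FakeMessageStore {
  private docs: Record<string, MessageDoc> = {};

  // A channel adapter writes a message doc without knowing the runtime.
  writeMessage(agentId: string, messageId: string, text: string): string {
    const path = `agents/${agentId}/messages/${messageId}`;
    this.docs[path] = { text };
    return path;
  }

  // Paths of this agent's message docs that have no response yet.
  pending(agentId: string): string[] {
    const prefix = `agents/${agentId}/messages/`;
    return Object.keys(this.docs).filter((path) => {
      const doc = this.docs[path];
      return path.indexOf(prefix) === 0 && doc !== undefined && doc.response === undefined;
    });
  }

  read(path: string): MessageDoc | undefined {
    return this.docs[path];
  }

  // The response is written back to the same doc.
  respond(path: string, response: string): void {
    const doc = this.docs[path];
    if (doc) doc.response = response;
  }
}

// A runtime loop is reduced here to a function from message text to reply.
type RuntimeLoop = (text: string) => string;

// One pass: answer every pending message for the agent, return how many.
function runOnce(bus: FakeMessageStore, agentId: string, loop: RuntimeLoop): number {
  const paths = bus.pending(agentId);
  for (const path of paths) {
    const doc = bus.read(path);
    if (doc) bus.respond(path, loop(doc.text));
  }
  return paths.length;
}

const bus = new FakeMessageStore();
const docPath = bus.writeMessage("agent-1", "m1", "hello"); // e.g. written by a Slack adapter
const customLoop: RuntimeLoop = (text) => `custom: ${text}`;
runOnce(bus, "agent-1", customLoop);
console.log(bus.read(docPath)?.response); // custom: hello
```

Swapping `customLoop` for a function that calls a different runtime leaves `writeMessage` untouched, which is the property the channel adapters rely on.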

Alternatives Considered

Alternative	Pros	Cons	Why rejected
Monolithic agent platform -- build our own runtime with inference, tools, skills, conversation memory	Full control, single codebase, no integration complexity	Massive scope, always behind dedicated runtimes on features, vendor lock-in for users	Runtimes evolve faster than we could replicate. OpenClaw ships skills weekly, Mastra ships MCP integrations. We would be permanently behind.
Runtime-only -- build a better runtime, no platform infrastructure	Simpler architecture, focused scope	No multi-agent coordination, no channel connectivity, no deployment/scaling, no governance	The hard problems are platform problems (topology, channels, integrations), not inference problems. Runtimes already solve inference well.
Plugin architecture -- single runtime with plugins for different model providers	Extensible, familiar pattern	Still single-agent, plugins are shallow adapters, no real runtime diversity	Different runtimes have fundamentally different architectures (OpenClaw, abbreviated OC, uses SOUL.md + skills; Mastra uses MCP + threads). Plugins cannot capture these differences.
Runtime abstraction layer -- unified API that wraps all runtimes identically	Clean API, easy to swap runtimes	Lowest-common-denominator problem, cannot expose unique runtime features (OC thinking levels, Mastra MCP, Custom shell access)	Abstraction would erase the value of each runtime. Users choose OpenClaw for skills, Mastra for MCP -- hiding these defeats the purpose.

Consequences

Positive

Each runtime can evolve independently -- new OC skills, Mastra MCP servers, and Custom tools are available immediately without platform changes
The platform focuses on genuinely hard coordination problems (topology, channels, integration proxy) rather than reimplementing inference
Users can mix runtimes within a single workspace (e.g., OpenClaw agent for coding, Custom agent for research, both coordinating via shared context)
Channel adapters work with all runtimes identically -- no per-runtime channel code

Negative

Three runtime integration points to maintain (OC gateway, Mastra scaffold, Custom loop)
Runtime-specific copilot tools (7 for OC, 4 for Mastra) must be kept in sync with runtime updates
Users must understand which runtime to choose for their use case
Some features exist in both layers and partially overlap (OC exec approvals vs platform approval workflow, Mastra conversation memory vs platform agent memory)

Neutral

The bridge container must manage multiple child processes (OC gateway, Mastra server) alongside agent loops
Firestore serves as the universal message bus between platform and all runtimes
