
001. Platform vs Runtime Boundary

Status: Accepted

Date: 2025-03-15

Context

When building a multi-agent orchestration system, the central design question is: what does the platform own, and what do agent runtimes own? Agent runtimes such as OpenClaw and Mastra, as well as direct Anthropic SDK usage, each bring their own strengths -- skills, MCP servers, conversation memory, model switching, thinking levels. A platform that duplicated these capabilities would be wasteful and would always lag behind the runtimes themselves.

At the same time, runtimes are fundamentally single-agent systems. They have no concept of multi-agent coordination, external channel connectivity, deployment infrastructure, governance, or billing. These are platform-level concerns that fall outside the scope of a single-agent runtime.

Decision

ArchAgent is an orchestration platform, not an agent runtime. It sits above agent runtimes and provides the infrastructure they cannot.

Runtimes own execution -- inference, tools/skills, reasoning, model switching, conversation state.

ArchAgent owns everything around execution:

ArchAgent (Platform) | Runtimes (OpenClaw/Mastra/Custom)
Multi-agent topology and coordination | Single-agent inference
Channel adapters (Slack, Discord, Telegram, WhatsApp) | --
Integration proxy (policy-enforced API access) | Skills/MCP for tool access
Deployment and scaling (Cloud Run) | --
Task management and work queues | --
Persistent agent memory (5 types) | Conversation memory (Mastra only)
Approval workflow | Exec approvals (OpenClaw only)
Activity logging and audit trails | --
Copilot management (AI CLI) | --
Billing, auth, team management | --
Shared context channels | --

The guiding principle: do not build features that runtimes already provide. OpenClaw has cron/scheduling -- do not build scheduling. Mastra has MCP -- do not build a tool registry. Instead, expose runtime capabilities through the platform (copilot tools, API, UI) and focus on what only the platform can do: multi-agent coordination, external connectivity, and governance.
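The guiding principle can be read as a simple ownership rule: the platform builds a concern only if no runtime already provides it. The sketch below encodes the two examples from this principle as data. It is a minimal illustration, assuming hypothetical concern labels. `Concern`, `RUNTIME_PROVIDED`, `platformShouldBuild`, and `providersOf` are illustrative names, not ArchAgent code.

```typescript
// Hypothetical sketch of the "do not duplicate runtimes" rule.
// Concern labels are illustrative, not ArchAgent identifiers.
type Runtime = "openclaw" | "mastra" | "custom";

type Concern =
  | "scheduling"
  | "tool-registry"
  | "multi-agent-coordination"
  | "channel-adapters"
  | "governance";

// Capabilities a runtime already ships, per this ADR:
// OpenClaw has cron/scheduling, Mastra has MCP (so no tool registry).
const RUNTIME_PROVIDED: Partial<Record<Concern, Runtime[]>> = {
  scheduling: ["openclaw"],
  "tool-registry": ["mastra"],
};

// Which runtimes (if any) already provide a concern.
function providersOf(concern: Concern): Runtime[] {
  return RUNTIME_PROVIDED[concern] ?? [];
}

// The platform builds a concern only when no runtime provides it;
// otherwise it exposes the runtime's capability (copilot tools, API, UI).
function platformShouldBuild(concern: Concern): boolean {
  return providersOf(concern).length === 0;
}
```

Keeping the rule as data makes it easy to revisit as runtimes ship new features. Adding a runtime capability to the map flips the answer without touching the decision logic.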

How Each Layer Communicates

The bridge container runs inside Cloud Run. It starts agent loops per runtime type. Each loop communicates with its runtime differently:

  • OpenClaw: HTTP to a singleton gateway child process (POST /v1/responses)
  • Mastra: HTTP to a scaffolded Mastra server child process
  • Custom: Direct Anthropic SDK calls with streaming
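The three communication styles map naturally onto a discriminated union, which lets the bridge pick a loop per runtime type with an exhaustive `switch`. This is a sketch under assumptions. Only the OpenClaw path `POST /v1/responses` comes from this ADR. The Mastra route is supplied by the caller because the ADR does not name it, and the HTTP method used for it is assumed. All type and function names are hypothetical.

```typescript
// Hypothetical transport descriptions for each runtime loop in the bridge.
type RuntimeTransport =
  | { kind: "openclaw"; gatewayBaseUrl: string } // singleton gateway child process
  | { kind: "mastra"; serverBaseUrl: string; route: string } // scaffolded Mastra server
  | { kind: "custom"; model: string }; // in-process Anthropic SDK calls

type LoopTarget =
  | { via: "http"; method: "POST"; url: string }
  | { via: "sdk"; model: string; stream: true };

function describeLoop(t: RuntimeTransport): LoopTarget {
  switch (t.kind) {
    case "openclaw":
      // Path from the ADR: POST /v1/responses on the gateway.
      return { via: "http", method: "POST", url: `${t.gatewayBaseUrl}/v1/responses` };
    case "mastra":
      // ASSUMPTION: route and method are placeholders; the ADR does not specify them.
      return { via: "http", method: "POST", url: `${t.serverBaseUrl}${t.route}` };
    case "custom":
      // No child process: the loop calls the Anthropic SDK directly, with streaming.
      return { via: "sdk", model: t.model, stream: true };
    default: {
      // Exhaustiveness check: a new runtime kind without a case fails to compile.
      const unreachable: never = t;
      return unreachable;
    }
  }
}
```

The `never` check is the main reason to prefer a union here. Adding a fourth runtime forces every dispatch site to handle it explicitly.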

All three runtime loops read and write through Firestore -- messages arrive as agents/{id}/messages docs, and responses are written back to the same docs. Channel adapters are runtime-agnostic: they write to Firestore, and whichever runtime loop serves the target agent picks up the message.
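The runtime-agnostic flow can be sketched with an in-memory stand-in for Firestore. The path shape `agents/{id}/messages` comes from this ADR. The document fields (`text`, `status`, `response`), the `MessageBus` class, and the handler signature are assumptions for illustration. This does not use the Firestore SDK.

```typescript
// In-memory stand-in for Firestore; NOT the Firestore SDK.
interface MessageDoc {
  text: string;
  status: "pending" | "done"; // ASSUMPTION: illustrative field names
  response?: string;
}

// Any runtime loop reduces to "take message text, produce response text".
type RuntimeHandler = (text: string) => string;

class MessageBus {
  private docs = new Map<string, MessageDoc>();
  private nextId = 0;

  // Channel adapters only write docs; they never call a runtime directly.
  write(agentId: string, text: string): string {
    const path = `agents/${agentId}/messages/${this.nextId++}`;
    this.docs.set(path, { text, status: "pending" });
    return path;
  }

  // The runtime loop serving this agent picks up pending docs
  // and writes its response back onto the same doc.
  drain(agentId: string, handle: RuntimeHandler): number {
    const prefix = `agents/${agentId}/messages/`;
    let processed = 0;
    this.docs.forEach((doc, path) => {
      if (path.startsWith(prefix) && doc.status === "pending") {
        this.docs.set(path, { ...doc, status: "done", response: handle(doc.text) });
        processed++;
      }
    });
    return processed;
  }

  read(path: string): MessageDoc | undefined {
    return this.docs.get(path);
  }
}
```

Swapping the handler for a different runtime loop requires no change to `write`. That is the property that lets channel adapters stay identical across runtimes.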

Alternatives Considered

Alternative | Pros | Cons | Why rejected
Monolithic agent platform -- build our own runtime with inference, tools, skills, conversation memory | Full control, single codebase, no integration complexity | Massive scope, always behind dedicated runtimes on features, vendor lock-in for users | Runtimes evolve faster than we could replicate. OpenClaw ships skills weekly, Mastra ships MCP integrations. We would be permanently behind.
Runtime-only -- build a better runtime, no platform infrastructure | Simpler architecture, focused scope | No multi-agent coordination, no channel connectivity, no deployment/scaling, no governance | The hard problems are platform problems (topology, channels, integrations), not inference problems. Runtimes already solve inference well.
Plugin architecture -- single runtime with plugins for different model providers | Extensible, familiar pattern | Still single-agent, plugins are shallow adapters, no real runtime diversity | Different runtimes have fundamentally different architectures (OpenClaw (OC) uses SOUL.md + skills, Mastra uses MCP + threads). Plugins cannot capture these differences.
Runtime abstraction layer -- unified API that wraps all runtimes identically | Clean API, easy to swap runtimes | Lowest-common-denominator problem: cannot expose unique runtime features (OC thinking levels, Mastra MCP, Custom shell access) | Abstraction would erase the value of each runtime. Users choose OpenClaw for skills and Mastra for MCP; hiding these defeats the purpose.
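The lowest-common-denominator objection can be made concrete with set intersection. A wrapper that treats every runtime identically can expose only what all runtimes share. The capability labels below are taken from this ADR (OC skills, thinking levels, and exec approvals; Mastra MCP, threads, and conversation memory; Custom shell access). The lists are deliberately not exhaustive, and the function names are illustrative.

```typescript
// Capabilities named in this ADR for each runtime (illustrative, not exhaustive).
const CAPABILITIES: Record<string, string[]> = {
  openclaw: ["inference", "skills", "thinking-levels", "exec-approvals"],
  mastra: ["inference", "mcp", "threads", "conversation-memory"],
  custom: ["inference", "shell-access"],
};

// A uniform wrapper can only expose the intersection of all capability sets.
function commonCapabilities(caps: Record<string, string[]>): string[] {
  const lists = Object.keys(caps).map((k) => caps[k] ?? []);
  const [first, ...rest] = lists;
  if (!first) return [];
  return first.filter((c) => rest.every((l) => l.indexOf(c) !== -1));
}

// Everything outside the intersection is hidden by the abstraction.
function hiddenCapabilities(caps: Record<string, string[]>): string[] {
  const common = commonCapabilities(caps);
  const hidden: string[] = [];
  Object.keys(caps).forEach((k) => {
    (caps[k] ?? []).forEach((c) => {
      if (common.indexOf(c) === -1 && hidden.indexOf(c) === -1) hidden.push(c);
    });
  });
  return hidden;
}
```

With these inputs, `commonCapabilities(CAPABILITIES)` returns only `["inference"]`. A uniform API would therefore hide exactly the features users pick each runtime for.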

Consequences

Positive

  • Each runtime can evolve independently -- new OC skills, Mastra MCP servers, and Custom tools are available immediately without platform changes
  • The platform focuses on genuinely hard coordination problems (topology, channels, integration proxy) rather than reimplementing inference
  • Users can mix runtimes within a single workspace (e.g., OpenClaw agent for coding, Custom agent for research, both coordinating via shared context)
  • Channel adapters work with all runtimes identically -- no per-runtime channel code

Negative

  • Three runtime integration points to maintain (OC gateway, Mastra scaffold, Custom loop)
  • Runtime-specific copilot tools (7 for OC, 4 for Mastra) must be kept in sync with runtime updates
  • Users must understand which runtime to choose for their use case
  • Some features overlap slightly across the two layers (OC exec approvals vs. platform approval workflow, Mastra conversation memory vs. platform agent memory)

Neutral

  • The bridge container must manage multiple child processes (OC gateway, Mastra server) alongside agent loops
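Managing the OpenClaw gateway as a singleton amounts to three steps: start it once, reuse it while it is alive, and restart it if it has exited. The sketch below captures that rule with an injected `start` function, so it stays independent of Node's `child_process` API. `ChildHandle`, `ChildSupervisor`, and the restart-on-exit policy are assumptions, not the bridge's actual code.

```typescript
// Minimal view of a child process: enough to know whether it is still running.
interface ChildHandle {
  readonly name: string;
  isAlive(): boolean;
}

// Keeps at most one live child per name (e.g. one OC gateway, one Mastra server).
class ChildSupervisor {
  private readonly children = new Map<string, ChildHandle>();
  private readonly start: (name: string) => ChildHandle;
  private starts = 0;

  constructor(start: (name: string) => ChildHandle) {
    this.start = start;
  }

  // Reuse the running child if it is alive; otherwise (re)start it.
  ensure(name: string): ChildHandle {
    const existing = this.children.get(name);
    if (existing && existing.isAlive()) return existing;
    const child = this.start(name);
    this.starts++;
    this.children.set(name, child);
    return child;
  }

  // Total number of starts, useful for spotting crash loops.
  startCount(): number {
    return this.starts;
  }
}
```

Injecting `start` keeps the lifecycle policy testable without spawning real processes. In the bridge, it would wrap whatever actually launches the gateway or the Mastra server.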
  • Firestore serves as the universal message bus between platform and all runtimes