In Development · Open Source · Local-first

I'll be back. And I route tokens properly.

A local-first agent architecture. Query classification, retrieval orchestration and skill execution — controllable and token-efficient. Runs on your hardware, escalates to cloud models on demand.

Get updates → GitHub · Coming Soon
🔒 Local-first — your data stays on your device
🎯 Token-efficient — only relevant context enters the prompt
⚖️ Provider-agnostic — local, private cloud, frontier model

What is Clawminator?

An open-source project in active development. Not yet publicly available.

Clawminator is an agent architecture, not a chatbot. The core is token efficiency: requests are first classified, then prepared through routing, memory and retrieval, and only enriched with context that is actually relevant. Smart and small, not dumb and big. The result: an agent that works well on 16K context and 9B parameters — or uses the same pipeline with a frontier cloud model. Provider-agnostic: the orchestrator stays the same, local or cloud.

Architecture: Query Classifier → IR Orchestrator → 7-Layer Router → Skill/LLM Execution
Local Reference HW: Mac Mini M4, 16 GB RAM
Local Model: qwen3.5:9b (16K context)
Cloud Option: Mistral Small (EU servers, GDPR-compliant, 128K context) — more providers planned
Interfaces: Telegram (current) · Web / API / MS Teams planned
Languages: German + English
License: MIT (core) — enterprise integration on request

75 Skills — executable actions without LLM generation

Once the Query Classifier detects a skill action, the Router calls the matching skill function directly. No token generation, no hallucination. The LLM fallback only kicks in if no skill matches.

📁 Files — Create, read, write, search, delete
🌤️ Weather — Current + forecast via Open-Meteo (no API key)
⏱️ Timer & Reminders — Cron-based create, list, delete
🧠 Memory — Remember, retrieve, search facts. Knowledge Graph (Clawminator) + full-text search (OpenClaw)
💻 System — CPU, RAM, disk, processes, network, WiFi password
🔧 Gateway — Status, restart, view config
🌐 Browser — Open URLs, navigate, list/close tabs, screenshots
🎨 Canvas — Display HTML, hide, execute JS, snapshots
📝 Apple Notes & Reminders — Create, search, list
🎵 Spotify — Play, pause, skip (if spotify_player installed)
💡 Philips Hue — Light control (if openhue installed)
💬 iMessage — Send messages via AppleScript
📨 Telegram — Reactions, replies, polls
🔄 Sessions — Status, history, manage sub-agents
📱 Nodes / Devices — Camera, location, battery, notifications
📊 Health Check — Ollama + Gateway + Disk + RAM at a glance

13 Slash Commands — instant, no LLM

/skyclaw — Weather
/cyberclaw — System
/recall — Memory
/monitor — Monitoring
/terminate — Reset
/status — Status
/mission — Help
/diagnose — Debug
/profile — Profile
/clawbeback — Reboot
/targets — Timer/Cron
/resistance — Error Log
/claw800 — Sessions/Agents

52 SKILL.md Templates — for creative tasks

When the language model needs to think (poems, explanations, code), the matching skill template is loaded via FTS5/BM25 (~2K tokens). The rest of the 16K budget stays available for chat.
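A minimal sketch of that lookup, assuming the templates sit in a SQLite FTS5 table and using the better-sqlite3 package. Table and column names are illustrative, not Clawminator's actual schema:

```ts
import Database from "better-sqlite3";

const db = new Database("skills.db");

// Hypothetical FTS5 virtual table holding the 52 SKILL.md templates.
db.exec(`CREATE VIRTUAL TABLE IF NOT EXISTS skill_docs USING fts5(name, body)`);

// BM25 ranking is built into FTS5: ORDER BY rank sorts best match first.
function bestTemplate(query: string): { name: string; body: string } | undefined {
  return db
    .prepare(
      `SELECT name, body FROM skill_docs
       WHERE skill_docs MATCH ? ORDER BY rank LIMIT 1`
    )
    .get(query) as { name: string; body: string } | undefined;
}

// e.g. bestTemplate('poem OR haiku') loads only the creative-writing
// template (~2K tokens) instead of all 52.
```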

Three-layer architecture

Every message passes through three layers — the first one that matches responds.

Layer 1 — Slash Commands (<100ms, 0 tokens)

Instant, deterministic, no model is woken up.

/skyclaw · /cyberclaw · /recall · /targets · /claw800 ...

Layer 2 — Classifier + Router + Skill Execute (~50ms, 0 LLM tokens)

The Query Classifier (multi-head neural network) detects intent, domain and task type. The 7-Layer Router picks the matching skill function. Executed directly — no LLM generation, no hallucination.

"What's the weather?" · "Timer 30 minutes" · "Take a photo"

Layer 3 — LLM + Context Pipeline (2–15s, 500–4,000 tokens)

Everything that requires thinking — explanations, analysis, code, creative writing — goes through the full context pipeline to the local model (or optionally to the cloud).

Bridge-Tool + Capability Injection + Context Engine + Context Guard
Why three layers? On a 16K context, tokens are scarce. Layer 1 costs nothing. Layer 2 produces executable actions without letting the LLM generate output — no hallucination, no token cost for the answer. Layer 3 is reserved for true language tasks.
You: What's the weather tomorrow?
→ Layer 2 classifier+router+weather 280ms · 0 output tokens
🌤️ Vienna: 22°C, partly cloudy, no rain.

You: Explain monads in Haskell
→ Layer 3 qwen3.5:9b HEAVY · 8.2s · 2.1K tokens
A monad is a structure that sequences...

You: /cyberclaw
→ Layer 1 slash <1ms · 0 tokens
CPU: 12% · RAM: 8.2/16 GB · Disk: 142 GB free
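In code, the first-match dispatch behind these transcripts is roughly the following; all three function names are hypothetical stand-ins, not Clawminator's actual API:

```ts
// Layered dispatch: the first layer that matches answers the message.
declare function runSlash(message: string): string | undefined;                   // Layer 1
declare function classifyAndRoute(message: string): Promise<string | undefined>; // Layer 2
declare function runLlmPipeline(message: string): Promise<string>;               // Layer 3

async function handle(message: string): Promise<string> {
  const slash = runSlash(message);               // deterministic, no model
  if (slash !== undefined) return slash;

  const skill = await classifyAndRoute(message); // classifier + router + skill execute
  if (skill !== undefined) return skill;

  return runLlmPipeline(message);                // full context pipeline + LLM
}
```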

Under the hood — details for developers

Expand if you want the full picture.

🎯 Query Classifier — Multi-Head Neural Network, 4 parallel classifications

Before anything else happens, Clawminator classifies the request. A shared embedding (nomic-embed-text, 768 dimensions) is computed once and consumed by four parallel heads:

Head | Classes | Purpose
intent-head | 3 | actionable / conversational / ambiguous
domain-head | 17 | weather / calendar / email / ... (domain routing)
task-type-head | 8 (multi-label) | small_talk, skill_action, knowledge_personal, knowledge_general, safety_critical, meta_question, follow_up, multi_action
segmentation-head | Token-level BIO | Splits multi-action queries into atomic tasks

Downstream: Anaphora Resolver (layers A–D local, layer E optional cloud) resolves "it", "there", "that" against the dialog state. Dependency Detector (deixis + lastResult slot) detects references to previous answers. Output: a query plan with a list of classified segments.

Why multi-head instead of separate classifiers per dimension? One embedding, four decisions, one forward pass. Saves time and memory — critical with local models.
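As a sketch, the shared-embedding idea looks like this; the linear heads and shapes are illustrative, not the trained model:

```ts
type Vec = number[];

// One embedding call (e.g. nomic-embed-text, 768 dims), declared here as a stub.
declare function embed(text: string): Promise<Vec>;

// Each head is modeled as a linear layer followed by argmax over its classes.
function linearHead(weights: number[][], bias: number[], x: Vec): number {
  let best = 0;
  let bestScore = -Infinity;
  weights.forEach((row, i) => {
    const score = row.reduce((sum, w, j) => sum + w * x[j], bias[i]);
    if (score > bestScore) {
      bestScore = score;
      best = i;
    }
  });
  return best;
}

async function classify(query: string, heads: { w: number[][]; b: number[] }[]) {
  const x = await embed(query); // computed once, consumed by every head
  const [intent, domain, taskType] = heads.map((h) => linearHead(h.w, h.b, x));
  return { intent, domain, taskType }; // the token-level segmentation head is omitted
}
```

The multi-label task-type head would use per-class thresholds rather than argmax; the sketch glosses over that.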
πŸ—‚οΈ IR Orchestrator β€” Reciprocal Rank Fusion over 4 stores

Information retrieval is not "query one vector DB". Clawminator orchestrates four parallel sources, policy-driven — how strongly each source is weighted depends on the task type.

Store | Content | Retrieval
Knowledge Graph | Entity gazetteer + facts | Lookup + FTS fallback
Memory Chunks (Session) | Dialog history | BM25 + sqlite-vec cosine
Memory Chunks (Workspace) | SOUL / IDENTITY / USER.md | Persistent profile data
Document Store | User uploads: PDF/TXT/MD/DOCX | Chunk-based indexing

RRF Fusion (Reciprocal Rank Fusion): Results from all sources are fused via a policy matrix. Example: for knowledge_personal, the Knowledge Graph carries more weight; for knowledge_general, the Document Store does. No static merging — the weights come from the query classifier output.

The result flows into the Layer 2 Router: a ranked context ready for skill matching or LLM injection.
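A weighted RRF fusion fits in a few lines. The smoothing constant of 60 is the common default from the RRF literature, and the policy weights in the usage comment are assumptions:

```ts
const K = 60; // standard RRF smoothing constant

// Each source contributes weight / (K + rank) per document; ids are best-first.
function rrfFuse(lists: { weight: number; ids: string[] }[]): string[] {
  const scores = new Map<string, number>();
  for (const { weight, ids } of lists) {
    ids.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + weight / (K + rank + 1));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// e.g. for knowledge_personal, a policy might weight the KG highest:
// rrfFuse([{ weight: 0.5, ids: kgHits }, { weight: 0.3, ids: bm25Hits },
//          { weight: 0.2, ids: denseHits }])
```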
🎚️ 7-Layer Router — ADR-021, policy-driven skill matching

The router runs through seven layers in fixed order. Each layer can match and answer directly, match and forward, or not match at all.

Layer | Function
2.0 Slash/Regex Support | Explicit commands and regex skills
2.1 Entity Gazetteer | Known entities from the KG
2.2 KG-First (fact queries) | For knowledge_personal — KG lookup first
2.3 Dialog-State-Prior | Context from the previous answer
2.4 Hybrid BM25 + Dense + RRF | CORE — fusion of text and vector search
2.5 MLP Domain Gate | Boost for domain-specific skills (no block)
2.6 Cross-Encoder Re-rank | Optional — expensive, but precise

After the router: Confidence Gate (ADR-025). HIGH (gap > 0.02) → execute skill. MEDIUM (0.005–0.02) → Intent-LLM (qwen3.5:9b local, 1-token answer, 200–500ms) clarifies. LOW (< 0.005) → forward to Layer 3 LLM.

This is not an if-else cascade — every layer is configurable, weights come from the active hardware profile, and the policy matrix controls which layers are active per task type.
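A sketch of the confidence gate, using the thresholds quoted above; the three helper functions are hypothetical names:

```ts
type Ranked = { skill: string; score: number }[]; // router output, best-first

declare function executeSkill(skill: string, query: string): Promise<string>;
declare function askIntentLlm(skill: string, query: string): Promise<boolean>; // 1-token yes/no
declare function forwardToLlm(query: string): Promise<string>;

async function confidenceGate(ranked: Ranked, query: string): Promise<string> {
  const gap = ranked.length > 1 ? ranked[0].score - ranked[1].score : Infinity;

  if (gap > 0.02) return executeSkill(ranked[0].skill, query); // HIGH
  if (gap >= 0.005) {
    // MEDIUM: the local Intent-LLM confirms or rejects the top skill
    if (await askIntentLlm(ranked[0].skill, query)) {
      return executeSkill(ranked[0].skill, query);
    }
  }
  return forwardToLlm(query); // LOW, or MEDIUM rejected
}
```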
🧩 Layer 3 LLM Pipeline — Bridge-Tool, Capability Injection, Context Guard

When a request really does need the LLM, it doesn't go in as a raw prompt. Four stations in front of the model keep the context small and relevant:

Station | Function
Bridge-Tool execute_action | ~650 token budget. Unified entry point for LLM-driven skill calls. Replaces uploading all 26 tool definitions.
Capability Injection | Only the relevant capabilities for the current request are injected. Instead of 8,000 tokens for all tools: 400–800 for the right ones.
Context Engine | Two-slot system: systemMessages[0] static (KV-cache friendly), systemMessages[1] dynamic with IR chunks from the orchestrator.
Context Guard | Rule-based pruning at 70% context fill. Cleans up before the model can hallucinate because the context got too full.
The goal: even on a 16K model, there's enough budget left for a sensible answer. No "dump everything into context and hope".
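A sketch of the two-slot assembly plus the 70% pruning rule; message shapes follow the usual chat-completion format, and countTokens stands in for a real tokenizer:

```ts
type Msg = { role: "system" | "user" | "assistant"; content: string };

declare function countTokens(messages: Msg[]): number;

function buildContext(
  staticPrompt: string, // slot 0: static, KV-cache friendly
  irChunks: string[],   // slot 1: dynamic IR chunks from the orchestrator
  history: Msg[],
  userMsg: string,
  budgetTokens: number
): Msg[] {
  const messages: Msg[] = [
    { role: "system", content: staticPrompt },
    { role: "system", content: irChunks.join("\n---\n") },
    ...history,
    { role: "user", content: userMsg },
  ];

  // Context Guard: prune the oldest history entries at 70% context fill.
  while (countTokens(messages) > 0.7 * budgetTokens && messages.length > 3) {
    messages.splice(2, 1); // index 2 is the oldest remaining history message
  }
  return messages;
}
```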
πŸ›‘οΈ Quality Gate & Mistral Improve β€” Output check, async fallback

After the LLM response, a rule-based Quality Gate checks the output locally — no additional LLM call. It looks for typical failure patterns:

  • too_short — response truncated or empty
  • repetition — model repeats itself (infinite loop)
  • placeholder — "[insert answer here]" or similar
  • refusal — unwanted "I can't do that"
  • lang_mix — wrong language or mixed
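These checks are plain rules over the output string. A minimal sketch, with illustrative regexes rather than the real ones:

```ts
type GateResult = { pass: boolean; flags: string[] };

function qualityGate(output: string, expectedLang: "de" | "en"): GateResult {
  const flags: string[] = [];

  if (output.trim().length < 20) flags.push("too_short");
  if (/(\b\w{3,}\b)(?:\W+\1\b){4,}/i.test(output)) flags.push("repetition");
  if (/\[insert [^\]]+\]/i.test(output)) flags.push("placeholder");
  if (/\b(i can.?t do that|as an ai)\b/i.test(output)) flags.push("refusal");

  // Crude language check: German function words in an English answer,
  // or none at all in a German one.
  const looksGerman = /\b(und|nicht|aber|ich)\b/i.test(output);
  if ((expectedLang === "en") === looksGerman) flags.push("lang_mix");

  return { pass: flags.length === 0, flags };
}
```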

Mistral Improve (async Fire-and-Forget): When cloud=on and the Quality Gate flags a FAIL, the response is sent to Mistral Small in the background for an improved version. The first (local) answer goes out immediately — no extra waiting for the user. The improvement arrives as a follow-up marked with a ☁️ icon.

Best of both worlds: speed of the local model, cloud quality when needed. And the user honestly sees where the answer comes from.
⚡ Complexity Router — one model, four tiers, optional cloud

Not every request needs a full generation budget. The router analyzes the complexity of each message heuristically (no LLM call, <5ms) and automatically selects token budget and execution target.

Local model: qwen3.5:9b runs on a Mac Mini M4 with 16 GB RAM. The trick: it's not the model that has to be small, it's the context. Through classifier + retrieval, only the truly relevant context enters — 16K tokens are enough.

Four complexity tiers:

Tier | Max Tokens | Target | Example
SUPER_LIGHT | 256 | Local | "Hello", "Thanks"
LIGHT | 512 | Local | Simple questions
MEDIUM | 2,048 | Local or cloud (if cloud=on) | Explanations, summaries
HEAVY | 3,072 | Local or cloud (if cloud=on) | Code generation, analysis

Cloud option: Mistral Small (128K context) is currently integrated, chosen for its EU servers and GDPR compliance. More providers are planned — routing stays identical regardless of which model ends up receiving the prompt.

Routing is heuristic via a ComplexityAnalyzer — no LLM call, no extra tokens, under 5ms.
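A heuristic analyzer of this kind is just string features and thresholds; the features below are assumptions for illustration, not Clawminator's actual rules:

```ts
type Tier = "SUPER_LIGHT" | "LIGHT" | "MEDIUM" | "HEAVY";

function analyzeComplexity(message: string): Tier {
  const words = message.trim().split(/\s+/).length;
  const wantsCode = /\b(code|function|script|implement|bug)\b/i.test(message);
  const wantsAnalysis = /\b(explain|compare|analy[sz]e|summar)/i.test(message);

  if (wantsCode) return "HEAVY";                    // 3,072-token budget
  if (wantsAnalysis || words > 40) return "MEDIUM"; // 2,048
  if (words > 5) return "LIGHT";                    // 512
  return "SUPER_LIGHT";                             // 256: "Hello", "Thanks"
}
```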
🖥️ Hardware profile system — current reference + planned profiles

Hardware profiles as JSON files with schema validation. Each profile defines model selection, context budget, memory parameters, timeouts and Ollama tuning.

Currently tested:

Profile | RAM | Model | Context | Status
reference | 16 GB | qwen3.5:9b | 16K | Mac Mini M4 — operational
tiny / medium / large | 4–64+ GB | — | — | planned

What each profile will control: model configuration, context budget with ratios (system prompt 8–20%, memory 10–40%, history 37–45%, tools 10%), memory parameters (maxResults, minScore, fusion weights), stability timeouts and Ollama-specific tuning (keepAlive, flashAttention, kvCacheType).
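As a sketch, a profile might look like the following; the field names and values are assumptions based on the parameters listed above, not the real schema:

```ts
interface HardwareProfile {
  model: { name: string; contextTokens: number };
  // Budget ratios must stay inside the documented ranges.
  contextBudget: { systemPrompt: number; memory: number; history: number; tools: number };
  memory: { maxResults: number; minScore: number; fusionWeights: { bm25: number; dense: number } };
  timeouts: { generationMs: number };
  ollama: { keepAlive: string; flashAttention: boolean; kvCacheType: string };
}

const reference: HardwareProfile = {
  model: { name: "qwen3.5:9b", contextTokens: 16_384 },
  contextBudget: { systemPrompt: 0.12, memory: 0.25, history: 0.4, tools: 0.1 },
  memory: { maxResults: 8, minScore: 0.35, fusionWeights: { bm25: 0.4, dense: 0.6 } },
  timeouts: { generationMs: 120_000 },
  ollama: { keepAlive: "30m", flashAttention: true, kvCacheType: "q8_0" },
};
```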

🧠 Memory & Knowledge Graph — Stanford Generative Agents scoring

OpenClaw provides the base: SQLite with FTS5 full-text search and vector embeddings. That works.

Clawminator adds: A Knowledge Graph as SQLite triple store. Two tables (entities, relations) in memory.db. Traversal via recursive CTEs — up to 3 hops. Temporal weighting and confidence-based extraction via regex + LLM.
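A 3-hop traversal over such a triple store can be a single recursive CTE; table and column names here are illustrative:

```ts
// Find everything reachable from a start entity (bound to the ? placeholder)
// in at most 3 hops. UNION (not UNION ALL) deduplicates, so cycles can't
// expand indefinitely.
const threeHopQuery = `
  WITH RECURSIVE hops(entity, depth) AS (
    SELECT ?, 0
    UNION
    SELECT r.object, h.depth + 1
    FROM relations r
    JOIN hops h ON r.subject = h.entity
    WHERE h.depth < 3
  )
  SELECT DISTINCT entity FROM hops WHERE depth > 0;
`;
```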

Retrieval scoring (based on Park et al., 2023):

score = 0.35 × weight + 0.35 × confidence + 0.30 × recency

Factor | Mechanism
Weight | +0.15 per repetition (Ebbinghaus-inspired)
Confidence | Per source: user self-report 0.95, LLM-extracted 0.50, system seed 0.40
Recency | 7-day exponential decay
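In code, the score is the formula above plus the decay term; the fact shape is illustrative:

```ts
type Fact = { weight: number; confidence: number; lastSeenMs: number };

const DAY_MS = 86_400_000;

function retrievalScore(f: Fact, nowMs: number): number {
  // 7-day exponential decay: 1.0 when fresh, ~0.37 after one week
  const ageDays = (nowMs - f.lastSeenMs) / DAY_MS;
  const recency = Math.exp(-ageDays / 7);
  return 0.35 * f.weight + 0.35 * f.confidence + 0.3 * recency;
}
```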

Temporal Contradiction Detection: Single-valued predicates automatically invalidate outdated facts ("lives in Vienna" replaces "lives in Berlin").

Cross-system index: KG facts are synchronized into full-text search.

Unlike the Stanford paper: heuristic importance scoring instead of LLM-based — optimized for local models with limited context. LLM-based scoring (1–10 scale) would be too expensive and unreliable in this setting.

Hybrid retrieval (OpenClaw): BM25 + vector cosine fusion, configurable per profile.

Memory lifecycle: Age-based expiry (default 90 days) + count-based trim. WAL mode prevents locking issues. Seed protection: seedFromWorkspace() doesn't overwrite user-set values on restart.

Anti-hallucination: Extracted entities must appear verbatim in the user message. No phantom connections.

🔀 Fork strategy — plugin-first, minimal core patches

Clawminator uses a hybrid approach: ≥90% plugin code, ≤10% core patches. OpenClaw is included as a git subtree with automated weekly sync via GitHub Actions.

This means: upstream updates from OpenClaw flow in regularly without breaking Clawminator code. The few core patches are clearly documented and isolated.

Why plugin instead of hard fork? A hard fork would be easier short-term but a maintenance nightmare long-term. The plugin system keeps upgrade paths open.
🧪 Testing — 655 automated + 170 manual tests

Clawminator follows an ASPICE-adapted requirements process with formal IDs, MoSCoW priorities and verification methods.

Category | Count | Coverage
Automated | 655 | Skills, router, memory, profiles, gateway
Manual | 170 | Telegram integration, macOS features, E2E
Total | 825 | All layers + integrations

The manual test suite covers everything automation can't: Telegram messages, Spotify control, iMessage, camera triggers, Apple Notes — real macOS interactions that need a running desktop.

💓 Health Monitor — real-time dashboard in browser

Clawminator provides a built-in health endpoint at /clawminator/health that shows the overall system state at a glance — directly in the browser, no extra tools needed.

What the health monitor checks:

Check | What's verified
Ollama | Reachability, loaded models, VRAM usage
Gateway | Process status, uptime, active sessions
Memory | SQLite state, Knowledge Graph size, FTS5 index
Disk | Free space, model directory
RAM | System memory, swap usage
Telegram | Bot connection, last message
The endpoint runs locally on port 18789 — only accessible on localhost, no external access. No authentication needed because nothing leaves the device.
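Scripted checks work too, assuming the endpoint returns JSON (the response shape in the comment is a guess):

```ts
const res = await fetch("http://localhost:18789/clawminator/health");
const health = await res.json();
console.log(health); // e.g. { ollama: "ok", gateway: "ok", disk: "ok", ram: "ok" }
```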

Why local instead of cloud?

Five reasons. No marketing. No problemo.

🔒 Privacy

Everything stays on your device. No telemetry, no tracking, no data shared with third parties.

💰 Local-first, cloud-optional

The agent runs locally at zero cost. 90% of your requests are handled on-device — offline, in milliseconds, free.

📡 Always available

Works without internet (except weather + web search). Your assistant is never offline.

⚡ Fast

Skills respond in 50ms, LLM in 2–15 seconds. No waiting for cloud latency.

🧠 Orchestrator mode

Too complex for the local model? Clawminator can delegate tasks to Claude Code, Codex, Gemini or other CLI agents — directly from the server, no API keys to configure in Clawminator. You decide when and whether external help is involved.

Built on OpenClaw

OpenClaw is the framework. Clawminator is a specialization for local hardware.

OpenClaw is a generic open-source AI gateway — it supports cloud LLMs and local models (via Ollama) equally. 26 native tools, 53 bundled skills, 13,700+ community skills on ClawHub. Clawminator is a specialized configuration for a specific use case: local models with 16K context on consumer hardware, where every token counts.

Aspect | OpenClaw | Clawminator
Approach | Generic framework for all model sizes | Specialized for local models, consumer HW
Context Window | Typically 128K+ tokens | 16K tokens (every token budgeted)
LLM Tools | 26 native (Clawminator uses these in layer 3) | + 13 own slash commands (0 tokens)
Deterministic Skills | — | 75 skills via classifier+router (0 tokens, ~50ms)
Model Routing | 1 model configurable | Local + cloud option, complexity analyzer (<5ms)
Knowledge Graph | — | SQLite triple store with Stanford scoring
Languages | English | German + English
Target audience | Developers, power users, all model sizes | Token-efficient setups, 16K context range

OpenClaw is a generic framework that works with all model sizes. Clawminator specializes in the 16K context range and adds its own layers: a multi-head Query Classifier, 75 deterministic skills, a 7-Layer Router, Quality Gate with async improve loop, and a Knowledge Graph with Stanford scoring.

Mission Timeline

What exists β€” and what's coming.

✅ Today — available
75 Skills, 13 Commands, 52 SKILL.md templates, Knowledge Graph, 2 LLMs, Telegram Bot, German + English, three-layer architecture

🔄 In progress
One-click installer with automatic system language detection

📋 Planned
Additional hardware profiles (tiny/medium/large) for different RAM configurations, Windows & Linux support, more interfaces (Web, API, MS Teams), integration adapters for enterprise channels

⚠️ Planned features do not exist yet — they are intended for future versions.

License & Collaboration

Open source core. Enterprise integration on request.

📖 MIT License — Core

The full architecture will be publicly available under MIT license once the test suite is complete: query classifier, IR orchestrator, 7-layer router, memory system with Stanford scoring, CLI tooling, all ADRs. Use it, fork it, build on it.

🔧 Available separately

Trained classifier weights, enterprise connectors (MS Teams, SharePoint, M365, SAP), deployment automation and domain-specific fine-tunings are not released as open source. These parts emerge from projects with companies that need them.

💬 Open for conversations

I'm a requirements engineer and software system designer with an automotive background (ASPICE, Bosch since 2017). Clawminator is the combination of both: RE discipline applied to AI agents. Open to consulting, integration projects, and the right full-time role.

Clawminator is in development. Stay tuned. 🦞

The project is not yet publicly available. Sign up and we'll notify you about progress, beta access and release. No spam β€” only real updates.
