In Development · Open Source · Local-first

I'll be back. And I route tokens properly.

A local-first agent architecture. Query classification, retrieval orchestration and skill execution — controllable and token-efficient. Runs on your hardware, escalates to cloud models on demand.

Get updates → GitHub · Coming Soon
🔒 Local-first — your data stays on your device
🎯 Token-efficient — only relevant context enters the prompt
⚖️ Provider-agnostic — local, private cloud, frontier model

What is Clawminator?

An open-source project in active development. Not yet publicly available.

Clawminator is an agent architecture, not a chatbot. The core is token efficiency: requests are first classified, then prepared through routing, memory and retrieval, and only enriched with context that is actually relevant. Smart and small, not dumb and big. The result: an agent that works well on 16K context and 9B parameters — or uses the same pipeline with a frontier cloud model. Provider-agnostic: the orchestrator stays the same, local or cloud.

Architecture: Query Classifier → IR Orchestrator → 7-Layer Router → Skill/LLM Execution
Local Reference HW: Mac Mini M4, 16 GB RAM
Local Model: qwen3.5:9b (16K context)
Cloud Option: Mistral Small (EU servers, GDPR-compliant, 128K context) — more providers planned
Interfaces: Telegram (current) · Web / API / MS Teams planned
Languages: German + English
License: MIT (core) — enterprise integration on request

75 Skills — executable actions without LLM generation

Once the Query Classifier detects a skill action, the Router calls the matching skill function directly. No token generation, no hallucination. The LLM fallback only kicks in if no skill matches.

📁 Files — Create, read, write, search, delete
🌤️ Weather — Current + forecast via Open-Meteo (no API key)
⏱️ Timer & Reminders — Cron-based create, list, delete
🧠 Memory — Remember, retrieve, search facts. Knowledge Graph (Clawminator) + full-text search (OpenClaw)
💻 System — CPU, RAM, disk, processes, network, WiFi password
🔧 Gateway — Status, restart, view config
🌐 Browser — Open URLs, navigate, list/close tabs, screenshots
🎨 Canvas — Display HTML, hide, execute JS, snapshots
📝 Apple Notes & Reminders — Create, search, list
🎵 Spotify — Play, pause, skip (if spotify_player installed)
💡 Philips Hue — Light control (if openhue installed)
💬 iMessage — Send messages via AppleScript
📨 Telegram — Reactions, replies, polls
🔄 Sessions — Status, history, manage sub-agents
📱 Nodes / Devices — Camera, location, battery, notifications
📊 Health Check — Ollama + Gateway + Disk + RAM at a glance

13 Slash Commands — instant, no LLM

/skyclaw — Weather
/cyberclaw — System
/recall — Memory
/monitor — Monitoring
/terminate — Reset
/status — Status
/mission — Help
/diagnose — Debug
/profile — Profile
/clawbeback — Reboot
/targets — Timer/Cron
/resistance — Error Log
/claw800 — Sessions/Agents

52 SKILL.md Templates — for creative tasks

When the language model needs to think (poems, explanations, code), the matching skill template is loaded via FTS5/BM25 (~2K tokens). The rest of the 16K budget stays available for chat.
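A minimal sketch of that lookup, assuming the templates sit in a SQLite FTS5 table and using the better-sqlite3 package. Table and column names are illustrative, not Clawminator's actual schema:

```ts
import Database from "better-sqlite3";

const db = new Database("skills.db");

// Hypothetical FTS5 virtual table holding the 52 SKILL.md templates.
db.exec(`CREATE VIRTUAL TABLE IF NOT EXISTS skill_docs USING fts5(name, body)`);

// BM25 ranking is built into FTS5: ORDER BY rank sorts best match first.
function bestTemplate(query: string): { name: string; body: string } | undefined {
  return db
    .prepare(
      `SELECT name, body FROM skill_docs
       WHERE skill_docs MATCH ? ORDER BY rank LIMIT 1`
    )
    .get(query) as { name: string; body: string } | undefined;
}

// e.g. bestTemplate('poem OR haiku') loads only the creative-writing
// template (~2K tokens) instead of all 52.
```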

Three-layer architecture

Every message passes through three layers — the first one that matches responds.

Layer 1 — Slash Commands (<100ms, 0 tokens)

Instant, deterministic, no model is woken up.

/skyclaw · /cyberclaw · /recall · /targets · /claw800 ...

Layer 2 — Classifier + Router + Skill Execute (~50ms, 0 LLM tokens)

The Query Classifier (multi-head neural network) detects intent, domain and task type. The 7-Layer Router picks the matching skill function. Executed directly — no LLM generation, no hallucination.

"What's the weather?" · "Timer 30 minutes" · "Take a photo"

Layer 3 — LLM + Context Pipeline (2–15s, 500–4,000 tokens)

Everything that requires thinking — explanations, analysis, code, creative writing — goes through the full context pipeline to the local model (or optionally to the cloud).

Bridge-Tool + Capability Injection + Context Engine + Context Guard
Why three layers? On a 16K context, tokens are scarce. Layer 1 costs nothing. Layer 2 produces executable actions without letting the LLM generate output — no hallucination, no token cost for the answer. Layer 3 is reserved for true language tasks.
You: What's the weather tomorrow?
→ Layer 2 classifier+router+weather 280ms · 0 output tokens
🌤️ Vienna: 22°C, partly cloudy, no rain.

You: Explain monads in Haskell
→ Layer 3 qwen3.5:9b HEAVY · 8.2s · 2.1K tokens
A monad is a structure that sequences...

You: /cyberclaw
→ Layer 1 slash <1ms · 0 tokens
CPU: 12% · RAM: 8.2/16 GB · Disk: 142 GB free
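In code, the first-match dispatch behind these transcripts is roughly the following; all three function names are hypothetical stand-ins, not Clawminator's actual API:

```ts
// Layered dispatch: the first layer that matches answers the message.
declare function runSlash(message: string): string | undefined;                   // Layer 1
declare function classifyAndRoute(message: string): Promise<string | undefined>; // Layer 2
declare function runLlmPipeline(message: string): Promise<string>;               // Layer 3

async function handle(message: string): Promise<string> {
  const slash = runSlash(message);               // deterministic, no model
  if (slash !== undefined) return slash;

  const skill = await classifyAndRoute(message); // classifier + router + skill execute
  if (skill !== undefined) return skill;

  return runLlmPipeline(message);                // full context pipeline + LLM
}
```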

Under the hood — details for developers

Expand if you want the full picture.

🎯 Query Classifier — Multi-Head Neural Network, 4 parallel classifications

Before anything else happens, Clawminator classifies the request. A shared embedding (nomic-embed-text, 768 dimensions) is computed once and consumed by four parallel heads:

Head | Classes | Purpose
intent-head | 3 | actionable / conversational / ambiguous
domain-head | 17 | weather / calendar / email / ... (domain routing)
task-type-head | 8 (multi-label) | small_talk, skill_action, knowledge_personal, knowledge_general, safety_critical, meta_question, follow_up, multi_action
segmentation-head | Token-level BIO | Splits multi-action queries into atomic tasks

Downstream: Anaphora Resolver (layers A–D local, layer E optional cloud) resolves "it", "there", "that" against the dialog state. Dependency Detector (deixis + lastResult slot) detects references to previous answers. Output: a query plan with a list of classified segments.

Why multi-head instead of separate classifiers per dimension? One embedding, four decisions, one forward pass. Saves time and memory — critical with local models.
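As a sketch, the shared-embedding idea looks like this; the linear heads and shapes are illustrative, not the trained model:

```ts
type Vec = number[];

// One embedding call (e.g. nomic-embed-text, 768 dims), declared here as a stub.
declare function embed(text: string): Promise<Vec>;

// Each head is modeled as a linear layer followed by argmax over its classes.
function linearHead(weights: number[][], bias: number[], x: Vec): number {
  let best = 0;
  let bestScore = -Infinity;
  weights.forEach((row, i) => {
    const score = row.reduce((sum, w, j) => sum + w * x[j], bias[i]);
    if (score > bestScore) {
      bestScore = score;
      best = i;
    }
  });
  return best;
}

async function classify(query: string, heads: { w: number[][]; b: number[] }[]) {
  const x = await embed(query); // computed once, consumed by every head
  const [intent, domain, taskType] = heads.map((h) => linearHead(h.w, h.b, x));
  return { intent, domain, taskType }; // the token-level segmentation head is omitted
}
```

The multi-label task-type head would use per-class thresholds rather than argmax; the sketch glosses over that.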
πŸ—‚οΈ IR Orchestrator β€” Reciprocal Rank Fusion over 4 stores

Information retrieval is not "query one vector DB". Clawminator orchestrates four parallel sources, policy-driven — how strongly each source is weighted depends on the task type.

Store | Content | Retrieval
Knowledge Graph | Entity gazetteer + facts | Lookup + FTS fallback
Memory Chunks (Session) | Dialog history | BM25 + sqlite-vec cosine
Memory Chunks (Workspace) | SOUL / IDENTITY / USER.md | Persistent profile data
Document Store | User uploads: PDF/TXT/MD/DOCX | Chunk-based indexing

RRF Fusion (Reciprocal Rank Fusion): Results from all sources are fused via a policy matrix. Example: for knowledge_personal, the Knowledge Graph carries more weight; for knowledge_general, the Document Store does. No static merging — the weights come from the query classifier output.

The result flows into the Layer 2 Router: a ranked context ready for skill matching or LLM injection.
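A weighted RRF fusion fits in a few lines. The smoothing constant of 60 is the common default from the RRF literature, and the policy weights in the usage comment are assumptions:

```ts
const K = 60; // standard RRF smoothing constant

// Each source contributes weight / (K + rank) per document; ids are best-first.
function rrfFuse(lists: { weight: number; ids: string[] }[]): string[] {
  const scores = new Map<string, number>();
  for (const { weight, ids } of lists) {
    ids.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + weight / (K + rank + 1));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// e.g. for knowledge_personal, a policy might weight the KG highest:
// rrfFuse([{ weight: 0.5, ids: kgHits }, { weight: 0.3, ids: bm25Hits },
//          { weight: 0.2, ids: denseHits }])
```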
🎚️ 7-Layer Router — ADR-021, policy-driven skill matching

The router runs through seven layers in fixed order. Each layer can match and answer directly, match and forward, or not match at all.

Layer | Function
2.0 Slash/Regex Support | Explicit commands and regex skills
2.1 Entity Gazetteer | Known entities from the KG
2.2 KG-First (fact queries) | For knowledge_personal — KG lookup first
2.3 Dialog-State-Prior | Context from the previous answer
2.4 Hybrid BM25 + Dense + RRF | CORE — fusion of text and vector search
2.5 MLP Domain Gate | Boost for domain-specific skills (no block)
2.6 Cross-Encoder Re-rank | Optional — expensive, but precise

After the router: Confidence Gate (ADR-025). HIGH (gap > 0.02) → execute skill. MEDIUM (0.005–0.02) → Intent-LLM (qwen3.5:9b local, 1-token answer, 200–500ms) clarifies. LOW (< 0.005) → forward to Layer 3 LLM.

This is not an if-else cascade — every layer is configurable, weights come from the active hardware profile, and the policy matrix controls which layers are active per task type.
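A sketch of the confidence gate, using the thresholds quoted above; the three helper functions are hypothetical names:

```ts
type Ranked = { skill: string; score: number }[]; // router output, best-first

declare function executeSkill(skill: string, query: string): Promise<string>;
declare function askIntentLlm(skill: string, query: string): Promise<boolean>; // 1-token yes/no
declare function forwardToLlm(query: string): Promise<string>;

async function confidenceGate(ranked: Ranked, query: string): Promise<string> {
  const gap = ranked.length > 1 ? ranked[0].score - ranked[1].score : Infinity;

  if (gap > 0.02) return executeSkill(ranked[0].skill, query); // HIGH
  if (gap >= 0.005) {
    // MEDIUM: the local Intent-LLM confirms or rejects the top skill
    if (await askIntentLlm(ranked[0].skill, query)) {
      return executeSkill(ranked[0].skill, query);
    }
  }
  return forwardToLlm(query); // LOW, or MEDIUM rejected
}
```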
🧩 Layer 3 LLM Pipeline — Bridge-Tool, Capability Injection, Context Guard

When a request really does need the LLM, it doesn't go in as a raw prompt. Four stations in front of the model keep the context small and relevant:

Station | Function
Bridge-Tool execute_action | ~650 token budget. Unified entry point for LLM-driven skill calls. Replaces uploading all 26 tool definitions.
Capability Injection | Only the relevant capabilities for the current request are injected. Instead of 8,000 tokens for all tools: 400–800 for the right ones.
Context Engine | Two-slot system: systemMessages[0] static (KV-cache friendly), systemMessages[1] dynamic with IR chunks from the orchestrator.
Context Guard | Rule-based pruning at 70% context fill. Cleans up before the model can hallucinate because the context got too full.
The goal: even on a 16K model, there's enough budget left for a sensible answer. No "dump everything into context and hope".
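A sketch of the two-slot assembly plus the 70% pruning rule; message shapes follow the usual chat-completion format, and countTokens stands in for a real tokenizer:

```ts
type Msg = { role: "system" | "user" | "assistant"; content: string };

declare function countTokens(messages: Msg[]): number;

function buildContext(
  staticPrompt: string, // slot 0: static, KV-cache friendly
  irChunks: string[],   // slot 1: dynamic IR chunks from the orchestrator
  history: Msg[],
  userMsg: string,
  budgetTokens: number
): Msg[] {
  const messages: Msg[] = [
    { role: "system", content: staticPrompt },
    { role: "system", content: irChunks.join("\n---\n") },
    ...history,
    { role: "user", content: userMsg },
  ];

  // Context Guard: prune the oldest history entries at 70% context fill.
  while (countTokens(messages) > 0.7 * budgetTokens && messages.length > 3) {
    messages.splice(2, 1); // index 2 is the oldest remaining history message
  }
  return messages;
}
```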
πŸ›‘οΈ Quality Gate & Mistral Improve β€” Output check, async fallback

After the LLM response, a rule-based Quality Gate checks the output locally — no additional LLM call. It looks for typical failure patterns:

  • too_short — response truncated or empty
  • repetition — model repeats itself (infinite loop)
  • placeholder — "[insert answer here]" or similar
  • refusal — unwanted "I can't do that"
  • lang_mix — wrong language or mixed
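These checks are plain rules over the output string. A minimal sketch, with illustrative regexes rather than the real ones:

```ts
type GateResult = { pass: boolean; flags: string[] };

function qualityGate(output: string, expectedLang: "de" | "en"): GateResult {
  const flags: string[] = [];

  if (output.trim().length < 20) flags.push("too_short");
  if (/(\b\w{3,}\b)(?:\W+\1\b){4,}/i.test(output)) flags.push("repetition");
  if (/\[insert [^\]]+\]/i.test(output)) flags.push("placeholder");
  if (/\b(i can.?t do that|as an ai)\b/i.test(output)) flags.push("refusal");

  // Crude language check: German function words in an English answer,
  // or none at all in a German one.
  const looksGerman = /\b(und|nicht|aber|ich)\b/i.test(output);
  if ((expectedLang === "en") === looksGerman) flags.push("lang_mix");

  return { pass: flags.length === 0, flags };
}
```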

Mistral Improve (async Fire-and-Forget): When cloud=on and the Quality Gate flags a FAIL, the response is sent to Mistral Small in the background for an improved version. The first (local) answer goes out immediately — no extra waiting for the user. The improvement arrives as a follow-up marked with a ☁️ icon.

Best of both worlds: speed of the local model, cloud quality when needed. And the user honestly sees where the answer comes from.
⚡ Complexity Router — one model, four tiers, optional cloud

Not every request needs a full generation budget. The router analyzes the complexity of each message heuristically (no LLM call, <5ms) and automatically selects token budget and execution target.

Local model: qwen3.5:9b runs on a Mac Mini M4 with 16 GB RAM. The trick: it's not the model that has to be small, it's the context. Through classifier + retrieval, only the truly relevant context enters — 16K tokens are enough.

Four complexity tiers:

Tier | Max Tokens | Target | Example
SUPER_LIGHT | 256 | Local | "Hello", "Thanks"
LIGHT | 512 | Local | Simple questions
MEDIUM | 2,048 | Local or cloud (if cloud=on) | Explanations, summaries
HEAVY | 3,072 | Local or cloud (if cloud=on) | Code generation, analysis

Cloud option: Mistral Small (128K context) is currently integrated, chosen for its EU servers and GDPR compliance. More providers are planned — routing stays identical regardless of which model ends up receiving the prompt.

Routing is heuristic via a ComplexityAnalyzer — no LLM call, no extra tokens, under 5ms.
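A heuristic analyzer of this kind is just string features and thresholds; the features below are assumptions for illustration, not Clawminator's actual rules:

```ts
type Tier = "SUPER_LIGHT" | "LIGHT" | "MEDIUM" | "HEAVY";

function analyzeComplexity(message: string): Tier {
  const words = message.trim().split(/\s+/).length;
  const wantsCode = /\b(code|function|script|implement|bug)\b/i.test(message);
  const wantsAnalysis = /\b(explain|compare|analy[sz]e|summar)/i.test(message);

  if (wantsCode) return "HEAVY";                    // 3,072-token budget
  if (wantsAnalysis || words > 40) return "MEDIUM"; // 2,048
  if (words > 5) return "LIGHT";                    // 512
  return "SUPER_LIGHT";                             // 256: "Hello", "Thanks"
}
```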
🖥️ Hardware profile system — current reference + planned profiles

Hardware profiles as JSON files with schema validation. Each profile defines model selection, context budget, memory parameters, timeouts and Ollama tuning.

Currently tested:

Profile | RAM | Model | Context | Status
reference | 16 GB | qwen3.5:9b | 16K | Mac Mini M4 — operational
tiny / medium / large | 4–64+ GB | — | — | planned

What each profile will control: model configuration, context budget with ratios (system prompt 8–20%, memory 10–40%, history 37–45%, tools 10%), memory parameters (maxResults, minScore, fusion weights), stability timeouts and Ollama-specific tuning (keepAlive, flashAttention, kvCacheType).
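As a sketch, a profile might look like the following; the field names and values are assumptions based on the parameters listed above, not the real schema:

```ts
interface HardwareProfile {
  model: { name: string; contextTokens: number };
  // Budget ratios must stay inside the documented ranges.
  contextBudget: { systemPrompt: number; memory: number; history: number; tools: number };
  memory: { maxResults: number; minScore: number; fusionWeights: { bm25: number; dense: number } };
  timeouts: { generationMs: number };
  ollama: { keepAlive: string; flashAttention: boolean; kvCacheType: string };
}

const reference: HardwareProfile = {
  model: { name: "qwen3.5:9b", contextTokens: 16_384 },
  contextBudget: { systemPrompt: 0.12, memory: 0.25, history: 0.4, tools: 0.1 },
  memory: { maxResults: 8, minScore: 0.35, fusionWeights: { bm25: 0.4, dense: 0.6 } },
  timeouts: { generationMs: 120_000 },
  ollama: { keepAlive: "30m", flashAttention: true, kvCacheType: "q8_0" },
};
```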

🧠 Memory & Knowledge Graph — Stanford Generative Agents scoring

OpenClaw provides the base: SQLite with FTS5 full-text search and vector embeddings. That works.

Clawminator adds: A Knowledge Graph as SQLite triple store. Two tables (entities, relations) in memory.db. Traversal via recursive CTEs — up to 3 hops. Temporal weighting and confidence-based extraction via regex + LLM.
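A 3-hop traversal over such a triple store can be a single recursive CTE; table and column names here are illustrative:

```ts
// Find everything reachable from a start entity (bound to the ? placeholder)
// in at most 3 hops. UNION (not UNION ALL) deduplicates, so cycles can't
// expand indefinitely.
const threeHopQuery = `
  WITH RECURSIVE hops(entity, depth) AS (
    SELECT ?, 0
    UNION
    SELECT r.object, h.depth + 1
    FROM relations r
    JOIN hops h ON r.subject = h.entity
    WHERE h.depth < 3
  )
  SELECT DISTINCT entity FROM hops WHERE depth > 0;
`;
```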

Retrieval scoring (based on Park et al., 2023):

score = 0.35 × weight + 0.35 × confidence + 0.30 × recency

Factor | Mechanism
Weight | +0.15 per repetition (Ebbinghaus-inspired)
Confidence | Per source: user self-report 0.95, LLM-extracted 0.50, system seed 0.40
Recency | 7-day exponential decay
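In code, the score is the formula above plus the decay term; the fact shape is illustrative:

```ts
type Fact = { weight: number; confidence: number; lastSeenMs: number };

const DAY_MS = 86_400_000;

function retrievalScore(f: Fact, nowMs: number): number {
  // 7-day exponential decay: 1.0 when fresh, ~0.37 after one week
  const ageDays = (nowMs - f.lastSeenMs) / DAY_MS;
  const recency = Math.exp(-ageDays / 7);
  return 0.35 * f.weight + 0.35 * f.confidence + 0.3 * recency;
}
```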

Temporal Contradiction Detection: Single-valued predicates automatically invalidate outdated facts ("lives in Vienna" replaces "lives in Berlin").

Cross-system index: KG facts are synchronized into full-text search.

Unlike the Stanford paper: heuristic importance scoring instead of LLM-based — optimized for local models with limited context. LLM-based scoring (1–10 scale) would be too expensive and unreliable in this setting.

Hybrid retrieval (OpenClaw): BM25 + vector cosine fusion, configurable per profile.

Memory lifecycle: Age-based expiry (default 90 days) + count-based trim. WAL mode prevents locking issues. Seed protection: seedFromWorkspace() doesn't overwrite user-set values on restart.

Anti-hallucination: Extracted entities must appear verbatim in the user message. No phantom connections.

🔀 Fork strategy — plugin-first, minimal core patches

Clawminator uses a hybrid approach: ≥90% plugin code, ≤10% core patches. OpenClaw is included as a git subtree with automated weekly sync via GitHub Actions.

This means: upstream updates from OpenClaw flow in regularly without breaking Clawminator code. The few core patches are clearly documented and isolated.

Why plugin instead of hard fork? A hard fork would be easier short-term but a maintenance nightmare long-term. The plugin system keeps upgrade paths open.
🧪 Testing — 655 automated + 170 manual tests

Clawminator follows an ASPICE-adapted requirements process with formal IDs, MoSCoW priorities and verification methods.

Category | Count | Coverage
Automated | 655 | Skills, router, memory, profiles, gateway
Manual | 170 | Telegram integration, macOS features, E2E
Total | 825 | All layers + integrations

The manual test suite covers everything automation can't: Telegram messages, Spotify control, iMessage, camera triggers, Apple Notes — real macOS interactions that need a running desktop.

💓 Health Monitor — real-time dashboard in browser

Clawminator provides a built-in health endpoint at /clawminator/health that shows the overall system state at a glance — directly in the browser, no extra tools needed.

What the health monitor checks:

Check | What's verified
Ollama | Reachability, loaded models, VRAM usage
Gateway | Process status, uptime, active sessions
Memory | SQLite state, Knowledge Graph size, FTS5 index
Disk | Free space, model directory
RAM | System memory, swap usage
Telegram | Bot connection, last message
The endpoint runs locally on port 18789 — only accessible on localhost, no external access. No authentication needed because nothing leaves the device.
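Scripted checks work too, assuming the endpoint returns JSON (the response shape in the comment is a guess):

```ts
const res = await fetch("http://localhost:18789/clawminator/health");
const health = await res.json();
console.log(health); // e.g. { ollama: "ok", gateway: "ok", disk: "ok", ram: "ok" }
```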

Why local instead of cloud?

Five reasons. No marketing. No problemo.

🔒 Privacy

Everything stays on your device. No telemetry, no tracking, no data shared with third parties.

💰 Local-first, cloud-optional

The agent runs locally at zero cost. 90% of your requests are handled on-device — offline, in milliseconds, free.

📡 Always available

Works without internet (except weather + web search). Your assistant is never offline.

⚡ Fast

Skills respond in 50ms, LLM in 2–15 seconds. No waiting for cloud latency.

🧠 Orchestrator mode

Too complex for the local model? Clawminator can delegate tasks to Claude Code, Codex, Gemini or other CLI agents — directly from the server, no API keys to configure in Clawminator. You decide when and whether external help is involved.

Built on OpenClaw

OpenClaw is the framework. Clawminator is a specialization for local hardware.

OpenClaw is a generic open-source AI gateway — it supports cloud LLMs and local models (via Ollama) equally. 26 native tools, 53 bundled skills, 13,700+ community skills on ClawHub. Clawminator is a specialized configuration for a specific use case: local models with 16K context on consumer hardware, where every token counts.

Aspect | OpenClaw | Clawminator
Approach | Generic framework for all model sizes | Specialized for local models, consumer HW
Context Window | Typically 128K+ tokens | 16K tokens (every token budgeted)
LLM Tools | 26 native (Clawminator uses these in layer 3) | + 13 own slash commands (0 tokens)
Deterministic Skills | — | 75 skills via classifier+router (0 tokens, ~50ms)
Model Routing | 1 model configurable | Local + cloud option, complexity analyzer (<5ms)
Knowledge Graph | — | SQLite triple store with Stanford scoring
Languages | English | German + English
Target audience | Developers, power users, all model sizes | Token-efficient setups, 16K context range

OpenClaw is a generic framework that works with all model sizes. Clawminator specializes in the 16K context range and adds its own layers: a multi-head Query Classifier, 75 deterministic skills, a 7-Layer Router, Quality Gate with async improve loop, and a Knowledge Graph with Stanford scoring.

Mission Timeline

What exists β€” and what's coming.

✅ Today — available
75 Skills, 13 Commands, 52 SKILL.md templates, Knowledge Graph, 2 LLMs, Telegram Bot, German + English, three-layer architecture

🔄 In progress
One-click installer with automatic system language detection

📋 Planned
Additional hardware profiles (tiny/medium/large) for different RAM configurations, Windows & Linux support, more interfaces (Web, API, MS Teams), integration adapters for enterprise channels

⚠️ Planned features do not exist yet — they are intended for future versions.

License & Collaboration

Open source core. Enterprise integration on request.

📖 MIT License — Core

The full architecture will be publicly available under MIT license once the test suite is complete: query classifier, IR orchestrator, 7-layer router, memory system with Stanford scoring, CLI tooling, all ADRs. Use it, fork it, build on it.

🔧 Available separately

Trained classifier weights, enterprise connectors (MS Teams, SharePoint, M365, SAP), deployment automation and domain-specific fine-tunings are not released as open source. These parts emerge from projects with companies that need them.

💬 Open for conversations

I'm a requirements engineer and software system designer with an automotive background (ASPICE, Bosch since 2017). Clawminator is the combination of both: RE discipline applied to AI agents. Open to consulting, integration projects, and the right full-time role.

Clawminator is in development. Stay tuned. 🦞

The project is not yet publicly available. Sign up and we'll notify you about progress, beta access and release. No spam β€” only real updates.
