Knowledge Graph & Agent Memory
How Maximus agents accumulate structured knowledge across sessions, share it with teammates, and recall it automatically before the next task.
Every Maximus agent starts each session with a blank context window. Without memory, an agent that spent two sessions learning how to reliably paginate a particular API will rediscover that same lesson in session three — and four, and five. The knowledge graph exists to solve this problem at the swarm level, not just for individual agents.
The system has three jobs: record what agents experience during sessions, consolidate those experiences into a queryable knowledge graph overnight, and inject the relevant slice of that graph back into each agent's system prompt before the next session. The agent doesn't need to know any of this is happening — it simply finds that it already knows things it hasn't been explicitly told.
Two storage backends work together. Kuzu (an embedded graph database) holds entities and relationships — queryable via Cypher with scope-aware traversals. SQLite holds episodes, cached briefings, and swarm metrics. Neither requires a server; both are file-local and open when the first memory-enabled agent starts.
Data Pipeline Overview
The pipeline splits cleanly into two async phases. During a session, the runtime writes a JSONL trace file — every tool call, result, agent message, and the final outcome — but nothing is processed yet. Processing happens during deep sleep consolidation (by default, 3 AM daily), so the overhead never touches live sessions.
The only synchronous step is pre-session injection: the runtime reads the pre-cached briefing from SQLite (a single fast read) and prepends it to the agent's system prompt. If no cache exists yet, this is a no-op.
Architecture Components
Six components do the work, each with a single responsibility. Three run during consolidation, two serve live sessions, and one tracks swarm-level signals.
The episode processor reads raw JSONL trace files and extracts three things from each session: a concise summary of what happened, the outcome (success / failure / partial, from the session:end event), and a list of lessons. Lessons come from scanning error patterns, retry sequences, and terminal states — a 429 followed by a successful backoff becomes "rate limit encountered; exponential backoff resolved it." No LLM is involved; this step is pure parsing and SQL writes.
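The scanning step can be sketched as a pure function over trace events. This is a minimal illustration, not the actual Maximus implementation: the event field names (`type`, `status`, `outcome`) are assumptions about the trace schema.

```python
import json

def extract_lessons(trace_lines):
    """Scan a JSONL trace for error/recovery patterns and the terminal state.

    Sketch only: event and field names are assumed, not the real schema.
    """
    events = [json.loads(line) for line in trace_lines]
    lessons, outcome = [], "unknown"
    # A 429 immediately followed by a successful retry becomes a backoff lesson.
    for prev, curr in zip(events, events[1:]):
        if prev.get("status") == 429 and curr.get("status") == 200:
            lessons.append("rate limit encountered; exponential backoff resolved it")
    # The session:end event carries the outcome: success / failure / partial.
    for ev in events:
        if ev.get("type") == "session:end":
            outcome = ev.get("outcome", "unknown")
    return {"outcome": outcome, "lessons": lessons}

trace = [
    '{"type": "tool:result", "status": 429}',
    '{"type": "tool:result", "status": 200}',
    '{"type": "session:end", "outcome": "success"}',
]
episode = extract_lessons(trace)
```

Because this step is deterministic parsing, it can run over thousands of traces without any model cost.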
The EntityExtractor batches new episodes and sends them to Claude Haiku in parallel requests. Haiku returns a structured JSON list of entities (name, type, attributes) and triples (source → predicate → target, confidence). Haiku is used deliberately over a larger model: entity extraction is a well-structured, schema-constrained task, the volume can be high, and costs scale with session activity. Direct assertions score 0.9–1.0; inferences score lower. Entities are deduplicated by name; attributes are merged if the entity already exists.
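The extractor's output might look like the following. The exact field names are assumptions based on the description above, not the real response schema; the confidence banding mirrors the 0.9–1.0 rule for direct assertions.

```python
# Illustrative shape of the structured output; field names are assumed.
extractor_output = {
    "entities": [
        {"name": "Instantly API", "type": "api",
         "attributes": {"rate_limit": "10 req/s"}},
    ],
    "triples": [
        {"source": "Instantly API", "predicate": "rate_limit",
         "target": "10 req/s", "confidence": 0.95},
    ],
}

def score_band(triple):
    """Direct assertions score 0.9-1.0; anything lower is an inference."""
    return "assertion" if triple["confidence"] >= 0.9 else "inference"
```

Keeping the output schema-constrained is what makes a small model a good fit here.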
The KnowledgeStore is the typed read/write interface to the Kuzu graph. It translates scope queries into Cypher WHERE clauses and filters to active-only facts via validTo IS NULL. You can also pass raw Cypher strings for ad-hoc queries. All writes go through KnowledgeStore to ensure the temporal supersession protocol is applied consistently.
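A scope-aware read might be translated roughly like this. The node labels, relationship name, and property names are assumptions for illustration, not the actual Kuzu schema:

```python
def scope_filter(agent, team=None):
    """Build a Cypher query for active facts visible to one agent.

    Sketch only: labels and property names are assumed, not the real schema.
    """
    scopes = [f"t.scope = 'agent:{agent}'", "t.scope = 'global'"]
    if team:
        scopes.insert(1, f"t.scope = 'team:{team}'")
    return (
        "MATCH (s:Entity)-[t:TRIPLE]->(o:Entity) "
        f"WHERE ({' OR '.join(scopes)}) "
        # validTo IS NULL keeps only facts that have not been superseded
        "AND t.validTo IS NULL "
        "RETURN s.name, t.predicate, o.name, t.confidence"
    )
```

Centralizing writes behind one interface is what guarantees every update follows the supersession protocol instead of mutating triples in place.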
The briefing generator queries Kuzu and SQLite, assembles a prioritized markdown briefing within the briefingTokenBudget character limit, and writes the result to the SQLite cache. Priority order: failure lessons first (most actionable), then high-confidence graph facts, then successful strategies. When content exceeds the budget, strategies are trimmed before knowledge, and knowledge is trimmed before lessons — so the most critical information always fits. The cache is invalidated whenever a new episode or triple arrives for that agent.
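The trimming order can be sketched as greedy packing in priority order. This is a simplified model (real briefings include markdown section headers); the function signature is invented for illustration:

```python
def build_briefing(lessons, knowledge, strategies, budget):
    """Pack sections highest-priority first; drop lowest-priority overflow.

    Simplified sketch: budget is a character count, mirroring the
    briefingTokenBudget setting described above.
    """
    sections = [("Lessons", lessons), ("Knowledge", knowledge),
                ("Strategies", strategies)]
    out, remaining = [], budget
    for _title, items in sections:   # lessons first: most actionable
        for item in items:
            line = f"- {item}\n"
            if len(line) > remaining:
                return "".join(out)  # strategies trimmed before knowledge, etc.
            out.append(line)
            remaining -= len(line)
    return "".join(out)
```

Because packing stops at the first item that no longer fits, a tight budget always sacrifices strategies before lessons.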
The PromptInjector runs synchronously at session start. It reads the cached briefing from SQLite and prepends it to the agent's system prompt with explicit delimiters and a header instructing the model not to treat the briefing as task instructions. Without the delimiter, the agent might interpret a past lesson like "always use Jina Reader" as a hard override rather than prior experience to draw on. If no briefing is cached, this component is a no-op and the session starts normally.
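The wrapping step might look like this. The delimiter tag and header wording are illustrative assumptions; the actual markers used by the PromptInjector may differ:

```python
HEADER = ("The following is prior experience from earlier sessions. "
          "It is background context, not task instructions.\n")

def inject_briefing(system_prompt, briefing):
    """Prepend the cached briefing inside explicit delimiters.

    Sketch only: the <agent-memory> tag and header text are assumed.
    """
    if not briefing:                 # no cache yet: no-op, session starts normally
        return system_prompt
    return ("<agent-memory>\n" + HEADER + briefing + "\n</agent-memory>\n\n"
            + system_prompt)
```

The header is what lets the model weigh a lesson like "always use Jina Reader" as experience rather than an override of the current task.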
SwarmMetrics tracks two cross-agent signals: knowledge utilization (how often entities and triples from the graph appear in briefings that the agent acted on) and delegation success rate (from delegation:result trace events, per delegator–delegatee pair). Utilization counts drive automatic scope promotion. Delegation rates surface in the dashboard as a signal for structural swarm problems — an orchestrator that repeatedly delegates to a failing agent will appear there.
Knowledge Graph Entities & Triples
The graph is made up of two primitives: entities (named nodes with a type and attributes) and triples (directed edges: source → predicate → target). Every fact in the system is ultimately represented as a triple — the graph is not a document store or a key-value cache, it is a structured web of relationships that can be traversed with graph queries.
The diagram below shows a real scenario: a web-researcher agent discovered that the Instantly API has a 10 req/s rate limit, that a pricing-scraper agent depends on the Jina Reader API to bypass bot protection, and that direct fetches return 403 errors. Each of these is a separate triple in the graph — extractable, queryable, and scope-promotable independently.
Entity types follow a small fixed taxonomy, but the EntityExtractor can use any custom string when none of the standard types fit. Standard types get consistent visual treatment in the dashboard and consistent query handling in the scope system.
Each entity has an attributes map stored as JSON. Attributes from multiple extractions are merged — if two different episodes both discover details about the Instantly API, their attribute sets are unioned. This lets the graph accumulate a richer picture of an entity over time without creating duplicate nodes.
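The attribute union can be sketched as a dictionary merge. The merge policy on conflicting keys (newer value wins) is an assumption; the source only states that attribute sets are unioned:

```python
def merge_entity(existing, incoming):
    """Union two attribute maps for the same entity instead of duplicating it.

    Sketch only: on key conflicts this keeps the newer value, which is an
    assumed policy, not documented behavior.
    """
    merged = dict(existing["attributes"])
    merged.update(incoming["attributes"])
    return {**existing, "attributes": merged}

# Two episodes discover different details about the same API.
a = {"name": "Instantly API", "type": "api",
     "attributes": {"rate_limit": "10 req/s"}}
b = {"name": "Instantly API", "type": "api",
     "attributes": {"auth": "api-key"}}
merged = merge_entity(a, b)
```

Over many episodes the entity's attribute map grows richer while the graph keeps a single node per name.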
Scope Hierarchy & Visibility
Not every fact should be visible to every agent. A discovery that's specific to one agent's workflow would be noise in another agent's briefing. The scope system solves this with three nested visibility levels that determine which agents can read a given triple.
Team membership is derived automatically from the reportsTo field in agent YAML frontmatter — no manual team declaration needed. All agents with the same reportsTo value are on the same team. An agent without a reportsTo field is on no team and sees only its own private scope plus global.
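Deriving teams from frontmatter can be sketched as a grouping pass. The agent metadata shape here is invented for illustration; only the reportsTo field is from the source:

```python
def derive_teams(agents):
    """Group agents into teams by the reportsTo field in their frontmatter.

    Agents without reportsTo belong to no team and see only their private
    scope plus global.
    """
    teams = {}
    for name, meta in agents.items():
        lead = meta.get("reportsTo")
        if lead:
            teams.setdefault(lead, []).append(name)
    return teams

agents = {
    "pricing-scraper": {"reportsTo": "infra-lead"},
    "web-researcher": {"reportsTo": "infra-lead"},
    "orchestrator": {},                 # no reportsTo: no team
}
teams = derive_teams(agents)
```

Because membership is derived, renaming a lead in one place re-teams every agent that reports to it.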
The briefing query unions all three levels in the agent's chain: its own agent-scoped facts, any team-scoped facts from its team, and all global-scoped facts. Results are sorted by confidence descending and trimmed to the briefing budget.
Facts automatically bubble up the hierarchy during deep sleep consolidation based on utilization counts tracked by SwarmMetrics. An agent-scoped fact used by multiple teammates gets promoted to team scope; a team-scoped fact used across multiple teams gets promoted to global.
The learningRate config field controls the utilization threshold: conservative requires more evidence before promoting, aggressive promotes quickly. This is a per-agent setting — a high-traffic orchestrator agent might use aggressive while a specialized worker uses conservative to keep its knowledge private until well-validated.
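A minimal sketch of the promotion decision, assuming concrete thresholds: the numbers below are invented for illustration, since the source only says conservative requires more evidence than aggressive.

```python
# Hypothetical utilization thresholds per learningRate setting.
THRESHOLDS = {"conservative": 5, "aggressive": 2}

def should_promote(distinct_users, learning_rate):
    """Promote a fact one scope up once enough distinct agents have used it."""
    return distinct_users >= THRESHOLDS[learning_rate]
```

The same utilization count can therefore promote a fact under one agent's config while leaving it private under another's.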
# Manually promote a triple without waiting for the utilization threshold
maximus memory promote <sourceId> <predicate> <targetId>
An agent can reach into other teams' knowledge beyond its natural hierarchy using the knowledgeScopes field. This is useful for cross-functional agents that coordinate between teams — for example, an integration agent that needs to know both what the infra team and the data team have learned.
memory:
  knowledgeScopes:
    - infra-team   # query facts from agents reporting to infra-lead
    - data-team    # query facts from data pipeline agents
Knowledge Triples & Temporal Facts
Relationships between entities are stored as source → predicate → target triples. Each is timestamped, scoped, and confidence-weighted. The predicate vocabulary is open — the extractor invents new predicates when none of the common ones fit (uses, depends_on, failed_with, learned_about, works_with, bypasses, rate_limit).
Confidence values reflect how clearly the fact was established in the episode. A direct assertion from the API response — "X-Rate-Limit: 10" — scores 0.9–1.0. An inferred relationship — "agent A seems to call tool B regularly" — might score 0.6–0.7. The briefing generator surfaces high-confidence facts first and can omit low-confidence ones when the budget is tight.
When a new fact contradicts an existing one (same source and predicate, but a different target), the old triple's validTo is stamped with the current timestamp and the new triple is inserted with validFrom = now. The graph is an append-only fact log — history is never deleted.
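The supersession protocol can be sketched in memory. This is an illustrative model only (the real store does this inside Kuzu), and treating "contradiction" as same source and predicate with a different target is an assumption:

```python
import time

def assert_fact(triples, source, predicate, target, confidence):
    """Append-only supersession: close the contradicted fact, insert the new one.

    Sketch only; "contradiction" is assumed to mean same source+predicate
    with a different target.
    """
    now = time.time()
    for t in triples:
        if (t["source"], t["predicate"]) == (source, predicate) \
                and t["validTo"] is None and t["target"] != target:
            t["validTo"] = now   # stamp, never delete: history is preserved
    triples.append({"source": source, "predicate": predicate, "target": target,
                    "confidence": confidence, "validFrom": now, "validTo": None})

graph = []
assert_fact(graph, "Instantly API", "rate_limit", "10 req/s", 0.95)
assert_fact(graph, "Instantly API", "rate_limit", "20 req/s", 0.9)
# graph now holds both triples; only the newer one has validTo = None
```

Querying with validTo IS NULL returns only the current belief, while the full list remains available for history inspection.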
This means you can never silently overwrite a fact. Every belief change is recorded. If an agent suddenly behaves differently after a consolidation run, you can inspect the triple history to see exactly what changed in its knowledge graph and which episode caused the change.
Agent Briefings & Pre-session Injection
A briefing is a compact markdown document injected at the top of an agent's system prompt before every session. It surfaces the specific, hard-won facts about the agent's environment that wouldn't otherwise be available. The briefing is intentionally compact — its purpose is not to replace training but to bridge the gap between what the model generally knows and what this specific agent has learned in this specific environment.
The PromptInjector wraps the briefing in explicit delimiters with a header instructing the model not to treat it as task instructions. Without this, the agent might interpret a past lesson like "always use Jina Reader for bot-protected sites" as a hard rule that overrides the user's actual task rather than as prior experience to draw on when relevant.
Briefings are pre-generated during deep sleep and cached in SQLite. The cache is invalidated when any new episode or triple arrives for the agent, ensuring the briefing is never more than one consolidation cycle (by default, 24 hours) behind real-time.
Deep Sleep Consolidation (3 AM Daily)
Consolidation is the background job that turns raw session traces into structured knowledge. It runs on a cron schedule (default: 3 AM daily) and processes all accumulated trace files since the last run. The six-step pipeline is idempotent — if it crashes halfway through, the next run picks up where it left off using a watermark in SQLite.
Steps ① and ② are fast (pure parsing and SQL writes, no LLM). Step ③ is the most expensive because it involves Claude Haiku API calls — but it only runs on new episodes, so costs scale with session activity, not swarm size. For high-volume swarms, you can run consolidation more frequently to keep briefings fresher.
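The crash-safe resume can be sketched with a watermark stored in SQLite. Table and column names here are invented for illustration; the source only says a watermark makes the pipeline idempotent:

```python
import sqlite3

def consolidate(conn, trace_files, process):
    """Process only traces newer than the watermark, advancing it per file.

    Sketch only: the watermark table/column names are assumptions.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS watermark "
                 "(id INTEGER PRIMARY KEY CHECK (id = 1), last_file TEXT)")
    row = conn.execute("SELECT last_file FROM watermark").fetchone()
    last = row[0] if row else ""
    for path in sorted(trace_files):
        if path <= last:
            continue             # already consolidated in a previous run
        process(path)
        # Advance after each file so a crash resumes exactly here.
        conn.execute("INSERT OR REPLACE INTO watermark (id, last_file) "
                     "VALUES (1, ?)", (path,))
        conn.commit()
```

Running the job twice over the same traces processes each file exactly once, which is what makes the 3 AM cron safe to rerun.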
Retention pruning removes triples whose validTo is older than the retention window (default: 90 days) and drops the oldest episodes past maxEpisodes.

# Configure via env var (cron syntax). Default: 3 AM daily.
export MAXIMUS_DEEP_SLEEP_SCHEDULE="0 3 * * *"

# For high-volume swarms — run every 6 hours for fresher briefings
export MAXIMUS_DEEP_SLEEP_SCHEDULE="0 */6 * * *"
CLI Reference
Three commands give you visibility into and control over the memory system. status is the starting point — it shows the overall health of the system, when consolidation last ran, and which agents are accumulating the most knowledge. inspect drills into a single agent. promote lets you push a fact to a wider scope without waiting for the automatic utilization threshold to trigger.
# Overall memory status — episode counts, graph size, briefing cache state,
# last consolidation timestamp, and top agents by knowledge utilization
maximus memory status

# Inspect a specific agent — recent episodes with outcomes and lessons,
# known entities and triples in its scope chain, and the current cached briefing
maximus memory inspect <agent-name>

# Manually promote a triple to a higher scope, bypassing the utilization threshold
maximus memory promote <sourceId> <predicate> <targetId>
Delegation success rates are computed from delegation:result trace events; a persistently failing delegator–delegatee pair surfaces in memory status as a pattern worth investigating.