Knowledge Graph & Agent Memory
How Maximus agents accumulate structured knowledge across sessions, share it with teammates, and recall it automatically before the next task.
Every Maximus agent starts each session with a blank context window. Without memory, an agent that spent two sessions learning how to reliably paginate a particular API will rediscover that same lesson in session three — and four, and five. The knowledge graph exists to solve this problem at the swarm level, not just for individual agents.
The system has three jobs: record what agents experience during sessions, consolidate those experiences into a queryable knowledge graph overnight, and inject the relevant slice of that graph back into each agent's system prompt before the next session. The agent doesn't need to know any of this is happening — it simply finds that it already knows things it hasn't been explicitly told.
Two storage backends work together. Kuzu (an embedded graph database) holds entities and relationships — queryable via Cypher with scope-aware traversals. SQLite holds episodes, cached briefings, and swarm metrics. Neither requires a server; both are file-local and open when the first memory-enabled agent starts.
Data Pipeline Overview
The pipeline splits cleanly into two async phases. During a session, the runtime writes a JSONL trace file — every tool call, result, agent message, and the final outcome — but nothing is processed yet. Processing happens during deep sleep consolidation (by default, 3 AM daily), so the overhead never touches live sessions.
The only synchronous step is pre-session injection: the runtime reads the pre-cached briefing from SQLite (a single fast read) and prepends it to the agent's system prompt. If no cache exists yet, this is a no-op.
Architecture Components
Six components do the work, each with a single responsibility. Three run during consolidation, two serve live sessions, and one tracks swarm-level signals.
The episode processor reads raw JSONL trace files and extracts three things from each session: a concise summary of what happened, the outcome (success / failure / partial, from the session:end event), and a list of lessons. Lessons come from scanning error patterns, retry sequences, and terminal states — a 429 followed by a successful backoff becomes "rate limit encountered; exponential backoff resolved it." No LLM is involved; this step is pure parsing and SQL writes.
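The scanning step can be sketched as a pure function over trace events. This is a minimal illustration, not the actual Maximus implementation: the event field names (`type`, `status`, `outcome`) are assumptions about the trace schema.

```python
import json

def extract_lessons(trace_lines):
    """Scan a JSONL trace for error/recovery patterns and the terminal state.

    Sketch only: event and field names are assumed, not the real schema.
    """
    events = [json.loads(line) for line in trace_lines]
    lessons, outcome = [], "unknown"
    # A 429 immediately followed by a successful retry becomes a backoff lesson.
    for prev, curr in zip(events, events[1:]):
        if prev.get("status") == 429 and curr.get("status") == 200:
            lessons.append("rate limit encountered; exponential backoff resolved it")
    # The session:end event carries the outcome: success / failure / partial.
    for ev in events:
        if ev.get("type") == "session:end":
            outcome = ev.get("outcome", "unknown")
    return {"outcome": outcome, "lessons": lessons}

trace = [
    '{"type": "tool:result", "status": 429}',
    '{"type": "tool:result", "status": 200}',
    '{"type": "session:end", "outcome": "success"}',
]
episode = extract_lessons(trace)
```

Because this step is deterministic parsing, it can run over thousands of traces without any model cost.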
The EntityExtractor batches new episodes and sends them to Claude Haiku in parallel requests. Haiku returns a structured JSON list of entities (name, type, attributes) and triples (source → predicate → target, confidence). Haiku is used deliberately over a larger model: entity extraction is a well-structured, schema-constrained task, the volume can be high, and costs scale with session activity. Direct assertions score 0.9–1.0; inferences score lower. Entities are deduplicated by name; attributes are merged if the entity already exists.
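The extractor's output might look like the following. The exact field names are assumptions based on the description above, not the real response schema; the confidence banding mirrors the 0.9–1.0 rule for direct assertions.

```python
# Illustrative shape of the structured output; field names are assumed.
extractor_output = {
    "entities": [
        {"name": "Instantly API", "type": "api",
         "attributes": {"rate_limit": "10 req/s"}},
    ],
    "triples": [
        {"source": "Instantly API", "predicate": "rate_limit",
         "target": "10 req/s", "confidence": 0.95},
    ],
}

def score_band(triple):
    """Direct assertions score 0.9-1.0; anything lower is an inference."""
    return "assertion" if triple["confidence"] >= 0.9 else "inference"
```

Keeping the output schema-constrained is what makes a small model a good fit here.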
The KnowledgeStore is the typed read/write interface to the Kuzu graph. It translates scope queries into Cypher WHERE clauses and filters to active-only facts via validTo IS NULL. You can also pass raw Cypher strings for ad-hoc queries. All writes go through KnowledgeStore to ensure the temporal supersession protocol is applied consistently.
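A scope-aware read might be translated roughly like this. The node labels, relationship name, and property names are assumptions for illustration, not the actual Kuzu schema:

```python
def scope_filter(agent, team=None):
    """Build a Cypher query for active facts visible to one agent.

    Sketch only: labels and property names are assumed, not the real schema.
    """
    scopes = [f"t.scope = 'agent:{agent}'", "t.scope = 'global'"]
    if team:
        scopes.insert(1, f"t.scope = 'team:{team}'")
    return (
        "MATCH (s:Entity)-[t:TRIPLE]->(o:Entity) "
        f"WHERE ({' OR '.join(scopes)}) "
        # validTo IS NULL keeps only facts that have not been superseded
        "AND t.validTo IS NULL "
        "RETURN s.name, t.predicate, o.name, t.confidence"
    )
```

Centralizing writes behind one interface is what guarantees every update follows the supersession protocol instead of mutating triples in place.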
The briefing generator queries Kuzu and SQLite, assembles a prioritized markdown briefing within the briefingTokenBudget character limit, and writes the result to the SQLite cache. Priority order: failure lessons first (most actionable), then high-confidence graph facts, then successful strategies. When content exceeds the budget, strategies are trimmed before knowledge, and knowledge is trimmed before lessons — so the most critical information always fits. The cache is invalidated whenever a new episode or triple arrives for that agent.
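The trimming order can be sketched as greedy packing in priority order. This is a simplified model (real briefings include markdown section headers); the function signature is invented for illustration:

```python
def build_briefing(lessons, knowledge, strategies, budget):
    """Pack sections highest-priority first; drop lowest-priority overflow.

    Simplified sketch: budget is a character count, mirroring the
    briefingTokenBudget setting described above.
    """
    sections = [("Lessons", lessons), ("Knowledge", knowledge),
                ("Strategies", strategies)]
    out, remaining = [], budget
    for _title, items in sections:   # lessons first: most actionable
        for item in items:
            line = f"- {item}\n"
            if len(line) > remaining:
                return "".join(out)  # strategies trimmed before knowledge, etc.
            out.append(line)
            remaining -= len(line)
    return "".join(out)
```

Because packing stops at the first item that no longer fits, a tight budget always sacrifices strategies before lessons.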
The PromptInjector runs synchronously at session start. It reads the cached briefing from SQLite and prepends it to the agent's system prompt with explicit delimiters and a header instructing the model not to treat the briefing as task instructions. Without the delimiter, the agent might interpret a past lesson like "always use Jina Reader" as a hard override rather than prior experience to draw on. If no briefing is cached, this component is a no-op and the session starts normally.
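The wrapping step might look like this. The delimiter tag and header wording are illustrative assumptions; the actual markers used by the PromptInjector may differ:

```python
HEADER = ("The following is prior experience from earlier sessions. "
          "It is background context, not task instructions.\n")

def inject_briefing(system_prompt, briefing):
    """Prepend the cached briefing inside explicit delimiters.

    Sketch only: the <agent-memory> tag and header text are assumed.
    """
    if not briefing:                 # no cache yet: no-op, session starts normally
        return system_prompt
    return ("<agent-memory>\n" + HEADER + briefing + "\n</agent-memory>\n\n"
            + system_prompt)
```

The header is what lets the model weigh a lesson like "always use Jina Reader" as experience rather than an override of the current task.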
SwarmMetrics tracks two cross-agent signals: knowledge utilization (how often entities and triples from the graph appear in briefings that the agent acted on) and delegation success rate (from delegation:result trace events, per delegator–delegatee pair). Utilization counts drive automatic scope promotion. Delegation rates surface in the dashboard as a signal for structural swarm problems — an orchestrator that repeatedly delegates to a failing agent will appear there.
Knowledge Graph Entities & Triples
The graph is made up of two primitives: entities (named nodes with a type and attributes) and triples (directed edges: source → predicate → target). Every fact in the system is ultimately represented as a triple — the graph is not a document store or a key-value cache, it is a structured web of relationships that can be traversed with graph queries.
The diagram below shows a real scenario: a web-researcher agent discovered that the Instantly API has a 10 req/s rate limit, that a pricing-scraper agent depends on the Jina Reader API to bypass bot protection, and that direct fetches return 403 errors. Each of these is a separate triple in the graph — extractable, queryable, and scope-promotable independently.
Entity types follow a small fixed taxonomy, but the EntityExtractor can use any custom string when none of the standard types fit. Standard types get consistent visual treatment in the dashboard and consistent query handling in the scope system.
Each entity has an attributes map stored as JSON. Attributes from multiple extractions are merged — if two different episodes both discover details about the Instantly API, their attribute sets are unioned. This lets the graph accumulate a richer picture of an entity over time without creating duplicate nodes.
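The attribute union can be sketched as a dictionary merge. The merge policy on conflicting keys (newer value wins) is an assumption; the source only states that attribute sets are unioned:

```python
def merge_entity(existing, incoming):
    """Union two attribute maps for the same entity instead of duplicating it.

    Sketch only: on key conflicts this keeps the newer value, which is an
    assumed policy, not documented behavior.
    """
    merged = dict(existing["attributes"])
    merged.update(incoming["attributes"])
    return {**existing, "attributes": merged}

# Two episodes discover different details about the same API.
a = {"name": "Instantly API", "type": "api",
     "attributes": {"rate_limit": "10 req/s"}}
b = {"name": "Instantly API", "type": "api",
     "attributes": {"auth": "api-key"}}
merged = merge_entity(a, b)
```

Over many episodes the entity's attribute map grows richer while the graph keeps a single node per name.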
Scope Hierarchy & Visibility
Not every fact should be visible to every agent. A discovery that's specific to one agent's workflow would be noise in another agent's briefing. The scope system solves this with three nested visibility levels that determine which agents can read a given triple.
Team membership is derived automatically from the reportsTo field in agent YAML frontmatter — no manual team declaration needed. All agents with the same reportsTo value are on the same team. An agent without a reportsTo field is on no team and sees only its own private scope plus global.
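Deriving teams from frontmatter can be sketched as a grouping pass. The agent metadata shape here is invented for illustration; only the reportsTo field is from the source:

```python
def derive_teams(agents):
    """Group agents into teams by the reportsTo field in their frontmatter.

    Agents without reportsTo belong to no team and see only their private
    scope plus global.
    """
    teams = {}
    for name, meta in agents.items():
        lead = meta.get("reportsTo")
        if lead:
            teams.setdefault(lead, []).append(name)
    return teams

agents = {
    "pricing-scraper": {"reportsTo": "infra-lead"},
    "web-researcher": {"reportsTo": "infra-lead"},
    "orchestrator": {},                 # no reportsTo: no team
}
teams = derive_teams(agents)
```

Because membership is derived, renaming a lead in one place re-teams every agent that reports to it.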
The briefing query unions all three levels in the agent's chain: its own agent-scoped facts, any team-scoped facts from its team, and all global-scoped facts. Results are sorted by confidence descending and trimmed to the briefing budget.
Facts automatically bubble up the hierarchy during deep sleep consolidation based on utilization counts tracked by SwarmMetrics. An agent-scoped fact used by multiple teammates gets promoted to team scope; a team-scoped fact used across multiple teams gets promoted to global.
The learningRate config field controls the utilization threshold: conservative requires more evidence before promoting, aggressive promotes quickly. This is a per-agent setting — a high-traffic orchestrator agent might use aggressive while a specialized worker uses conservative to keep its knowledge private until well-validated.
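A minimal sketch of the promotion decision, assuming concrete thresholds: the numbers below are invented for illustration, since the source only says conservative requires more evidence than aggressive.

```python
# Hypothetical utilization thresholds per learningRate setting.
THRESHOLDS = {"conservative": 5, "aggressive": 2}

def should_promote(distinct_users, learning_rate):
    """Promote a fact one scope up once enough distinct agents have used it."""
    return distinct_users >= THRESHOLDS[learning_rate]
```

The same utilization count can therefore promote a fact under one agent's config while leaving it private under another's.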
# Manually promote a triple without waiting for the utilization threshold
maximus memory promote <sourceId> <predicate> <targetId>
An agent can reach into other teams' knowledge beyond its natural hierarchy using the knowledgeScopes field. This is useful for cross-functional agents that coordinate between teams — for example, an integration agent that needs to know both what the infra team and the data team have learned.
memory:
  knowledgeScopes:
    - infra-team   # query facts from agents reporting to infra-lead
    - data-team    # query facts from data pipeline agents
Knowledge Triples & Temporal Facts
Relationships between entities are stored as source → predicate → target triples. Each is timestamped, scoped, and confidence-weighted. The predicate vocabulary is open — the extractor invents new predicates when none of the common ones fit (uses, depends_on, failed_with, learned_about, works_with, bypasses, rate_limit).
Confidence values reflect how clearly the fact was established in the episode. A direct assertion from the API response — "X-Rate-Limit: 10" — scores 0.9–1.0. An inferred relationship — "agent A seems to call tool B regularly" — might score 0.6–0.7. The briefing generator surfaces high-confidence facts first and can omit low-confidence ones when the budget is tight.
When a new fact contradicts an existing one (same source and predicate, but a different target), the old triple's validTo is stamped with the current timestamp and the new triple is inserted with validFrom = now. The graph is an append-only fact log — history is never deleted.
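The supersession protocol can be sketched in memory. This is an illustrative model only (the real store does this inside Kuzu), and treating "contradiction" as same source and predicate with a different target is an assumption:

```python
import time

def assert_fact(triples, source, predicate, target, confidence):
    """Append-only supersession: close the contradicted fact, insert the new one.

    Sketch only; "contradiction" is assumed to mean same source+predicate
    with a different target.
    """
    now = time.time()
    for t in triples:
        if (t["source"], t["predicate"]) == (source, predicate) \
                and t["validTo"] is None and t["target"] != target:
            t["validTo"] = now   # stamp, never delete: history is preserved
    triples.append({"source": source, "predicate": predicate, "target": target,
                    "confidence": confidence, "validFrom": now, "validTo": None})

graph = []
assert_fact(graph, "Instantly API", "rate_limit", "10 req/s", 0.95)
assert_fact(graph, "Instantly API", "rate_limit", "20 req/s", 0.9)
# graph now holds both triples; only the newer one has validTo = None
```

Querying with validTo IS NULL returns only the current belief, while the full list remains available for history inspection.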
This means you can never silently overwrite a fact. Every belief change is recorded. If an agent suddenly behaves differently after a consolidation run, you can inspect the triple history to see exactly what changed in its knowledge graph and which episode caused the change.
Agent Briefings & Pre-session Injection
A briefing is a compact markdown document injected at the top of an agent's system prompt before every session. It surfaces the specific, hard-won facts about the agent's environment that wouldn't otherwise be available. The briefing is intentionally compact — its purpose is not to replace training but to bridge the gap between what the model generally knows and what this specific agent has learned in this specific environment.
The PromptInjector wraps the briefing in explicit delimiters with a header instructing the model not to treat it as task instructions. Without this, the agent might interpret a past lesson like "always use Jina Reader for bot-protected sites" as a hard rule that overrides the user's actual task rather than as prior experience to draw on when relevant.
Briefings are pre-generated during deep sleep and cached in SQLite. The cache is invalidated when any new episode or triple arrives for the agent, ensuring the briefing is never more than one consolidation cycle (by default, 24 hours) behind real-time.
Deep Sleep Consolidation (3 AM Daily)
Consolidation is the background job that turns raw session traces into structured knowledge. It runs on a cron schedule (default: 3 AM daily) and processes all accumulated trace files since the last run. The six-step pipeline is idempotent — if it crashes halfway through, the next run picks up where it left off using a watermark in SQLite.
Steps ① and ② are fast (pure parsing and SQL writes, no LLM). Step ③ is the most expensive because it involves Claude Haiku API calls — but it only runs on new episodes, so costs scale with session activity, not swarm size. For high-volume swarms, you can run consolidation more frequently to keep briefings fresher.
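The crash-safe resume can be sketched with a watermark stored in SQLite. Table and column names here are invented for illustration; the source only says a watermark makes the pipeline idempotent:

```python
import sqlite3

def consolidate(conn, trace_files, process):
    """Process only traces newer than the watermark, advancing it per file.

    Sketch only: the watermark table/column names are assumptions.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS watermark "
                 "(id INTEGER PRIMARY KEY CHECK (id = 1), last_file TEXT)")
    row = conn.execute("SELECT last_file FROM watermark").fetchone()
    last = row[0] if row else ""
    for path in sorted(trace_files):
        if path <= last:
            continue             # already consolidated in a previous run
        process(path)
        # Advance after each file so a crash resumes exactly here.
        conn.execute("INSERT OR REPLACE INTO watermark (id, last_file) "
                     "VALUES (1, ?)", (path,))
        conn.commit()
```

Running the job twice over the same traces processes each file exactly once, which is what makes the 3 AM cron safe to rerun.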
Retention pruning removes triples whose validTo is older than the retention window (default: 90 days) and drops the oldest episodes past maxEpisodes.

# Configure via env var (cron syntax). Default: 3 AM daily.
export MAXIMUS_DEEP_SLEEP_SCHEDULE="0 3 * * *"

# For high-volume swarms — run every 6 hours for fresher briefings
export MAXIMUS_DEEP_SLEEP_SCHEDULE="0 */6 * * *"
CLI Reference
Three commands give you visibility into and control over the memory system. status is the starting point — it shows the overall health of the system, when consolidation last ran, and which agents are accumulating the most knowledge. inspect drills into a single agent. promote lets you push a fact to a wider scope without waiting for the automatic utilization threshold to trigger.
# Overall memory status — episode counts, graph size, briefing cache state,
# last consolidation timestamp, and top agents by knowledge utilization
maximus memory status

# Inspect a specific agent — recent episodes with outcomes and lessons,
# known entities and triples in its scope chain, and the current cached briefing
maximus memory inspect <agent-name>

# Manually promote a triple to a higher scope, bypassing the utilization threshold
maximus memory promote <sourceId> <predicate> <targetId>
Delegation success rates are computed from delegation:result trace events; a persistently failing delegator–delegatee pair surfaces in memory status as a pattern worth investigating.