This is the companion to the self-hosted LLM post. That one covered the memory layer in a paragraph. This is the part of it that earns its own post.

Most "give your AI a memory" setups are RAG over your notes: embed the documents, retrieve the nearest chunks at query time, paste them into the prompt. That hands the model a pile of relevant text. It does not hand it structure. It cannot answer "what runs on that host" or "what depends on this service" because those relationships were never stored as relationships - they were dissolved into prose and have to be re-guessed every time.

A knowledge graph stores them as relationships. Entities and typed edges: (subject, predicate, object). (cliopi, runs, influxdb), (relay, hosts, dns). The model reads the edges directly instead of inferring them from retrieved paragraphs.

Short answer to what I built: a pipeline that turns my own writing - my notes, and the things I have actually typed into chat - into a typed-triple graph, deduped and normalised, that a local model reads from and writes to during a conversation.

The one rule: only my own words#

The graph is built ONLY from text I wrote. My notes, my side of chat logs. Never the assistant's replies, never tool output.

This rule is load-bearing. An assistant's output is fluent, plausible, and frequently slightly wrong. Feed it back into the memory and you get a graph that launders the model's guesses into "facts," and the next session reads them as ground truth. The error compounds with every round. So the collector drops every assistant turn and every tool result and keeps only the human input. The graph can only know what I actually said.

The pipeline#

my notes + my chat turns          (assistant + tool output dropped)
        |  collect
        v
   a strong model extracts (subject, predicate, object) triples
        |  load
        v
   aggregate + dedupe across everything extracted
        |  normalise
        v
   collapse name variants:  "the relay" = "relay01" = "the VPS"  -> one node
        |
        v
   graph of typed triples   <-- read AND written during conversation

Collect: gather only my own input into a cache. Everything else is filtered out here, at the source.

Extract: a capable model reads the text and pulls out triples. This is the step that needs a real model, not a 4B - turning loose prose into clean (subject, predicate, object) is the hard part, and a weak model invents edges that were never stated.

Load: aggregate and dedupe across everything extracted, so the same fact stated five times is one edge.

Normalise: the same thing shows up under five names - "the relay," "relay01," "the VPS." Normalisation collapses name variants to one entity, and synonymous predicates to one edge type, against a map I maintain. Skip this step and the graph becomes a fog of near-duplicate nodes that never connect to each other.

Reading and writing it live#

The graph is not a build-once artifact. During a conversation the model reads relationships out of it - one query returns the edges around an entity, in both directions - and writes new ones back as it learns them. I mention a new fact, it lands as an edge, and the next session already has it.

It sits behind a single memory lookup alongside the flat-file notes and a semantic search, so one call returns three things at once: curated facts, keyword hits, and graph relationships. And if the graph layer is not installed, the whole thing degrades to a substring scan over the plain note files. The files are the source of truth; the graph is the structure laid on top.

Seeing it#

A few thousand entities is too many to read as a list, so it renders: an interactive clustered view to explore it, a static export for a printable map, and a commit-history-style animation of the graph growing over time. Mostly the visualisation is a debugging tool. When normalisation misses and one entity splits into three nodes, you see it on the graph immediately, in a way you never would in a database.

Is it worth it over plain RAG?#

For "find me the note about X," no - semantic search wins and the graph is overhead. The graph earns its place on relational questions: what runs on this host, what depends on this service, what connects to what. Those answers are edges, and edges are exactly what RAG dissolves into prose and throws away.