graphmemory
Build and query embedded GraphRAG knowledge graphs with DuckDB-backed vector, full-text, and hybrid search. Use when the user wants to store entities and relations, run RAG over a graph, extract knowledge graphs from text with DSPy, run graph algorithms (PageRank, centrality, components), merge/upsert nodes and edges, fuzzy-dedupe an existing graph, or visualize interactively in a browser. Trigger phrases include "knowledge graph", "GraphRAG", "graph database", "hybrid search", "extract entities and relations", "DuckDB graph", "embedded graph store", "dedupe graph nodes".
Install
npx skills add https://github.com/bradAGI/GraphMemory --skill graphmemory
GraphMemory
Embedded GraphRAG database built on DuckDB. Single Python package — no server, no external services. Ships vector (HNSW), full-text (BM25), hybrid search, fluent query builder, multi-hop traversal, fuzzy dedup, DSPy extraction, NetworkX algorithms, and a zero-dep D3.js visualizer.
When to reach for this
- Knowledge graph with semantic search (not just a vector DB, not just a graph DB).
- RAG where graph traversal is part of retrieval.
- Extract entities/relations from text and store them durably with dedup.
- Prototyping — file-backed or in-memory graph without Neo4j/Postgres/pgvector.
Do not use when the user already runs Neo4j, Neptune, or ArangoDB, or when the graph will reach hundreds of millions of nodes: GraphMemory is embedded in DuckDB and supports a single writer.
Install
pip install graphmemory
pip install "graphmemory[extraction]" # DSPy entity/relation extraction (quotes keep zsh from globbing the brackets)
pip install "graphmemory[algorithms]" # NetworkX algorithms
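Because the extras are optional, code that depends on them can guard the import so the rest of the program still loads without them. A minimal sketch: the `graphmemory.extraction` import path comes from this document, while the `HAS_EXTRACTION` flag and the `require_extraction` helper are hypothetical names introduced for illustration.

```python
# Guard the optional DSPy extra so the module still imports
# when graphmemory[extraction] is not installed.
try:
    from graphmemory.extraction import extract_and_merge
    HAS_EXTRACTION = True
except ImportError:
    extract_and_merge = None
    HAS_EXTRACTION = False


def require_extraction() -> None:
    """Hypothetical helper: fail early with an actionable message."""
    if not HAS_EXTRACTION:
        raise RuntimeError(
            'Extraction is unavailable; run: pip install "graphmemory[extraction]"'
        )
```

Calling `require_extraction()` at the top of an ingest entry point turns a late `TypeError` on `None` into a clear install hint.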
Decision table
| User intent | Method |
|---|---|
| Insert one node | graph.insert_node(node) |
| Bulk insert | graph.bulk_insert_nodes(nodes) |
| Insert-or-update by property | graph.merge_node(node, match_keys=["name"]) |
| Fuzzy insert-or-update | graph.merge_node(node, match_keys=["name"], similarity_threshold=0.9) |
Dedupe edges on (src, tgt, relation) | graph.merge_edge(edge) |
| Clean up existing duplicates | graph.resolve_duplicates(match_keys=["name"], similarity_threshold=0.9) |
| Pure vector kNN | graph.nearest_nodes(vector, limit) |
| Pure BM25 text | graph.search_nodes(query, limit) |
| Combined text + vector | graph.hybrid_search(query, query_vector, text_weight, vector_weight) |
| Lookup by property | graph.nodes_by_attribute("name", "Alice") |
| Direct neighbors | graph.connected_nodes(node_id) |
| Multi-hop traversal | graph.query().traverse(source_id=id, depth=2).execute() |
| Filtered query | graph.query().match(type="Person").where(role="eng").execute() |
| GraphRAG context assembly | graph.retrieve(query, query_vector, max_hops, max_tokens) |
| End-to-end Q&A | graph.ask(query, query_vector, llm_callable=fn) |
| Extract + store from text | extract_and_merge(graph, text, match_keys=["name"]) |
| Extract in parallel across chunks | extract_and_merge_parallel(graph, chunks, max_workers=8) |
| PageRank / centrality | pagerank(graph), betweenness_centrality(graph) |
| Atomic block | with graph.transaction(): ... |
| Browser visualization | graph.visualize() |
Canonical snippets
Init
from graphmemory import GraphMemory, Node, Edge, MergeStrategy
# database=None is in-memory; pass a path for persistence.
# vector_length and distance_metric are fixed at init time.
graph = GraphMemory(
    database="graph.db",
    vector_length=1536, # must match your embedding model
    distance_metric="cosine", # "l2" | "cosine" | "inner_product"
    hnsw_ef_construction=128,
    hnsw_ef_search=64,
    hnsw_m=16,
    auto_index=True, # HNSW auto-built on init
    max_retries=3, # transient IO error retry
)
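Since vector_length is fixed at init and has to match the embedding model, a small upstream check can catch mismatched vectors before they reach the graph. This is a sketch under assumptions: `VECTOR_LENGTH` and `check_vector` are hypothetical names, not part of GraphMemory.

```python
import math
from typing import Sequence

VECTOR_LENGTH = 1536  # assumption: the same value passed as vector_length at init


def check_vector(vector: Sequence[float], expected: int = VECTOR_LENGTH) -> list[float]:
    """Return the vector as a list, or raise if it should not be stored."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")
    if not all(math.isfinite(x) for x in vector):
        raise ValueError("vector contains NaN or infinite values")
    return list(vector)
```

Wrapping the embedding call, for example `check_vector(embed(text))`, makes a model swap fail loudly at ingest time instead of producing zero-filled or skipped nodes.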
Insert + merge
alice = Node(type="Person", properties={"name": "Alice"}, vector=embed("Alice"))
bob = Node(type="Person", properties={"name": "Bob"}, vector=embed("Bob"))
graph.insert_node(alice)
graph.insert_node(bob)
graph.insert_edge(Edge(source_id=alice.id, target_id=bob.id, relation="reports_to"))
# Idempotent re-ingest on a natural key
graph.merge_node(alice, match_keys=["name"])
# Fuzzy merge — tolerates "Alice Smith" vs "alice smith"
graph.merge_node(
    alice,
    match_keys=["name"],
    similarity_threshold=0.9, # Jaro-Winkler threshold (1.0 = exact)
    vector_threshold=0.2, # optional cosine distance cap
    match_type=True, # also require same `type`
    strategy=MergeStrategy.UPDATE, # UPDATE | REPLACE | KEEP
)
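The three strategies differ only in what happens to the stored properties when a match is found. The sketch below models them on plain dicts, following the descriptions in the Gotchas section (UPDATE shallow-merges with incoming winning, REPLACE overwrites, KEEP leaves the existing node alone). The `Strategy` enum and `merged_properties` function are stand-ins written for illustration, not GraphMemory's implementation.

```python
from enum import Enum


class Strategy(Enum):  # stand-in for graphmemory.MergeStrategy
    UPDATE = "update"
    REPLACE = "replace"
    KEEP = "keep"


def merged_properties(existing: dict, incoming: dict, strategy: Strategy) -> dict:
    """Properties of the stored node after a merge that matched `existing`."""
    if strategy is Strategy.UPDATE:
        return {**existing, **incoming}  # shallow merge, incoming wins on collision
    if strategy is Strategy.REPLACE:
        return dict(incoming)            # wholesale overwrite
    return dict(existing)                # KEEP: the matched node is left untouched


stored = {"name": "Alice", "role": "eng", "team": "ML"}
new = {"name": "Alice", "role": "manager"}

print(merged_properties(stored, new, Strategy.UPDATE))
# {'name': 'Alice', 'role': 'manager', 'team': 'ML'}
print(merged_properties(stored, new, Strategy.REPLACE))
# {'name': 'Alice', 'role': 'manager'}
```

Note that REPLACE drops `team` here, which is the usual reason to prefer UPDATE for re-ingestion.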
Hybrid search
results = graph.hybrid_search(
    query_text="who leads ML?",
    query_vector=embed("who leads ML?"),
    text_weight=0.5,
    vector_weight=0.5,
    limit=10,
)
for r in results:
    print(r.score, r.node.properties)
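To build intuition for text_weight and vector_weight, the sketch below shows one common way to fuse the two signals: min-max normalize each score set, then take a weighted sum. This document does not state how GraphMemory combines scores internally, so treat the formula as an assumption; `min_max` and `fuse` are hypothetical helpers.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale to [0, 1]; if every score is equal, all map to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def fuse(bm25, distances, text_weight=0.5, vector_weight=0.5):
    """Weighted sum of normalized text and vector scores (illustrative only)."""
    text = min_max(bm25) if bm25 else {}
    # Distances are "lower is closer", so negate before normalizing.
    vec = min_max({k: -d for k, d in distances.items()}) if distances else {}
    fused = {
        k: text_weight * text.get(k, 0.0) + vector_weight * vec.get(k, 0.0)
        for k in set(text) | set(vec)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


ranked = fuse({"a": 2.0, "b": 1.0}, {"b": 0.1, "c": 0.5}, text_weight=0.3, vector_weight=0.7)
print([node_id for node_id, _ in ranked])  # ['b', 'a', 'c']
```

With the vector side weighted at 0.7, the closest vector match `b` outranks the best text match `a`; flipping the weights reverses that order.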
GraphRAG
# Context-only (own the prompt)
result = graph.retrieve(
    query=q, query_vector=qv,
    max_hops=2, max_tokens=4000, search_limit=10,
)
print(result.context_text, result.token_estimate, result.seed_node_count, result.total_node_count)
# End-to-end: llm_callable signature is (system_prompt, user_prompt) -> str
from openai import OpenAI
openai_client = OpenAI() # reads OPENAI_API_KEY from the environment
def my_llm(system, user):
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system}, {"role": "user", "content": user}],
    )
    return response.choices[0].message.content
answer = graph.ask(query=q, query_vector=qv, llm_callable=my_llm)
print(answer["answer"])
Pass llm_callable=None to get retrieval-only output — useful to inspect the context before wiring an LLM.
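For tests and dry runs, a stub that matches the (system_prompt, user_prompt) -> str signature can stand in for a paid model and record what it was sent. A minimal sketch; `RecordingLLM` is a hypothetical test double, and the commented `graph.ask` call mirrors the usage shown above.

```python
class RecordingLLM:
    """Test double matching the (system_prompt, user_prompt) -> str signature."""

    def __init__(self, reply: str = "stub answer") -> None:
        self.reply = reply
        self.calls: list[tuple[str, str]] = []

    def __call__(self, system_prompt: str, user_prompt: str) -> str:
        # Keep every prompt pair so a test can assert on the assembled context.
        self.calls.append((system_prompt, user_prompt))
        return self.reply


fake_llm = RecordingLLM()
# Assumed usage: graph.ask(query=q, query_vector=qv, llm_callable=fake_llm)
print(fake_llm("You answer from context.", "Who leads ML?"))  # stub answer
```

After a run, `fake_llm.calls` holds the exact prompts, which makes it easy to check that retrieved context actually reached the model.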
Query builder
# Filter by type + property
engineers = graph.query().match(type="Person").where(role="engineer").execute()
# Multi-hop traversal — returns TraversalResult with depth + path
two_hop = graph.query().traverse(source_id=alice.id, depth=2).execute()
# Paginate + order
page = graph.query().match(type="Person").order_by("name").limit(20).offset(40).execute()
# Return edges instead of nodes
edges = graph.query().match(type="Person").edges().execute()
DSPy extraction
import dspy
from graphmemory.extraction import extract_and_merge, extract_and_merge_parallel
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
# Single pass
node_results, edge_results = extract_and_merge(
    graph, text, match_keys=["name"], similarity_threshold=0.88,
)
# Parallel across chunks in two phases: nodes first (all chunks), then edges
# with the full node context. A high max_workers can saturate your provider's
# requests-per-minute (RPM) limit.
node_results, edge_results = extract_and_merge_parallel(
    graph,
    chunks=paragraph_chunks,
    match_keys=["name"],
    similarity_threshold=0.88,
    max_workers=8, # match your provider's RPM headroom
    on_progress=lambda phase, done, total: print(f"{phase}: {done}/{total}"),
)
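With many chunks, printing on every callback gets noisy. The sketch below is a drop-in replacement for the lambda that keeps the latest count per phase and prints only periodically. It assumes, based on the lambda above, that the callback receives a phase label plus two integers; `ProgressLog` is a hypothetical helper, and the lock is a precaution in case callbacks arrive from worker threads.

```python
import threading


class ProgressLog:
    """Collects (phase, done, total) callbacks and prints every `every` items."""

    def __init__(self, every: int = 10) -> None:
        self.every = every
        self.latest: dict[str, tuple[int, int]] = {}
        self._lock = threading.Lock()

    def __call__(self, phase: str, done: int, total: int) -> None:
        with self._lock:
            self.latest[phase] = (done, total)
            # Print on every Nth item and always on the final one.
            if done == total or done % self.every == 0:
                print(f"{phase}: {done}/{total}")


progress = ProgressLog(every=5)
# Assumed usage: extract_and_merge_parallel(..., on_progress=progress)
progress("nodes", 5, 20)  # prints "nodes: 5/20"
```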
Transactions
with graph.transaction():
    graph.insert_node(a)
    graph.insert_node(b)
    graph.insert_edge(Edge(source_id=a.id, target_id=b.id, relation="x"))
# Exception inside the block → ROLLBACK. Clean exit → COMMIT.
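The commit-or-rollback behavior can be tried in isolation. The sketch below reproduces the same semantics with the standard library's sqlite3 module rather than DuckDB or GraphMemory, purely so it runs with no dependencies; the `transaction` helper here is a hypothetical stand-in for `graph.transaction()`.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn: sqlite3.Connection):
    """Commit on clean exit, roll back if the block raises."""
    conn.execute("BEGIN")
    try:
        yield conn
    except Exception:
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")


# isolation_level=None hands transaction control to the explicit BEGIN above.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE nodes (name TEXT)")

try:
    with transaction(conn):
        conn.execute("INSERT INTO nodes VALUES ('a')")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

count = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
print(count)  # 0
```

The insert is undone because the exception propagated through the context manager; the same block without the `raise` would leave one row behind.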
Advanced patterns
Two-pass dedup (idiomatic)
Extract with a loose threshold, then clean up with a tighter one. This is the pattern in examples/test_ingest.py.
# Pass 1 — during ingest, be permissive to avoid fragmenting entities
extract_and_merge_parallel(graph, chunks, similarity_threshold=0.88, max_workers=50)
# Pass 2 — after ingest, resolve residual duplicates more strictly
clusters = graph.resolve_duplicates(
    match_keys=["name"],
    match_type=True,
    similarity_threshold=0.9,
    vector_threshold=0.15,
)
for c in clusters:
    print(f"Kept {c.survivor.properties['name']}, merged {len(c.merged)} dups")
resolve_duplicates picks the first-seen node as survivor, reassigns all incoming/outgoing edges to it, and deletes the rest. Self-loops from the reassignment are dropped.
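The edge handling described above can be modeled on plain tuples. This is an illustrative sketch, not the library's code: `collapse_cluster` is a hypothetical function, and collapsing identical edges after reassignment is an assumption based on the (source_id, target_id, relation) dedup key.

```python
def collapse_cluster(edges, survivor, duplicates):
    """Repoint edges from duplicate ids to the survivor.

    edges: iterable of (source_id, target_id, relation) tuples.
    Self-loops created by the reassignment are dropped.
    """
    dup = set(duplicates)
    seen, out = set(), []
    for src, tgt, rel in edges:
        new_src = survivor if src in dup else src
        new_tgt = survivor if tgt in dup else tgt
        if new_src == new_tgt and (new_src, new_tgt) != (src, tgt):
            continue  # self-loop introduced by merging the two endpoints
        key = (new_src, new_tgt, rel)
        if key not in seen:  # assumption: identical edges collapse to one
            seen.add(key)
            out.append(key)
    return out


edges = [
    ("alice2", "bob", "reports_to"),
    ("alice", "alice2", "same_as"),
    ("carol", "alice2", "mentors"),
]
print(collapse_cluster(edges, survivor="alice", duplicates=["alice2"]))
# [('alice', 'bob', 'reports_to'), ('carol', 'alice', 'mentors')]
```

The `same_as` edge disappears because both of its endpoints now refer to the survivor.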
Custom chunking + sequential linking
Pattern from examples/lexical_graph.py:
prev = None
for chunk in chunks:
    node = Node(type="Chunk", properties={"text": chunk}, vector=embed(chunk))
    graph.insert_node(node)
    if prev is not None:
        graph.insert_edge(Edge(source_id=prev.id, target_id=node.id, relation="followed_by"))
    prev = node
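The loop above expects a ready-made `chunks` list. One simple way to produce it is to split on blank lines and pack paragraphs up to a size limit, as sketched below. The `paragraph_chunks` helper and its `max_chars` default are hypothetical and not taken from examples/lexical_graph.py.

```python
import re


def paragraph_chunks(text: str, max_chars: int = 1000) -> list[str]:
    """Split on blank lines, then pack paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole rather than cut.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        candidate = f"{current}\n\n{p}" if current else p
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = p
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Keeping paragraphs intact means each Chunk node holds a coherent unit of text, at the cost of uneven chunk sizes.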
Inspect before asking
result = graph.retrieve(query=q, query_vector=qv, max_hops=2, max_tokens=4000)
print(result.context_text) # See exactly what the LLM would receive
# Tune max_hops / max_tokens / search_limit before wiring ask()
Gotchas
- vector_length and distance_metric are locked at init. Swapping embedding models means a new database. Valid metrics: "l2", "cosine", "inner_product".
- Missing vectors are silently zero-filled in insert_node; bulk_insert_nodes skips nodes whose vectors don't match vector_length and logs a warning. Validate upstream if correctness matters.
- HNSW is auto-built on init (auto_index=True). Tune via hnsw_ef_construction, hnsw_ef_search, hnsw_m. Call graph.compact_index() after heavy deletes to reclaim space (also called automatically by delete_node).
- FTS index is lazy: the first search_nodes/hybrid_search call after writes rebuilds it. Expect first-search latency. Force a rebuild with graph.reindex() if you want it warm before traffic.
- Edge dedup key is (source_id, target_id, relation). Relations are normalized (lowercased, underscored) before comparison, so "Reports To" and "reports_to" collide. Edge properties are NOT part of the key.
- delete_node cascades edges in both directions (as source AND as target). No orphan-edge safety net.
- merge_node strategies: UPDATE shallow-merges dicts (incoming wins on collision), REPLACE overwrites wholesale, KEEP only inserts if new. Pick intentionally.
- similarity_threshold=1.0 is exact match (the default). Lower it to enable Jaro-Winkler fuzzy matching on string properties. Non-string properties always use JSON equality.
- match_type=True (default) requires same type for merge. Set False to merge across types, which is rarely what you want.
- resolve_duplicates is O(n²)-ish in fuzzy mode. For large graphs, narrow with match_type and a tight vector_threshold first.
- extraction and algorithms are optional extras. Wrap imports in try/except or check pip show before recommending code that depends on them.
- Single-writer DuckDB. Connection pooling and @with_retry (exponential backoff on transient IO errors) are built in, but don't open the same file from multiple processes for concurrent writes.
- cursor() returns independent cursors for concurrent reads; the main connection is RLock-guarded for writes.
- ask() with llm_callable=None returns retrieval only, with no generation. Always use this first to validate context before paying for LLM calls.
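The edge dedup rule in the list above is easy to trip over, so here is a small model of it. The exact normalization GraphMemory applies is not spelled out in this document; the sketch assumes "lowercased, underscored" means lowercasing and replacing whitespace runs with underscores, and both function names are hypothetical.

```python
import re


def normalize_relation(relation: str) -> str:
    """Lowercase and join words with underscores (assumed normalization rule)."""
    return re.sub(r"\s+", "_", relation.strip()).lower()


def edge_key(source_id: str, target_id: str, relation: str) -> tuple[str, str, str]:
    """Dedup key: edge properties and weight are deliberately not included."""
    return (source_id, target_id, normalize_relation(relation))


print(edge_key("a", "b", "Reports To") == edge_key("a", "b", "reports_to"))  # True
```

Two edges that differ only in properties share a key under this rule, so anything that must distinguish them belongs in the relation name.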
Data models
| Model | Key fields |
|---|---|
| Node | id: UUID, type: str \| None, properties: dict, vector: list[float] |
| Edge | id, source_id, target_id, relation: str, weight: float \| None |
| SearchResult | node, score (higher = better for both BM25 and hybrid) |
| NearestNode | node, distance (lower = closer) |
| TraversalResult | node, depth, path: list[UUID] |
| RetrievalContext | node, relationships: list[dict], hop_distance: int |
| RetrievalResult | query, contexts, context_text, token_estimate, seed_node_count, total_node_count |
| MergeResult | node, created: bool (True = inserted, False = updated) |
| EdgeMergeResult | edge, created: bool |
| DuplicateCluster | survivor: Node, merged: list[Node] |
All models are Pydantic. IDs auto-generate as UUIDs.
Examples in the repo
- examples/openai_example.py: OpenAI embeddings, similarity search, attribute lookup
- examples/lexical_graph.py: chunked Wikipedia text with SentenceTransformer, sequential followed_by edges
- examples/dspy_example_typed_pred.py: DSPy typed-predictor extraction
- examples/test_ingest.py: parallel extraction (50 workers, 0.88 threshold) + post-pass resolve_duplicates at 0.90
Read examples/test_ingest.py before building a real ingest pipeline — it's the template.
Testing
python3 -m pytest tests/tests.py -v
296 tests cover the public API. Run them when modifying the library.