Port the design spec and 17-task implementation plan from arb-scanner (where the idea was born) to this dedicated repo. Paths in the plan adjusted to treat this repo root as the skill root (no skills/kg-setup/ subdirectory). Implementation follows via subagent-driven-development. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kg-setup — Knowledge Graph bootstrap skill
Date: 2026-04-22
Status: Design approved via brainstorming, awaiting user review before plan.
Author/User: pavelmalkin
Skill location: ~/.claude/skills/kg-setup/
1. Goal
Build a Claude Code skill that bootstraps a 4-layer project memory system in any project on this Mac with a single natural-language trigger. Primary outcome: Claude retains important project context across sessions without re-reading files, and token usage on routine tasks drops measurably.
Not goals:
- Replace the user's existing auto-memory system. The skill integrates with it, adding a pointer entry.
- Ingest chat history or session transcripts into Obsidian (that's a separate tool).
- Work on remote hosts (ClowH1 prod). Local macOS only for v1.
- Ship a published package (no pypi/brew/homebrew-tap). Skill lives in ~/.claude/skills/.
2. Success criteria
- On a clean project, running the skill once produces: working codegraph + gitnexus MCP servers visible to Claude Code, a .codegraph/ index, an optional .gitnexus/ index, an updated CLAUDE.md with a ## Knowledge Graph section, an Obsidian {repo_name}/_index.md entry, and a pointer line in auto-memory MEMORY.md.
- Re-running on the same project is safe: by default it only runs the health check and reports; destructive actions are gated behind explicit user confirmation (choice "c" from design).
- On a project without some prerequisites (e.g. no Obsidian vault), the skill degrades gracefully: skips that layer with a warning, still completes everything else.
- After setup, Claude answers 3 smoke-test architectural questions on arb-scanner using the graph MCP tools instead of reading files. Token usage for those 3 questions is lower than the pre-skill baseline (rough measurement, not a hard SLA).
3. Architecture
3.1 Four memory layers
| Layer | What it holds | Where | Update cadence | Skill's role |
|---|---|---|---|---|
| Code graph | AST-derived deps, callers/callees, call chains | .codegraph/codegraph.db + .gitnexus/ | On-demand refresh | Install tools + run init |
| Project CLAUDE.md | Conventions, commands, gotchas | ./CLAUDE.md | Rarely, by human | Add ## Knowledge Graph section (idempotent) |
| Obsidian vault | Decisions, session logs, research, status | {VaultRoot}/{repo_name}/ | Frequently, by human | Create _index.md pointer file; optional sessions/ + knowledge/ subfolders |
| auto-memory | Persistent facts between Claude sessions | ~/.claude/projects/{slug}/memory/ | Automatically by Claude | Append one pointer line to MEMORY.md |
The skill builds infrastructure. Content is the user's responsibility, except a minimal seed _index.md in Obsidian.
3.2 Skill structure
~/.claude/skills/kg-setup/
├── SKILL.md # Orchestrator (~150 lines)
├── scripts/
│ ├── check_prereqs.sh # Detect installed tools, emit JSON
│ ├── install_tools.sh # npm install both tools (idempotent)
│ ├── detect_project.py # git remote, LOC, language, vault path
│ ├── register_mcp.sh # claude mcp add for both servers
│ ├── init_codegraph.sh # codegraph init -i in cwd
│ ├── init_gitnexus.sh # gitnexus analyze . in cwd
│ ├── merge_claude_md.py # idempotent section merge
│ ├── build_obsidian_index.py # generate _index.md content
│ ├── update_memory_index.py # append pointer to MEMORY.md
│ ├── health_check.sh # MCP ping + test query to each graph
│ └── state.py # read/write .kg-setup-state.json
├── templates/
│ ├── claude_md_section.md # "## Knowledge Graph" template
│ ├── obsidian_index_minimal.md # Default _index.md
│ └── obsidian_index_rich.md # --rich / auto-rich variant
├── tests/
│ ├── test_merge_claude_md.py
│ ├── test_detect_project.py
│ ├── test_state.py
│ └── integration_test.sh
└── README.md # Human-facing docs for the skill itself
4. Components
4.1 SKILL.md
Frontmatter:
---
name: kg-setup
description: Bootstrap a 4-layer project memory system (CodeGraph + GitNexus + CLAUDE.md + Obsidian + auto-memory). Use when user asks to "настрой граф знаний", "подключи базу знаний", "запомни проект", "setup knowledge graph", "bootstrap project memory", "initialize code graph", "пусть ты помнишь детали проекта".
---
Body — 5 phases in explicit markdown:
- DETECT: invoke check_prereqs.sh + detect_project.py, read ./CLAUDE.md and ./.kg-setup-state.json. Aggregate into an in-memory state blob.
- PLAN: Claude reads state and constructs the action list. If the state file says "healthy" and there is no --refresh flag, skip directly to VERIFY.
- EXECUTE: run actions in order: install tools → register MCP → init graphs → merge CLAUDE.md → Obsidian → auto-memory. Each step is atomic; a failure after step N does not invalidate steps 1..N-1.
- VERIFY: run health_check.sh. Writes summary to state file.
- REPORT: Claude prints a bulleted summary to the user: what succeeded, what was skipped, what failed, suggested next steps.
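The PLAN rule above (skip straight to VERIFY when the last run was healthy and no refresh was requested) can be sketched as a small pure function. This is only an illustration: the function name `plan_phases` and the step identifiers are assumptions, and the interactive rerun prompt is not modelled here.

```python
from typing import Optional

# EXECUTE steps in the order the spec lists them (names are illustrative).
EXECUTE_STEPS = [
    "install_tools",
    "register_mcp",
    "init_codegraph",
    "init_gitnexus",
    "merge_claude_md",
    "build_obsidian_index",
    "update_memory_index",
]


def plan_phases(state: Optional[dict], refresh: bool = False) -> list:
    """Return the ordered list of steps to run after DETECT.

    `state` is the parsed .kg-setup-state.json, or None if the file is absent.
    """
    healthy = bool(state) and state.get("last_run_status") == "healthy"
    if healthy and not refresh:
        # Previous run was healthy and no refresh requested: verify only.
        return ["VERIFY", "REPORT"]
    return EXECUTE_STEPS + ["VERIFY", "REPORT"]
```

Keeping the decision in a pure function makes it easy to unit-test without touching the filesystem or any MCP server.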
Rerun behavior (decision point — user chose "c", ask):
- If the state file exists and shows a previous successful setup → Claude asks: "The project is already set up. (r) health-check only / (f) full refresh / (s) skip Obsidian only / (q) cancel?"
- Default to (r) if no answer after prompt.
4.2 check_prereqs.sh
Output schema (JSON to stdout):
{
"schema_version": 1,
"env": {
"node": "v20.11.0",
"node_major": 20,
"python": "3.12.1",
"git": "2.43.0",
"npm": "10.2.4"
},
"tools": {
"gitnexus_cli": {"installed": false, "version": null, "path": null},
"codegraph_cli": {"installed": true, "version": "0.x.y", "path": "/opt/homebrew/bin/codegraph"},
"gitnexus_mcp_registered": false,
"codegraph_mcp_registered": true
},
"obsidian": {
"mcp_available": true,
"vault_path_hint": null
},
"errors": [],
"warnings": []
}
Node 18+ is the hard prerequisite (CodeGraph requirement). If node_major < 18 → blocking error.
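As a rough sketch of how a consumer of this JSON might apply the Node 18+ rule, the function below collects blocking errors from a parsed report. The function name `blocking_errors` and the error wording are assumptions, not part of the spec.

```python
MIN_NODE_MAJOR = 18  # hard prerequisite stated in the spec


def blocking_errors(prereqs: dict) -> list:
    """Collect blocking errors from a check_prereqs.sh-style report."""
    errors = list(prereqs.get("errors", []))
    env = prereqs.get("env", {})
    node_major = env.get("node_major")
    if node_major is None:
        errors.append("node not found")
    elif node_major < MIN_NODE_MAJOR:
        errors.append(
            f"node {env.get('node')} is below the required major version {MIN_NODE_MAJOR}"
        )
    return errors
```

An empty return value means the skill may proceed to the PLAN phase; any entry should stop the run before disk writes.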
4.3 install_tools.sh
Concrete commands (from README reality-check):
npm install -g gitnexus
npm install -g @colbymchenry/codegraph
Idempotent: if which codegraph already exits 0, skip; same for gitnexus. Failures → retry once with --force flag, then bubble up.
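The skip-if-present and retry-once behavior described above could look roughly like the following in Python. The actual script is a shell script; this sketch only illustrates the control flow, and the `run` and `which` parameters are hypothetical injection points added so the logic can be tested without touching npm.

```python
import shutil
import subprocess
from typing import Callable, Sequence


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def install_tool(
    binary: str,
    package: str,
    run: Callable[[Sequence[str]], int] = _run,
    which: Callable[[str], object] = shutil.which,
) -> str:
    """Return 'skipped' or 'installed'; raise RuntimeError if both attempts fail."""
    if which(binary):
        return "skipped"  # already on PATH, nothing to do
    base = ["npm", "install", "-g", package]
    if run(base) == 0:
        return "installed"
    if run(base + ["--force"]) == 0:  # single retry with --force, per the spec
        return "installed"
    raise RuntimeError(f"npm install failed for {package}")
```

For example, `install_tool("codegraph", "@colbymchenry/codegraph")` would skip the install when `codegraph` is already on PATH.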
4.4 register_mcp.sh
Uses Claude Code CLI:
claude mcp add gitnexus -- npx -y gitnexus@latest mcp
claude mcp add codegraph -- codegraph serve --mcp
Check via claude mcp list before adding. If already registered, skip.
4.5 detect_project.py
Detects:
- git_remote_name: basename(git remote get-url origin, .git). If no remote → fallback to basename(cwd), emit warning.
- primary_lang: heuristic based on file extensions + common markers (requirements.txt/pyproject.toml → python, package.json → js/ts, etc.)
- loc: sum of non-blank, non-comment lines across source files, capped at 100k for speed
- vault_path: determined in this priority order: (1) env var KG_SETUP_VAULT_PATH if set, (2) Obsidian MCP get_vault_stats response, (3) null. Never attempts filesystem scan for vaults. If null, Obsidian layer is skipped gracefully in phase 3.
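A minimal sketch of two of the detection rules above: deriving the canonical name from the remote URL, and resolving the vault path by priority. The function names are assumptions, and `mcp_vault_path` is a hypothetical value the caller would extract from the Obsidian MCP response, whose exact shape the spec does not state.

```python
from pathlib import PurePosixPath
from typing import Optional


def repo_name_from_remote(remote_url: Optional[str], cwd: str) -> tuple:
    """Return (name, warnings): remote basename without .git, else cwd basename."""
    if remote_url:
        # Normalise scp-style URLs (git@host:user/repo.git) to slash-separated form.
        tail = remote_url.rstrip("/").replace(":", "/").rsplit("/", 1)[-1]
        name = tail[:-4] if tail.endswith(".git") else tail
        if name:
            return name, []
    return PurePosixPath(cwd).name, ["no git remote; using directory name"]


def resolve_vault_path(env: dict, mcp_vault_path: Optional[str]) -> Optional[str]:
    """Priority: KG_SETUP_VAULT_PATH env var, then the MCP-provided path, then None."""
    return env.get("KG_SETUP_VAULT_PATH") or mcp_vault_path or None
```

In real use, `env` would be `os.environ`; passing it in as a plain dict keeps the function testable.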
4.6 init_codegraph.sh and init_gitnexus.sh
# codegraph
codegraph init -i # interactive false via env var or preset
# gitnexus
gitnexus analyze .
Both check for existing .codegraph/ / .gitnexus/ first. If present and refresh-mode is not active, skip with note. Refresh-mode is triggered in two ways: (a) user's natural-language intent contains phrases like "обнови граф", "re-index", "refresh graph" — Claude interprets and passes --refresh env var to the scripts; (b) user selects (f) full refresh in the rerun prompt (see SKILL.md rerun behavior in §4.1).
4.7 merge_claude_md.py
Rules:
- If ./CLAUDE.md is missing → create it with our section only + a note that the user should add their own conventions above.
- If present but no section with heading ## Knowledge Graph → append section to end.
- If section exists and contains marker <!-- generated:kg-setup-v1 --> → update the generated-content lines (paths, timestamps), preserve anything below <!-- user-content --> marker unchanged.
- If section exists without our marker → do not touch; emit warning "CLAUDE.md has a ## Knowledge Graph section from another source, skipping merge."
Template stored in templates/claude_md_section.md:
## Knowledge Graph
<!-- generated:kg-setup-v1 -->
Local graph indices:
- CodeGraph: `.codegraph/codegraph.db` (query via MCP server `codegraph`)
- GitNexus: `.gitnexus/` (query via MCP server `gitnexus`)
Obsidian notes: `{VaultRoot}/{repo_name}/_index.md`
auto-memory: `~/.claude/projects/{slug}/memory/MEMORY.md`
Refresh indices after major code changes:
- `codegraph init -i --refresh`
- `gitnexus analyze . --force`
<!-- /generated -->
<!-- user-content -->
<!-- anything below here is preserved between kg-setup runs -->
Pre-write safety: run git status --porcelain -- CLAUDE.md. If CLAUDE.md has uncommitted changes → refuse to write, ask user to commit or stash first.
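The four merge rules can be sketched as a pure text transformation, shown below. This is an illustrative sketch, not the actual merge_claude_md.py: the function name `merge_section` is an assumption, the "add your conventions above" note and the git pre-write check are omitted, and a section that has our start marker but no closing `<!-- /generated -->` marker is treated here as foreign (a choice the spec does not make).

```python
from typing import Optional

HEADING = "## Knowledge Graph"
MARKER = "<!-- generated:kg-setup-v1 -->"
END_MARKER = "<!-- /generated -->"
USER_MARKER = "<!-- user-content -->"
WARNING = "CLAUDE.md has a ## Knowledge Graph section from another source, skipping merge."


def merge_section(existing: Optional[str], generated_block: str) -> tuple:
    """Return (new_text, warning); new_text is None when nothing should be written.

    generated_block is the rendered text from the heading through END_MARKER.
    """
    block = generated_block.splitlines()
    if existing is None:  # rule 1: no CLAUDE.md yet
        return "\n".join(block + [USER_MARKER]) + "\n", None
    lines = existing.splitlines()
    if HEADING not in lines:  # rule 2: file exists, no section
        return "\n".join(lines + [""] + block + [USER_MARKER]) + "\n", None
    start = lines.index(HEADING)
    if MARKER not in lines[start:] or END_MARKER not in lines[start:]:
        return None, WARNING  # rule 4: section owned by someone else
    end = lines.index(END_MARKER, start)
    # rule 3: swap only heading..END_MARKER, keep everything after it verbatim
    return "\n".join(lines[:start] + block + lines[end + 1:]) + "\n", None
```

Because the function only replaces the span between the heading and the closing marker, running it repeatedly with the same generated block yields identical output, which is the idempotency property the unit tests in §7.1 target.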
4.8 build_obsidian_index.py
Generates _index.md content. Claude then writes it via Obsidian MCP write_note.
Mode selection:
- rich: explicit --rich flag OR detect_project.loc > 5000 OR existence of ./tests/ and ./docs/ in project
- minimal: otherwise
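The mode-selection rule reads naturally as a single boolean expression. The sketch below assumes a hypothetical helper name `select_index_mode` and interprets "existence of ./tests/ and ./docs/" as both directories being present.

```python
from pathlib import Path

RICH_LOC_THRESHOLD = 5000  # from the spec: rich when loc > 5000


def select_index_mode(project_root: Path, loc: int, rich_flag: bool = False) -> str:
    """Return 'rich' or 'minimal' for the Obsidian _index.md template."""
    has_tests_and_docs = (project_root / "tests").is_dir() and (project_root / "docs").is_dir()
    if rich_flag or loc > RICH_LOC_THRESHOLD or has_tests_and_docs:
        return "rich"
    return "minimal"
```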
Minimal template:
---
tags: [project, kg-index]
---
# {repo_name}
**Repo:** `{project_path}`
**Languages:** {primary_lang}
**LOC:** ~{loc}
**Setup date:** {today}
## Quick links
- [[../arb-scanner/_index]] ← other projects in vault
- Code graph local index: `{project_path}/.codegraph/`
- CLAUDE.md: `{project_path}/CLAUDE.md`
## What lives here
(Add notes on decisions, research, session logs below or in nested folders.)
Rich template adds: sessions/ placeholder, knowledge/decisions/ placeholder, knowledge/patterns/ placeholder — creating empty .gitkeep-style folder notes (_folder.md with a stub header).
4.9 update_memory_index.py
Appends one line to ~/.claude/projects/{slug}/memory/MEMORY.md. {slug} is the project's canonical Claude Code projects-directory name: project path with / replaced by - (e.g. /Users/pavelmalkin/Documents/Scaner → -Users-pavelmalkin-Documents-Scaner). If the memory directory does not exist yet, skip this step — the user's auto-memory harness will create it on first interaction, and the skill can be re-run later to add the pointer.
- [Knowledge graph bootstrap for {repo_name}](project_kg_{repo_name}.md) — set up 2026-04-22, graph in .codegraph, vault: {repo_name}/
And writes the pointed-to file project_kg_{repo_name}.md with frontmatter:
---
name: Knowledge graph for {repo_name}
description: Pointer to graph indices, Obsidian vault folder, and CLAUDE.md section for {repo_name}
type: project
---
Project `{repo_name}` ({project_path}) has kg-setup applied on 2026-04-22.
- CodeGraph MCP: server name `codegraph`, query with `mcp__codegraph__*` tools
- GitNexus MCP: server name `gitnexus`
- Obsidian vault folder: `{repo_name}/`
- State file: `{project_path}/.kg-setup-state.json`
- Reindex: `codegraph init -i --refresh` (or `gitnexus analyze . --force`)
Idempotent: checks for existing pointer line before appending.
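A possible shape for the slug derivation and the idempotent append is sketched below. The function names are assumptions. The duplicate check matches on the link target rather than the whole line, so a pointer written on an earlier date is still recognised; that is a design choice of this sketch, not something the spec mandates.

```python
from pathlib import Path


def project_slug(project_path: str) -> str:
    """Projects-directory name as described in the spec: '/' replaced by '-'."""
    return project_path.replace("/", "-")


def append_pointer(memory_md: Path, repo_name: str, pointer_line: str) -> bool:
    """Append pointer_line to MEMORY.md once. Returns True if the file changed."""
    if not memory_md.parent.is_dir():
        return False  # memory directory not created yet: skip this step
    existing = memory_md.read_text(encoding="utf-8") if memory_md.exists() else ""
    if f"(project_kg_{repo_name}.md)" in existing:
        return False  # pointer already present
    sep = "" if existing == "" or existing.endswith("\n") else "\n"
    memory_md.write_text(existing + sep + pointer_line + "\n", encoding="utf-8")
    return True
```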
4.10 health_check.sh
Three checks:
- MCP servers visible: claude mcp list | grep -E 'codegraph|gitnexus'; each server must return a line.
- CodeGraph query: test via the mcp__codegraph__codegraph_status tool through Claude (not direct; the skill instructs Claude to run this as a test).
- GitNexus query: similar smoke test.
Output: JSON summary. Claude converts to user-readable checklist in REPORT phase.
4.11 state.py
.kg-setup-state.json schema (at project root):
{
"schema_version": 1,
"skill_version": "0.1.0",
"last_run": "2026-04-22T13:00:00Z",
"last_run_status": "healthy|degraded|incomplete",
"layers": {
"code_graph": {"configured": true, "tool": "codegraph", "index_path": ".codegraph/"},
"gitnexus": {"configured": true, "index_path": ".gitnexus/"},
"claude_md": {"configured": true, "section_marker": "kg-setup-v1"},
"obsidian": {"configured": true, "vault_folder": "arb-scanner/", "mode": "minimal"},
"auto_memory": {"configured": true, "pointer_file": "project_kg_arb-scanner.md"}
},
"warnings": [],
"errors": []
}
On successful first run, append .kg-setup-state.json to .gitignore (check it's not already there).
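The .gitignore step might be implemented along these lines. The helper name `ensure_gitignored` is an assumption; the sketch creates .gitignore if it does not exist, which the spec does not explicitly address.

```python
from pathlib import Path

STATE_FILE = ".kg-setup-state.json"


def ensure_gitignored(project_root: Path, entry: str = STATE_FILE) -> bool:
    """Add entry to .gitignore unless already listed. Returns True if changed."""
    gitignore = project_root / ".gitignore"
    text = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    if entry in (line.strip() for line in text.splitlines()):
        return False
    # Make sure the new entry starts on its own line.
    sep = "" if text == "" or text.endswith("\n") else "\n"
    gitignore.write_text(text + sep + entry + "\n", encoding="utf-8")
    return True
```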
5. Data flow
See brainstorming session output (reproduced):
User trigger
→ Phase 1 DETECT (check_prereqs + detect_project + read state)
→ Phase 2 PLAN (Claude decides action list)
→ Phase 3 EXECUTE:
one-time: install_tools → register_mcp
per-project: init_codegraph → init_gitnexus → merge_claude_md
→ build_obsidian_index → update_memory_index
→ write state.json after each step
→ Phase 4 VERIFY (health_check)
→ Phase 5 REPORT (Claude writes bulleted summary)
6. Error handling
| Category | Examples | Behavior |
|---|---|---|
| Blocking | No node/python/git; node < 18 | Stop. Show install command. No disk writes. |
| Recoverable | npm network flake | Retry once. Then escalate to blocking. |
| Degraded | Obsidian vault not found; codegraph refuses a language | Skip layer. Write to state.warnings. Continue. |
| User-conflict | CLAUDE.md has ## Knowledge Graph without our marker; CLAUDE.md has uncommitted changes | Stop that step. Prompt user. |
Atomic writes: tmp-file + rename for every file written. Never leave half-written artifacts.
Git-aware: refuse to write CLAUDE.md if user has uncommitted changes in it.
Errors/warnings always persisted to .kg-setup-state.json.
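The tmp-file + rename pattern can be sketched with the standard library as follows. The helper names `atomic_write_text` and `write_state` are illustrative. Note that `tempfile.mkstemp` creates the file with owner-only permissions, so a real implementation may want to adjust the mode afterwards.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Write text to a temp file in the same directory, then rename over path."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # atomic on POSIX within one filesystem
    except BaseException:
        # Clean up the temp file so no half-written artifact is left behind.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def write_state(project_root: Path, state: dict) -> None:
    """Persist the state dict (including warnings/errors) to the project root."""
    atomic_write_text(project_root / ".kg-setup-state.json", json.dumps(state, indent=2) + "\n")
```

Creating the temp file in the target's own directory keeps the rename on a single filesystem, which is what makes `os.replace` atomic.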
User-facing error format:
✗ install_tools (codegraph): npm exited with code 1
stderr: ...
→ Fix: `brew install node@20` and re-run skill
✓ install_tools (gitnexus): already at v1.2.3, skipped
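One way to produce lines in the format shown above is a small formatter like this. The function name `format_step` and the two-space indentation of the detail lines are assumptions; the spec only shows the overall shape.

```python
from typing import Optional


def format_step(
    step: str,
    tool: str,
    ok: bool,
    message: str,
    stderr: Optional[str] = None,
    fix: Optional[str] = None,
) -> str:
    """Render one step result in the user-facing report format."""
    mark = "✓" if ok else "✗"
    lines = [f"{mark} {step} ({tool}): {message}"]
    if stderr:
        lines.append(f"  stderr: {stderr}")
    if fix:
        lines.append(f"  → Fix: {fix}")
    return "\n".join(lines)
```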
7. Testing
7.1 Unit tests (pytest)
- test_merge_claude_md.py: highest priority. Snapshot-based:
  - no file → creates from template
  - no section → appends
  - section with our marker → updates generated lines only, preserves <!-- user-content --> block
  - section without our marker → untouched, warning emitted
  - 10 consecutive runs → identical output (idempotency)
- test_detect_project.py: tmp_path scenarios: with/without .git, various package manifests for language detection, LOC counter on synthetic files
- test_state.py: read/write roundtrip, schema_version migration stubs
7.2 Integration test
tests/integration_test.sh:
- Create temp dir, git init, add app.py with a trivial function
- Invoke skill phases directly (not through Claude) via the scripts
- Assert: .codegraph/ exists, CLAUDE.md exists and contains kg-setup-v1 marker, state file status=healthy
- Run again → no changes, exit 0
- Cleanup
7.3 Manual smoke test (on arb-scanner itself)
Before/after comparison:
- Pre: ask Claude 3 architectural questions about arb-scanner, record token usage
- Run skill
- Post: ask same 3 questions, confirm Claude uses mcp__codegraph__* tools instead of reading files. Compare tokens.
7.4 Not tested automatically
- Actual npm install of gitnexus and @colbymchenry/codegraph (external network, unstable)
- Obsidian MCP writes against a real vault (tested once manually, then relied upon)
8. Install commands reference (verified from README 2026-04-22)
# CLI tools
npm install -g gitnexus
npm install -g @colbymchenry/codegraph
# MCP registration
claude mcp add gitnexus -- npx -y gitnexus@latest mcp
claude mcp add codegraph -- codegraph serve --mcp
# Per-project init
codegraph init -i
gitnexus analyze .
Requirements: Node.js 18+. No brew/pip install path documented in either README.
9. Open decisions recorded here
| # | Decision | User's choice | Rationale |
|---|---|---|---|
| 1 | Implementation approach | Approach 2 (skill + bundled scripts) | Deterministic, fast, low-token vs Approach 1; not overkill like Approach 3 |
| 2 | Install both GitNexus and CodeGraph | Both | User explicitly wanted both; different angles on the same graph |
| 3 | Scope | local-only | Remote (ClowH1) deferred; user will SSH-run scripts manually if needed |
| 4 | Rerun behavior | Ask user (choice c) | Most predictable, avoids silent destructive actions |
| 5 | Skill name | kg-setup | Shorter than setup-knowledge-graph; matches user's existing short names |
| 6 | Canonical project name | git remote basename | Derived from git remote get-url origin, fallback to cwd basename |
| 7 | Obsidian folder convention | {VaultRoot}/{repo_name}/ | Matches existing arb-scanner/ and betting-dashboard/ |
| 8 | CLAUDE.md merge style | α (append section, preserve above) | Don't touch user's hand-written 102-line CLAUDE.md |
| 9 | Obsidian mode default | minimal, rich on flag or LOC > 5000 | Avoid clutter on small projects; betting-dashboard is rich → justified |
| 10 | Languages tested | Python (primary), JS (secondary) | arb-scanner is Python; user occasionally uses JS |
10. Out of scope for v1
- Remote/SSH setup for ClowH1 prod (separate tool later)
- Automated cron-based reindexing
- Chat history → Obsidian export pipeline
- Graph visualization UI beyond GitNexus's default web view
- Publishing the skill to a marketplace
- Support for languages beyond what CodeGraph/GitNexus already support
- Windows/Linux (macOS only)
11. Post-setup housekeeping (not the skill's job, but flagged in the report)
- User should rename /Users/pavelmalkin/Documents/Scaner → .../arb-scanner after end of current Claude Code session (to align with git remote name, Obsidian folder, and kg-setup canonical)
- User can populate arb-scanner/_index.md with richer content over time
- Ownership of the ## Knowledge Graph section in CLAUDE.md: generated block is replaceable, everything below <!-- user-content --> is the user's