kg-setup/docs/design.md
Pavel Malkin f2c2ef54e4 initial: design + plan for kg-setup skill
Port the design spec and 17-task implementation plan from arb-scanner
(where the idea was born) to this dedicated repo. Paths in the plan
adjusted to treat this repo root as the skill root (no skills/kg-setup/
subdirectory). Implementation follows via subagent-driven-development.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 03:46:38 +03:00


kg-setup — Knowledge Graph bootstrap skill

Date: 2026-04-22
Status: Design approved via brainstorming, awaiting user review before plan.
Author/User: pavelmalkin
Skill location: ~/.claude/skills/kg-setup/

1. Goal

Build a Claude Code skill that bootstraps a 4-layer project memory system in any project on this Mac with a single natural-language trigger. Primary outcome: Claude retains important project context across sessions without re-reading files, and token usage on routine tasks drops measurably.

Not goals:

  • Replace the user's existing auto-memory system. The skill integrates with it, adding a pointer entry.
  • Ingest chat history or session transcripts into Obsidian (that's a separate tool).
  • Work on remote hosts (ClowH1 prod). Local macOS only for v1.
  • Ship a published package (no pypi/brew/homebrew-tap). Skill lives in ~/.claude/skills/.

2. Success criteria

  1. On a clean project, running the skill once produces: working codegraph+gitnexus MCP servers visible to Claude Code, .codegraph/ index, optional .gitnexus/ index, updated CLAUDE.md with a ## Knowledge Graph section, an Obsidian {repo_name}/_index.md entry, and a pointer line in auto-memory MEMORY.md.
  2. Re-running on the same project is safe: by default only runs health check and reports; destructive actions are gated behind explicit user confirmation (choice "c" from design).
  3. On a project without some prerequisites (e.g. no Obsidian vault), the skill degrades gracefully: skips that layer with a warning, still completes everything else.
  4. After setup, Claude answers 3 smoke-test architectural questions on arb-scanner using the graph MCP tools instead of reading files. Token usage for those 3 questions is lower than pre-skill baseline (rough measurement, not a hard SLA).

3. Architecture

3.1 Four memory layers

| Layer | What it holds | Where | Update cadence | Skill's role |
| --- | --- | --- | --- | --- |
| Code graph | AST-derived deps, callers/callees, call chains | .codegraph/codegraph.db + .gitnexus/ | On-demand refresh | Install tools + run init |
| Project CLAUDE.md | Conventions, commands, gotchas | ./CLAUDE.md | Rarely, by human | Add ## Knowledge Graph section (idempotent) |
| Obsidian vault | Decisions, session logs, research, status | {VaultRoot}/{repo_name}/ | Frequently, by human | Create _index.md pointer file; optional sessions/+knowledge/ subfolders |
| auto-memory | Persistent facts between Claude sessions | ~/.claude/projects/{slug}/memory/ | Automatically by Claude | Append one pointer line to MEMORY.md |

The skill builds infrastructure. Content is the user's responsibility, except a minimal seed _index.md in Obsidian.

3.2 Skill structure

~/.claude/skills/kg-setup/
├── SKILL.md                          # Orchestrator (~150 lines)
├── scripts/
│   ├── check_prereqs.sh              # Detect installed tools, emit JSON
│   ├── install_tools.sh              # npm install both tools (idempotent)
│   ├── detect_project.py             # git remote, LOC, language, vault path
│   ├── register_mcp.sh               # claude mcp add for both servers
│   ├── init_codegraph.sh             # codegraph init -i in cwd
│   ├── init_gitnexus.sh              # gitnexus analyze . in cwd
│   ├── merge_claude_md.py            # idempotent section merge
│   ├── build_obsidian_index.py       # generate _index.md content
│   ├── update_memory_index.py        # append pointer to MEMORY.md
│   ├── health_check.sh               # MCP ping + test query to each graph
│   └── state.py                      # read/write .kg-setup-state.json
├── templates/
│   ├── claude_md_section.md          # "## Knowledge Graph" template
│   ├── obsidian_index_minimal.md     # Default _index.md
│   └── obsidian_index_rich.md        # --rich / auto-rich variant
├── tests/
│   ├── test_merge_claude_md.py
│   ├── test_detect_project.py
│   ├── test_state.py
│   └── integration_test.sh
└── README.md                         # Human-facing docs for the skill itself

4. Components

4.1 SKILL.md

Frontmatter:

---
name: kg-setup
description: Bootstrap a 4-layer project memory system (CodeGraph + GitNexus + CLAUDE.md + Obsidian + auto-memory). Use when user asks to "настрой граф знаний", "подключи базу знаний", "запомни проект", "setup knowledge graph", "bootstrap project memory", "initialize code graph", "пусть ты помнишь детали проекта".
---

Body — 5 phases in explicit markdown:

  1. DETECT — invoke check_prereqs.sh + detect_project.py, read ./CLAUDE.md and ./.kg-setup-state.json. Aggregate into an in-memory state blob.
  2. PLAN — Claude reads state, constructs action list. If state file says "healthy" and no --refresh flag, skip directly to VERIFY.
  3. EXECUTE — run actions in order: install tools → register MCP → init graphs → merge CLAUDE.md → Obsidian → auto-memory. Each step is atomic; a failure after step N does not invalidate steps 1..N-1.
  4. VERIFY: run health_check.sh. Writes summary to state file.
  5. REPORT — Claude prints bulleted summary to user: what succeeded, what was skipped, what failed, suggested next steps.
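The phase 2 shortcut (skip to VERIFY when the state file says "healthy" and no refresh was requested) can be sketched in Python. This is only an illustration: in the design Claude itself builds the action list, and the function name `plan_actions` and the step identifiers below are hypothetical, chosen to mirror the script names in §3.2.

```python
from typing import Optional

# Hypothetical step names mirroring the EXECUTE order in phase 3.
EXECUTE_STEPS = [
    "install_tools",
    "register_mcp",
    "init_codegraph",
    "init_gitnexus",
    "merge_claude_md",
    "build_obsidian_index",
    "update_memory_index",
]


def plan_actions(state: Optional[dict], refresh: bool = False) -> list:
    """Return the ordered list of steps to run.

    `state` is the parsed .kg-setup-state.json, or None on a first run.
    """
    healthy = state is not None and state.get("last_run_status") == "healthy"
    if healthy and not refresh:
        # Phase 2 shortcut: nothing to execute, go straight to VERIFY.
        return ["verify", "report"]
    return [*EXECUTE_STEPS, "verify", "report"]
```

A first run (`state=None`) or a degraded previous run yields the full list; a healthy project yields only the verify and report steps.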

Rerun behavior (decision point — user chose "c", ask):

  • If state file exists and shows previous successful setup → Claude asks: "Проект уже настроен. (r) health-check only / (f) full refresh / (s) skip Obsidian only / (q) cancel?" (the Russian opening means "The project is already set up.")
  • Default to (r) if no answer after prompt.

4.2 check_prereqs.sh

Output schema (JSON to stdout):

{
  "schema_version": 1,
  "env": {
    "node": "v20.11.0",
    "node_major": 20,
    "python": "3.12.1",
    "git": "2.43.0",
    "npm": "10.2.4"
  },
  "tools": {
    "gitnexus_cli": {"installed": false, "version": null, "path": null},
    "codegraph_cli": {"installed": true, "version": "0.x.y", "path": "/opt/homebrew/bin/codegraph"},
    "gitnexus_mcp_registered": false,
    "codegraph_mcp_registered": true
  },
  "obsidian": {
    "mcp_available": true,
    "vault_path_hint": null
  },
  "errors": [],
  "warnings": []
}

Node 18+ is the hard prerequisite (CodeGraph requirement). If node_major < 18 → blocking error.
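The Node version gate can be illustrated with a short Python sketch. The real check lives in check_prereqs.sh; the helper names `node_major` and `prereq_errors` and the error wording are assumptions made for this example, and only the `env.node` field from the schema above is used.

```python
import re

MIN_NODE_MAJOR = 18  # from the design: Node 18+ is required


def node_major(version: str) -> int:
    """Extract the major version from a string such as 'v20.11.0'."""
    match = re.match(r"v?(\d+)\.", version.strip())
    if match is None:
        raise ValueError(f"unrecognized node version: {version!r}")
    return int(match.group(1))


def prereq_errors(env: dict) -> list:
    """Return blocking errors for the `env` object of the prereq report."""
    node = env.get("node")
    if not node:
        return ["node not found"]
    major = node_major(node)
    if major < MIN_NODE_MAJOR:
        return [f"node {major} found, {MIN_NODE_MAJOR}+ required"]
    return []
```

An empty list means the gate passes; any entry would go into the report's `errors` array and stop the run before any disk writes.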

4.3 install_tools.sh

Concrete commands (from README reality-check):

npm install -g gitnexus
npm install -g @colbymchenry/codegraph

Idempotent: if which codegraph already exits 0, skip; same for gitnexus. Failures → retry once with --force flag, then bubble up.
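The skip-if-present and retry-once behavior might look like the following Python sketch. The actual script is install_tools.sh; `ensure_installed` and its return labels are hypothetical, and the `which` and `run` parameters exist only so the logic can be exercised without touching npm.

```python
import shutil
import subprocess
from typing import Callable, Sequence


def ensure_installed(
    binary: str,
    package: str,
    which: Callable[[str], object] = shutil.which,
    run: Callable[[Sequence[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
) -> str:
    """Install `package` globally unless `binary` is already on PATH.

    Returns 'skipped', 'installed', or 'installed-after-retry';
    raises RuntimeError if both attempts fail.
    """
    if which(binary):
        return "skipped"
    base = ["npm", "install", "-g", package]
    if run(base) == 0:
        return "installed"
    # Single retry with --force, as described above, before giving up.
    if run([*base, "--force"]) == 0:
        return "installed-after-retry"
    raise RuntimeError(f"npm install failed twice for {package}")
```

Usage would be `ensure_installed("codegraph", "@colbymchenry/codegraph")` and `ensure_installed("gitnexus", "gitnexus")`.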

4.4 register_mcp.sh

Uses Claude Code CLI:

claude mcp add gitnexus -- npx -y gitnexus@latest mcp
claude mcp add codegraph -- codegraph serve --mcp

Check via claude mcp list before adding. If already registered, skip.

4.5 detect_project.py

Detects:

  • git_remote_name: basename(git remote get-url origin, .git). If no remote → fallback to basename(cwd), emit warning.
  • primary_lang: heuristic based on file extensions + common markers (requirements.txt/pyproject.toml → python, package.json → js/ts, etc.)
  • loc: sum of non-blank, non-comment lines across source files, capped at 100k for speed
  • vault_path: determined in this priority order: (1) env var KG_SETUP_VAULT_PATH if set, (2) Obsidian MCP get_vault_stats response, (3) null. Never attempts filesystem scan for vaults. If null, Obsidian layer is skipped gracefully in phase 3.
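Two of the detection rules above, the repo name fallback and the vault path priority, can be sketched as below. This is a simplified illustration, not the contents of detect_project.py: the function names are hypothetical, the remote URL is assumed to be passed in already (for example from `git remote get-url origin`), and the Obsidian MCP response is assumed to have been reduced to a plain path string or None.

```python
import os
from typing import Mapping, Optional


def repo_name(remote_url: Optional[str], cwd: str) -> tuple:
    """Return (name, warnings): remote basename without '.git', else cwd basename."""
    if remote_url:
        # Handles both https://host/user/repo.git and git@host:user/repo.git
        tail = remote_url.rstrip("/").replace(":", "/").rsplit("/", 1)[-1]
        name = tail[:-4] if tail.endswith(".git") else tail
        if name:
            return name, []
    fallback = os.path.basename(os.path.normpath(cwd))
    return fallback, ["no git remote; using cwd basename"]


def resolve_vault_path(env: Mapping[str, str], mcp_vault_path: Optional[str]) -> Optional[str]:
    """Priority: (1) KG_SETUP_VAULT_PATH, (2) path reported via Obsidian MCP, (3) None.

    No filesystem scan is attempted; None means the Obsidian layer is skipped.
    """
    return env.get("KG_SETUP_VAULT_PATH") or mcp_vault_path or None
```

In practice `env` would be `os.environ`; passing it in keeps the priority order easy to test.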

4.6 init_codegraph.sh and init_gitnexus.sh

# codegraph
codegraph init -i                    # interactive false via env var or preset
# gitnexus
gitnexus analyze .

Both check for existing .codegraph/ / .gitnexus/ first. If present and refresh mode is not active, skip with a note. Refresh mode is triggered in two ways: (a) the user's natural-language intent contains phrases like "обнови граф" (Russian for "update the graph"), "re-index", or "refresh graph"; Claude interprets this and passes the --refresh flag to the scripts; (b) the user selects (f) full refresh in the rerun prompt (see SKILL.md rerun behavior in §4.1).

4.7 merge_claude_md.py

Rules:

  1. If ./CLAUDE.md missing → create with our section only + note that user should add their own conventions above.
  2. If present but no section with heading ## Knowledge Graph → append section to end.
  3. If section exists and contains marker <!-- generated:kg-setup-v1 --> → update the generated-content lines (paths, timestamps), preserve anything below <!-- user-content --> marker unchanged.
  4. If section exists without our marker → do not touch; emit warning "CLAUDE.md has a ## Knowledge Graph section from another source, skipping merge."

Template stored in templates/claude_md_section.md:

## Knowledge Graph

<!-- generated:kg-setup-v1 -->
Local graph indices:
- CodeGraph: `.codegraph/codegraph.db` (query via MCP server `codegraph`)
- GitNexus: `.gitnexus/` (query via MCP server `gitnexus`)

Obsidian notes: `{VaultRoot}/{repo_name}/_index.md`
auto-memory: `~/.claude/projects/{slug}/memory/MEMORY.md`

Refresh indices after major code changes:
- `codegraph init -i --refresh`
- `gitnexus analyze . --force`
<!-- /generated -->

<!-- user-content -->
<!-- anything below here is preserved between kg-setup runs -->

Pre-write safety: run git status --porcelain -- CLAUDE.md. If CLAUDE.md has uncommitted changes → refuse to write, ask user to commit or stash first.
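The four merge rules can be sketched as a pure string transformation in Python. This is a minimal illustration under stated assumptions, not merge_claude_md.py itself: `merge` and `render_section` are hypothetical names, file I/O and the git pre-write check are left out, the "add your own conventions above" note from rule 1 is omitted, and rule 3 is simplified to "heading present and a generated block found anywhere in the file".

```python
import re
from typing import Optional

HEADING = "## Knowledge Graph"
BEGIN = "<!-- generated:kg-setup-v1 -->"
END = "<!-- /generated -->"
USER = "<!-- user-content -->"
FOREIGN_WARNING = (
    "CLAUDE.md has a ## Knowledge Graph section from another source, skipping merge."
)


def render_section(body: str) -> str:
    """Build a fresh section: heading, generated block, user-content marker."""
    return f"{HEADING}\n\n{BEGIN}\n{body.strip()}\n{END}\n\n{USER}\n"


def merge(existing: Optional[str], body: str) -> tuple:
    """Return (new_text, warning). `existing` is None when CLAUDE.md is missing."""
    if existing is None:  # rule 1: create the file with our section only
        return render_section(body), None
    has_heading = re.search(rf"^{re.escape(HEADING)}\s*$", existing, re.M)
    if not has_heading:  # rule 2: append the section to the end
        return existing.rstrip("\n") + "\n\n" + render_section(body), None
    block = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.S)
    if block.search(existing):  # rule 3: replace only the generated block
        fresh = f"{BEGIN}\n{body.strip()}\n{END}"
        return block.sub(lambda _m: fresh, existing, count=1), None
    return existing, FOREIGN_WARNING  # rule 4: foreign section, leave untouched
```

Because rule 3 rewrites only the text between the two generated markers, running `merge` repeatedly with the same body returns identical output, which is the property the idempotency test in §7.1 is meant to check.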

4.8 build_obsidian_index.py

Generates _index.md content. Claude then writes it via Obsidian MCP write_note.

Mode selection:

  • rich: explicit --rich flag OR detect_project.loc > 5000 OR existence of ./tests/ and ./docs/ in project
  • minimal: otherwise
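The mode selection rule is compact enough to show as a Python sketch. The function name `select_mode` is hypothetical, and the sketch reads the rule as "flag OR loc > 5000 OR (tests/ AND docs/ both exist)", which is one plausible interpretation of the wording above.

```python
from pathlib import Path

RICH_LOC_THRESHOLD = 5000  # from the design: rich mode when LOC > 5000


def select_mode(project_root: Path, loc: int, rich_flag: bool = False) -> str:
    """Return 'rich' or 'minimal' for the Obsidian _index.md template."""
    # Assumption: both directories must exist for this signal to count.
    has_tests_and_docs = (
        (project_root / "tests").is_dir() and (project_root / "docs").is_dir()
    )
    if rich_flag or loc > RICH_LOC_THRESHOLD or has_tests_and_docs:
        return "rich"
    return "minimal"
```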

Minimal template:

---
tags: [project, kg-index]
---

# {repo_name}

**Repo:** `{project_path}`
**Languages:** {primary_lang}
**LOC:** ~{loc}
**Setup date:** {today}

## Quick links
- [[../arb-scanner/_index]] ← other projects in vault
- Code graph local index: `{project_path}/.codegraph/`
- CLAUDE.md: `{project_path}/CLAUDE.md`

## What lives here
(Add notes on decisions, research, session logs below or in nested folders.)

Rich template adds: sessions/ placeholder, knowledge/decisions/ placeholder, knowledge/patterns/ placeholder — creating empty .gitkeep-style folder notes (_folder.md with a stub header).

4.9 update_memory_index.py

Appends one line to ~/.claude/projects/{slug}/memory/MEMORY.md. {slug} is the project's canonical Claude Code projects-directory name: project path with / replaced by - (e.g. /Users/pavelmalkin/Documents/Scaner → -Users-pavelmalkin-Documents-Scaner). If the memory directory does not exist yet, skip this step: the user's auto-memory harness will create it on first interaction, and the skill can be re-run later to add the pointer.

- [Knowledge graph bootstrap for {repo_name}](project_kg_{repo_name}.md) — set up 2026-04-22, graph in .codegraph, vault: {repo_name}/

And writes the pointed-to file project_kg_{repo_name}.md with frontmatter:

---
name: Knowledge graph for {repo_name}
description: Pointer to graph indices, Obsidian vault folder, and CLAUDE.md section for {repo_name}
type: project
---

Project `{repo_name}` ({project_path}) has kg-setup applied on 2026-04-22.
- CodeGraph MCP: server name `codegraph`, query with `mcp__codegraph__*` tools
- GitNexus MCP: server name `gitnexus`
- Obsidian vault folder: `{repo_name}/`
- State file: `{project_path}/.kg-setup-state.json`
- Reindex: `codegraph init -i --refresh` (or `gitnexus analyze . --force`)

Idempotent: checks for existing pointer line before appending.
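The slug rule and the idempotent append can be sketched in Python as follows. The names `project_slug` and `append_pointer` are hypothetical. One assumption worth flagging: the sketch detects an existing pointer by its link target (`project_kg_{repo_name}.md`) rather than by exact line match, since the pointer line embeds a date that would differ between runs.

```python
from pathlib import Path


def project_slug(project_path: str) -> str:
    """'/Users/me/proj' -> '-Users-me-proj' (every '/' becomes '-')."""
    return project_path.replace("/", "-")


def append_pointer(memory_md: Path, pointer_line: str, pointer_file: str) -> str:
    """Append `pointer_line` to MEMORY.md at most once.

    Returns 'skipped-no-dir', 'present', or 'appended'.
    """
    if not memory_md.parent.is_dir():
        return "skipped-no-dir"  # auto-memory directory not created yet
    text = memory_md.read_text(encoding="utf-8") if memory_md.exists() else ""
    if f"({pointer_file})" in text:
        return "present"  # a pointer to this file already exists
    sep = "" if text == "" or text.endswith("\n") else "\n"
    memory_md.write_text(text + sep + pointer_line + "\n", encoding="utf-8")
    return "appended"
```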

4.10 health_check.sh

Three checks:

  1. MCP servers visible: claude mcp list | grep -E 'codegraph|gitnexus' — each must return a line.
  2. CodeGraph query: test via mcp__codegraph__codegraph_status tool through Claude (not direct — skill instructs Claude to run this as a test).
  3. GitNexus query: similar smoke test.

Output: JSON summary. Claude converts to user-readable checklist in REPORT phase.

4.11 state.py

.kg-setup-state.json schema (at project root):

{
  "schema_version": 1,
  "skill_version": "0.1.0",
  "last_run": "2026-04-22T13:00:00Z",
  "last_run_status": "healthy|degraded|incomplete",
  "layers": {
    "code_graph": {"configured": true, "tool": "codegraph", "index_path": ".codegraph/"},
    "gitnexus": {"configured": true, "index_path": ".gitnexus/"},
    "claude_md": {"configured": true, "section_marker": "kg-setup-v1"},
    "obsidian": {"configured": true, "vault_folder": "arb-scanner/", "mode": "minimal"},
    "auto_memory": {"configured": true, "pointer_file": "project_kg_arb-scanner.md"}
  },
  "warnings": [],
  "errors": []
}

On successful first run, append .kg-setup-state.json to .gitignore (check it's not already there).
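The .gitignore step could be sketched like this in Python. `ensure_gitignored` is a hypothetical helper name, and the check is an exact-line match only: it would not notice a broader pattern (such as a wildcard) that already covers the state file.

```python
from pathlib import Path

STATE_FILE = ".kg-setup-state.json"


def ensure_gitignored(project_root: Path, entry: str = STATE_FILE) -> bool:
    """Add `entry` to .gitignore if missing. Returns True when the file changed."""
    gitignore = project_root / ".gitignore"
    text = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    if entry in (line.strip() for line in text.splitlines()):
        return False  # already ignored, nothing to do
    sep = "" if text == "" or text.endswith("\n") else "\n"
    gitignore.write_text(f"{text}{sep}{entry}\n", encoding="utf-8")
    return True
```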

5. Data flow

See brainstorming session output (reproduced):

User trigger
  → Phase 1 DETECT (check_prereqs + detect_project + read state)
  → Phase 2 PLAN (Claude decides action list)
  → Phase 3 EXECUTE:
     one-time: install_tools → register_mcp
     per-project: init_codegraph → init_gitnexus → merge_claude_md
                  → build_obsidian_index → update_memory_index
     → write state.json after each step
  → Phase 4 VERIFY (health_check)
  → Phase 5 REPORT (Claude writes bulleted summary)

6. Error handling

| Category | Examples | Behavior |
| --- | --- | --- |
| Blocking | No node/python/git; node < 18 | Stop. Show install command. No disk writes. |
| Recoverable | npm network flake | Retry once. Then escalate to blocking. |
| Degraded | Obsidian vault not found; codegraph refuses a language | Skip layer. Write to state.warnings. Continue. |
| User-conflict | CLAUDE.md has ## Knowledge Graph without our marker; CLAUDE.md has uncommitted changes | Stop that step. Prompt user. |

Atomic writes: tmp-file + rename for every file written. Never leave half-written artifacts. Git-aware: refuse to write CLAUDE.md if user has uncommitted changes in it. Errors/warnings always persisted to .kg-setup-state.json.
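The tmp-file + rename pattern can be sketched in Python as follows. `atomic_write` is a hypothetical helper name; the sketch assumes the temporary file is created in the same directory as the target so that the rename stays on one filesystem.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(path: Path, content: str) -> None:
    """Write `content` to `path` via a temp file in the same directory plus rename."""
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=f".{path.name}.", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)  # atomic on POSIX within one filesystem
    except BaseException:
        # Remove the temp file so no half-written artifact is left behind.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

A reader of `path` sees either the old content or the new content, never a partial write, which is the guarantee the paragraph above asks for.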

User-facing error format:

✗ install_tools (codegraph): npm exited with code 1
  stderr: ...
  → Fix: `brew install node@20` and re-run skill
✓ install_tools (gitnexus): already at v1.2.3, skipped

7. Testing

7.1 Unit tests (pytest)

  • test_merge_claude_md.py (highest priority). Snapshot-based:
    • no file → creates from template
    • no section → appends
    • section with our marker → updates generated lines only, preserves <!-- user-content --> block
    • section without our marker → untouched, warning emitted
    • 10 consecutive runs → identical output (idempotency)
  • test_detect_project.py — tmp_path scenarios: with/without .git, various package manifests for language detection, LOC counter on synthetic files
  • test_state.py — read/write roundtrip, schema_version migration stubs

7.2 Integration test

tests/integration_test.sh:

  1. Create temp dir, git init, add app.py with a trivial function
  2. Invoke skill phases directly (not through Claude) via the scripts
  3. Assert: .codegraph/ exists, CLAUDE.md exists and contains kg-setup-v1 marker, state file status=healthy
  4. Run again → no changes, exit 0
  5. Cleanup

7.3 Manual smoke test (on arb-scanner itself)

Before/after comparison:

  • Pre: ask Claude 3 architectural questions about arb-scanner, record token usage
  • Run skill
  • Post: ask same 3 questions, confirm Claude uses mcp__codegraph__* tools instead of reading files. Compare tokens.

7.4 Not tested automatically

  • Actual npm install of gitnexus and @colbymchenry/codegraph (external network, unstable)
  • Obsidian MCP writes against a real vault (tested once manually, then relied upon)

8. Install commands reference (verified from README 2026-04-22)

# CLI tools
npm install -g gitnexus
npm install -g @colbymchenry/codegraph

# MCP registration
claude mcp add gitnexus -- npx -y gitnexus@latest mcp
claude mcp add codegraph -- codegraph serve --mcp

# Per-project init
codegraph init -i
gitnexus analyze .

Requirements: Node.js 18+. No brew/pip install path documented in either README.

9. Open decisions recorded here

| # | Decision | User's choice | Rationale |
| --- | --- | --- | --- |
| 1 | Implementation approach | Approach 2 (skill + bundled scripts) | Deterministic, fast, low-token vs Approach 1; not overkill like Approach 3 |
| 2 | Install both GitNexus and CodeGraph | Both | User explicitly wanted both; different angles on the same graph |
| 3 | Scope | local-only | Remote (ClowH1) deferred; user will SSH-run scripts manually if needed |
| 4 | Rerun behavior | Ask user (choice c) | Most predictable, avoids silent destructive actions |
| 5 | Skill name | kg-setup | Shorter than setup-knowledge-graph; matches user's existing short names |
| 6 | Canonical project name | git remote basename | Derived from git remote get-url origin, fallback to cwd basename |
| 7 | Obsidian folder convention | {VaultRoot}/{repo_name}/ | Matches existing arb-scanner/ and betting-dashboard/ |
| 8 | CLAUDE.md merge style | α (append section, preserve above) | Don't touch user's hand-written 102-line CLAUDE.md |
| 9 | Obsidian mode default | minimal, rich on flag or LOC > 5000 | Avoid clutter on small projects; betting-dashboard is rich → justified |
| 10 | Languages tested | Python (primary), JS (secondary) | arb-scanner is Python; user occasionally uses JS |

10. Out of scope for v1

  • Remote/SSH setup for ClowH1 prod (separate tool later)
  • Automated cron-based reindexing
  • Chat history → Obsidian export pipeline
  • Graph visualization UI beyond GitNexus's default web view
  • Publishing the skill to a marketplace
  • Support for languages beyond what CodeGraph/GitNexus already support
  • Windows/Linux (macOS only)

11. Post-setup housekeeping (not the skill's job, but flagged in the report)

  • User should rename /Users/pavelmalkin/Documents/Scaner → .../arb-scanner after end of current Claude Code session (to align with git remote name, Obsidian folder, and kg-setup canonical)
  • User can populate arb-scanner/_index.md with richer content over time
  • Ownership of the ## Knowledge Graph section in CLAUDE.md: generated block is replaceable, everything below <!-- user-content --> is the user's