initial: design + plan for kg-setup skill

Port the design spec and 17-task implementation plan from arb-scanner (where the idea was born) to this dedicated repo. Paths in the plan adjusted to treat this repo root as the skill root (no skills/kg-setup/ subdirectory). Implementation follows via subagent-driven-development.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 03:46:38 +03:00
commit f2c2ef54e4
4 changed files with 2567 additions and 0 deletions
--- a/docs/design.md
+++ b/docs/design.md
@@ -0,0 +1,399 @@
+# kg-setup — Knowledge Graph bootstrap skill
+
+**Date:** 2026-04-22
+**Status:** Design approved via brainstorming, awaiting user review before plan.
+**Author/User:** pavelmalkin
+**Skill location:** `~/.claude/skills/kg-setup/`
+
+## 1. Goal
+
+Build a Claude Code skill that bootstraps a 4-layer project memory system in any project on this Mac from a single natural-language trigger. Primary outcome: Claude retains important project context across sessions without re-reading files, and token usage on routine tasks drops measurably.
+
+**Not goals:**
+- Replace the user's existing auto-memory system. The skill *integrates* with it, adding a pointer entry.
+- Ingest chat history or session transcripts into Obsidian (that's a separate tool).
+- Work on remote hosts (ClowH1 prod). Local macOS only for v1.
+- Ship a published package (no pypi/brew/homebrew-tap). Skill lives in `~/.claude/skills/`.
+
+## 2. Success criteria
+
+1. On a clean project, running the skill once produces: working `codegraph`+`gitnexus` MCP servers visible to Claude Code, `.codegraph/` index, optional `.gitnexus/` index, updated `CLAUDE.md` with a `## Knowledge Graph` section, an Obsidian `{repo_name}/_index.md` entry, and a pointer line in auto-memory `MEMORY.md`.
+2. Re-running on the same project is safe: by default only runs health check and reports; destructive actions are gated behind explicit user confirmation (choice "c" from design).
+3. On a project without some prerequisites (e.g. no Obsidian vault), the skill degrades gracefully: skips that layer with a warning, still completes everything else.
+4. After setup, Claude answers 3 smoke-test architectural questions on `arb-scanner` using the graph MCP tools instead of reading files. Token usage for those 3 questions is lower than pre-skill baseline (rough measurement, not a hard SLA).
+
+## 3. Architecture
+
+### 3.1 Four memory layers
+
+| Layer | What it holds | Where | Update cadence | Skill's role |
+|---|---|---|---|---|
+| **Code graph** | AST-derived deps, callers/callees, call chains | `.codegraph/codegraph.db` + `.gitnexus/` | On-demand refresh | Install tools + run init |
+| **Project CLAUDE.md** | Conventions, commands, gotchas | `./CLAUDE.md` | Rarely, by human | Add `## Knowledge Graph` section (idempotent) |
+| **Obsidian vault** | Decisions, session logs, research, status | `{VaultRoot}/{repo_name}/` | Frequently, by human | Create `_index.md` pointer file; optional `sessions/`+`knowledge/` subfolders |
+| **auto-memory** | Persistent facts between Claude sessions | `~/.claude/projects/{slug}/memory/` | Automatically by Claude | Append one pointer line to `MEMORY.md` |
+
+The skill builds **infrastructure**. Content is the user's responsibility, except a minimal seed `_index.md` in Obsidian.
+
+### 3.2 Skill structure
+
+```
+~/.claude/skills/kg-setup/
+├── SKILL.md                          # Orchestrator (~150 lines)
+├── scripts/
+│   ├── check_prereqs.sh              # Detect installed tools, emit JSON
+│   ├── install_tools.sh              # npm install both tools (idempotent)
+│   ├── detect_project.py             # git remote, LOC, language, vault path
+│   ├── register_mcp.sh               # claude mcp add for both servers
+│   ├── init_codegraph.sh             # codegraph init -i in cwd
+│   ├── init_gitnexus.sh              # gitnexus analyze . in cwd
+│   ├── merge_claude_md.py            # idempotent section merge
+│   ├── build_obsidian_index.py       # generate _index.md content
+│   ├── update_memory_index.py        # append pointer to MEMORY.md
+│   ├── health_check.sh               # MCP ping + test query to each graph
+│   └── state.py                      # read/write .kg-setup-state.json
+├── templates/
+│   ├── claude_md_section.md          # "## Knowledge Graph" template
+│   ├── obsidian_index_minimal.md     # Default _index.md
+│   └── obsidian_index_rich.md        # --rich / auto-rich variant
+├── tests/
+│   ├── test_merge_claude_md.py
+│   ├── test_detect_project.py
+│   ├── test_state.py
+│   └── integration_test.sh
+└── README.md                         # Human-facing docs for the skill itself
+```
+
+## 4. Components
+
+### 4.1 `SKILL.md`
+
+Frontmatter:
+```yaml
+---
+name: kg-setup
+description: Bootstrap a 4-layer project memory system (CodeGraph + GitNexus + CLAUDE.md + Obsidian + auto-memory). Use when user asks to "настрой граф знаний", "подключи базу знаний", "запомни проект", "setup knowledge graph", "bootstrap project memory", "initialize code graph", "пусть ты помнишь детали проекта".
+---
+```
+
+Body — 5 phases in explicit markdown:
+
+1. **DETECT** — invoke `check_prereqs.sh` + `detect_project.py`, read `./CLAUDE.md` and `./.kg-setup-state.json`. Aggregate into an in-memory `state` blob.
+2. **PLAN** — Claude reads state, constructs action list. If state file says "healthy" and no `--refresh` flag, skip directly to VERIFY.
+3. **EXECUTE** — run actions in order: install tools → register MCP → init graphs → merge CLAUDE.md → Obsidian → auto-memory. Each step is atomic; a failure after step N does not invalidate steps 1..N-1.
+4. **VERIFY** — `health_check.sh`. Writes summary to state file.
+5. **REPORT** — Claude prints bulleted summary to user: what succeeded, what was skipped, what failed, suggested next steps.
+
+Rerun behavior (decision point — user chose "c", ask):
+- If the state file exists and shows a previous successful setup → Claude asks: "Проект уже настроен [The project is already set up]. `(r)` health-check only / `(f)` full refresh / `(s)` skip Obsidian only / `(q)` cancel?"
+- Default to `(r)` if the user gives no answer to the prompt.
+
+### 4.2 `check_prereqs.sh`
+
+Output schema (JSON to stdout):
+```json
+{
+  "schema_version": 1,
+  "env": {
+    "node": "v20.11.0",
+    "node_major": 20,
+    "python": "3.12.1",
+    "git": "2.43.0",
+    "npm": "10.2.4"
+  },
+  "tools": {
+    "gitnexus_cli": {"installed": false, "version": null, "path": null},
+    "codegraph_cli": {"installed": true, "version": "0.x.y", "path": "/opt/homebrew/bin/codegraph"},
+    "gitnexus_mcp_registered": false,
+    "codegraph_mcp_registered": true
+  },
+  "obsidian": {
+    "mcp_available": true,
+    "vault_path_hint": null
+  },
+  "errors": [],
+  "warnings": []
+}
+```
+
+Node 18+ is the hard prerequisite (CodeGraph requirement). If `node_major < 18` → blocking error.
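+
+The Node gate can be sketched as follows. This is a Python illustration of the check only: the real script is shell, and the function names and error strings here are hypothetical, not taken from the skill.
+
+```python
+from __future__ import annotations
+
+import re
+
+MIN_NODE_MAJOR = 18  # from the design: CodeGraph requires Node 18+
+
+
+def node_major(version_output: str) -> int | None:
+    """Extract the major version from `node --version` output such as 'v20.11.0'."""
+    match = re.match(r"v?(\d+)\.", version_output.strip())
+    return int(match.group(1)) if match else None
+
+
+def prereq_errors(version_output: str | None) -> list[str]:
+    """Return blocking errors for the Node prerequisite (empty list means OK)."""
+    if version_output is None:
+        return ["node not found"]
+    major = node_major(version_output)
+    if major is None:
+        return [f"cannot parse node version: {version_output!r}"]
+    if major < MIN_NODE_MAJOR:
+        return [f"node {major} < {MIN_NODE_MAJOR}"]
+    return []
+```
+
+A non-empty result would go into the `errors` array of the JSON output and stop the run before any disk writes.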
+
+### 4.3 `install_tools.sh`
+
+Concrete commands (from README reality-check):
+```bash
+npm install -g gitnexus
+npm install -g @colbymchenry/codegraph
+```
+
+Idempotent: if `which codegraph` already exits 0, skip; same for gitnexus. Failures → retry once with `--force` flag, then bubble up.
+
+### 4.4 `register_mcp.sh`
+
+Uses Claude Code CLI:
+```bash
+claude mcp add gitnexus -- npx -y gitnexus@latest mcp
+claude mcp add codegraph -- codegraph serve --mcp
+```
+
+Check via `claude mcp list` before adding. If already registered, skip.
+
+### 4.5 `detect_project.py`
+
+Detects:
+- `git_remote_name`: `basename(git remote get-url origin, .git)`. If no remote → fallback to `basename(cwd)`, emit warning.
+- `primary_lang`: heuristic based on file extensions + common markers (`requirements.txt`/`pyproject.toml` → python, `package.json` → js/ts, etc.)
+- `loc`: sum of non-blank, non-comment lines across source files, capped at 100k for speed
+- `vault_path`: determined in this priority order: (1) env var `KG_SETUP_VAULT_PATH` if set, (2) Obsidian MCP `get_vault_stats` response, (3) null. Never attempts filesystem scan for vaults. If null, Obsidian layer is skipped gracefully in phase 3.
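+
+A minimal sketch of the name and vault-path rules above, assuming the remote URL and the Obsidian MCP answer have already been fetched by the caller. The function names are hypothetical, and how the vault path is extracted from the `get_vault_stats` response is not specified here.
+
+```python
+from __future__ import annotations
+
+from pathlib import PurePosixPath
+
+
+def repo_name_from_remote(remote_url: str | None, cwd: str) -> tuple[str, list[str]]:
+    """Return (name, warnings): remote basename without '.git', else cwd basename."""
+    if remote_url:
+        # Treat ':' like '/' so both https://host/user/repo.git and
+        # git@host:user/repo.git end in a plain 'repo.git' segment.
+        tail = remote_url.rstrip("/").replace(":", "/").rsplit("/", 1)[-1]
+        return tail.removesuffix(".git"), []
+    return PurePosixPath(cwd).name, ["no git remote; using directory name"]
+
+
+def resolve_vault_path(env: dict[str, str], mcp_vault_path: str | None) -> str | None:
+    """Priority: KG_SETUP_VAULT_PATH env var, then the Obsidian MCP answer, then None."""
+    return env.get("KG_SETUP_VAULT_PATH") or mcp_vault_path or None
+```
+
+Passing the environment in as a plain dict keeps both functions testable without touching `os.environ` or running git.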
+
+### 4.6 `init_codegraph.sh` and `init_gitnexus.sh`
+
+```bash
+# codegraph
+codegraph init -i                    # interactive false via env var or preset
+# gitnexus
+gitnexus analyze .
+```
+
+Both check for existing `.codegraph/` / `.gitnexus/` first. If the index is present and refresh-mode is not active, skip with a note. Refresh-mode is triggered in two ways: (a) the user's natural-language intent contains phrases like "обнови граф" ("update the graph"), "re-index", or "refresh graph", in which case Claude passes a `--refresh` flag to the scripts; (b) the user selects `(f) full refresh` in the rerun prompt (see SKILL.md rerun behavior in §4.1).
+
+### 4.7 `merge_claude_md.py`
+
+Rules:
+1. If `./CLAUDE.md` is missing → create it with our section only, plus a note that the user should add their own conventions above.
+2. If present but no section with heading `## Knowledge Graph` → append section to end.
+3. If section exists and contains marker `<!-- generated:kg-setup-v1 -->` → update the generated-content lines (paths, timestamps), preserve anything below `<!-- user-content -->` marker unchanged.
+4. If section exists without our marker → do not touch; emit warning "CLAUDE.md has a `## Knowledge Graph` section from another source, skipping merge."
+
+Template stored in `templates/claude_md_section.md`:
+```markdown
+## Knowledge Graph
+
+<!-- generated:kg-setup-v1 -->
+Local graph indices:
+- CodeGraph: `.codegraph/codegraph.db` (query via MCP server `codegraph`)
+- GitNexus: `.gitnexus/` (query via MCP server `gitnexus`)
+
+Obsidian notes: `{VaultRoot}/{repo_name}/_index.md`
+auto-memory: `~/.claude/projects/{slug}/memory/MEMORY.md`
+
+Refresh indices after major code changes:
+- `codegraph init -i --refresh`
+- `gitnexus analyze . --force`
+<!-- /generated -->
+
+<!-- user-content -->
+<!-- anything below here is preserved between kg-setup runs -->
+```
+
+Pre-write safety: run `git status --porcelain -- CLAUDE.md`. If CLAUDE.md has uncommitted changes → refuse to write, ask user to commit or stash first.
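+
+The four merge rules can be sketched as one pure function. This is a sketch under assumptions, not the script itself: the function name and the returned action strings are hypothetical, the "add your own conventions" note from rule 1 is omitted, and a file with the opening marker but no closing marker would raise `ValueError`.
+
+```python
+from __future__ import annotations
+
+GENERATED_OPEN = "<!-- generated:kg-setup-v1 -->"
+GENERATED_CLOSE = "<!-- /generated -->"
+HEADING = "## Knowledge Graph"
+
+
+def merge_section(existing: str | None, generated_body: str) -> tuple[str | None, str]:
+    """Apply merge rules 1-4; return (new_text, action). new_text is None on skip."""
+    block = f"{GENERATED_OPEN}\n{generated_body.strip()}\n{GENERATED_CLOSE}"
+    section = f"{HEADING}\n\n{block}\n\n<!-- user-content -->\n"
+    if existing is None:  # rule 1: no CLAUDE.md yet
+        return section, "created"
+    if HEADING not in existing.splitlines():  # rule 2: file without the section
+        return existing.rstrip("\n") + "\n\n" + section, "appended"
+    if GENERATED_OPEN not in existing:  # rule 4: section from another source
+        return None, "skipped"
+    # rule 3: swap only the generated block; everything else stays untouched
+    start = existing.index(GENERATED_OPEN)
+    end = existing.index(GENERATED_CLOSE, start) + len(GENERATED_CLOSE)
+    return existing[:start] + block + existing[end:], "updated"
+```
+
+Because rule 3 replaces only the span between the two markers, running the function on its own output with the same body returns the text unchanged, which is the idempotency property the unit tests target.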
+
+### 4.8 `build_obsidian_index.py`
+
+Generates `_index.md` content. Claude then writes it via Obsidian MCP `write_note`.
+
+Mode selection:
+- **rich**: explicit `--rich` flag OR `detect_project.loc > 5000` OR existence of `./tests/` and `./docs/` in project
+- **minimal**: otherwise
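+
+One reading of the mode rule, as a small sketch. It assumes that both `./tests/` and `./docs/` must exist for the directory condition to hold, which the wording above leaves open; the function name and parameters are hypothetical.
+
+```python
+def obsidian_mode(rich_flag: bool, loc: int, has_tests_dir: bool, has_docs_dir: bool) -> str:
+    """Return 'rich' or 'minimal' for the Obsidian index template."""
+    # Assumption: the directory condition needs BOTH tests/ and docs/.
+    if rich_flag or loc > 5000 or (has_tests_dir and has_docs_dir):
+        return "rich"
+    return "minimal"
+```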
+
+Minimal template:
+```markdown
+---
+tags: [project, kg-index]
+---
+
+# {repo_name}
+
+**Repo:** `{project_path}`
+**Languages:** {primary_lang}
+**LOC:** ~{loc}
+**Setup date:** {today}
+
+## Quick links
+- [[../arb-scanner/_index]] ← other projects in vault
+- Code graph local index: `{project_path}/.codegraph/`
+- CLAUDE.md: `{project_path}/CLAUDE.md`
+
+## What lives here
+(Add notes on decisions, research, session logs below or in nested folders.)
+```
+
+Rich template adds: `sessions/` placeholder, `knowledge/decisions/` placeholder, `knowledge/patterns/` placeholder — creating empty `.gitkeep`-style folder notes (`_folder.md` with a stub header).
+
+### 4.9 `update_memory_index.py`
+
+Appends one line to `~/.claude/projects/{slug}/memory/MEMORY.md`. `{slug}` is the project's canonical Claude Code projects-directory name: project path with `/` replaced by `-` (e.g. `/Users/pavelmalkin/Documents/Scaner` → `-Users-pavelmalkin-Documents-Scaner`). If the memory directory does not exist yet, skip this step — the user's auto-memory harness will create it on first interaction, and the skill can be re-run later to add the pointer.
+```
+- [Knowledge graph bootstrap for {repo_name}](project_kg_{repo_name}.md) — set up 2026-04-22, graph in .codegraph, vault: {repo_name}/
+```
+
+And writes the pointed-to file `project_kg_{repo_name}.md` with frontmatter:
+```markdown
+---
+name: Knowledge graph for {repo_name}
+description: Pointer to graph indices, Obsidian vault folder, and CLAUDE.md section for {repo_name}
+type: project
+---
+
+Project `{repo_name}` ({project_path}) has kg-setup applied on 2026-04-22.
+- CodeGraph MCP: server name `codegraph`, query with `mcp__codegraph__*` tools
+- GitNexus MCP: server name `gitnexus`
+- Obsidian vault folder: `{repo_name}/`
+- State file: `{project_path}/.kg-setup-state.json`
+- Reindex: `codegraph init -i --refresh` (or `gitnexus analyze . --force`)
+```
+
+Idempotent: checks for existing pointer line before appending.
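+
+A minimal sketch of the slug rule and the idempotent append. Matching on the link target rather than on the whole line is an assumption made here so that a changed date does not produce a duplicate entry; the function names are hypothetical.
+
+```python
+def project_slug(project_path: str) -> str:
+    """Claude Code projects-directory name: the path with '/' replaced by '-'."""
+    return project_path.replace("/", "-")
+
+
+def append_pointer(memory_md: str, pointer_line: str, pointer_file: str) -> str:
+    """Append pointer_line unless some line already links to pointer_file."""
+    if f"]({pointer_file})" in memory_md:
+        return memory_md  # already present: no change
+    # Add a separating newline only if the existing text does not end with one.
+    sep = "" if memory_md.endswith("\n") or not memory_md else "\n"
+    return memory_md + sep + pointer_line.rstrip("\n") + "\n"
+```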
+
+### 4.10 `health_check.sh`
+
+Three checks:
+1. MCP servers visible: `claude mcp list | grep -E 'codegraph|gitnexus'` must print a line for each of the two servers; a match for only one of them is a failure.
+2. CodeGraph query: test via `mcp__codegraph__codegraph_status` tool through Claude (not direct — skill instructs Claude to run this as a test).
+3. GitNexus query: similar smoke test.
+
+Output: JSON summary. Claude converts to user-readable checklist in REPORT phase.
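+
+The first check needs a per-server result, which a single `grep` exit code does not give. A sketch of that logic in Python, assuming only that each registered server's name appears somewhere on its own line of the `claude mcp list` output (the exact output format is not specified here, and the function name is hypothetical):
+
+```python
+def missing_servers(mcp_list_output: str, required=("codegraph", "gitnexus")) -> list[str]:
+    """Return the required server names that appear on no line of the output."""
+    lines = mcp_list_output.splitlines()
+    return [name for name in required if not any(name in line for line in lines)]
+```
+
+An empty list means both servers are visible; any returned name would be reported as a failed item in the checklist.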
+
+### 4.11 `state.py`
+
+`.kg-setup-state.json` schema (at project root):
+```json
+{
+  "schema_version": 1,
+  "skill_version": "0.1.0",
+  "last_run": "2026-04-22T13:00:00Z",
+  "last_run_status": "healthy|degraded|incomplete",
+  "layers": {
+    "code_graph": {"configured": true, "tool": "codegraph", "index_path": ".codegraph/"},
+    "gitnexus": {"configured": true, "index_path": ".gitnexus/"},
+    "claude_md": {"configured": true, "section_marker": "kg-setup-v1"},
+    "obsidian": {"configured": true, "vault_folder": "arb-scanner/", "mode": "minimal"},
+    "auto_memory": {"configured": true, "pointer_file": "project_kg_arb-scanner.md"}
+  },
+  "warnings": [],
+  "errors": []
+}
+```
+
+On successful first run, append `.kg-setup-state.json` to `.gitignore` (check it's not already there).
+
+## 5. Data flow
+
+Reproduced from the brainstorming session output:
+
+```
+User trigger
+  → Phase 1 DETECT (check_prereqs + detect_project + read state)
+  → Phase 2 PLAN (Claude decides action list)
+  → Phase 3 EXECUTE:
+     one-time: install_tools → register_mcp
+     per-project: init_codegraph → init_gitnexus → merge_claude_md
+                  → build_obsidian_index → update_memory_index
+     → write .kg-setup-state.json after each step
+  → Phase 4 VERIFY (health_check)
+  → Phase 5 REPORT (Claude writes bulleted summary)
+```
+
+## 6. Error handling
+
+| Category | Examples | Behavior |
+|---|---|---|
+| Blocking | No node/python/git; node < 18 | Stop. Show install command. No disk writes. |
+| Recoverable | npm network flake | Retry once. Then escalate to blocking. |
+| Degraded | Obsidian vault not found; codegraph refuses a language | Skip layer. Write to `state.warnings`. Continue. |
+| User-conflict | CLAUDE.md has `## Knowledge Graph` without our marker; CLAUDE.md has uncommitted changes | Stop that step. Prompt user. |
+
+Atomic writes: tmp-file + rename for every file written. Never leave half-written artifacts.
+Git-aware: refuse to write `CLAUDE.md` if user has uncommitted changes in it.
+Errors/warnings always persisted to `.kg-setup-state.json`.
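+
+The tmp-file + rename pattern can be sketched in Python as follows. The helper names are hypothetical; `os.replace` is atomic on POSIX only when the temp file and the target are on the same filesystem, which is why the temp file is created in the target's directory.
+
+```python
+import json
+import os
+import tempfile
+
+
+def atomic_write_text(path: str, text: str) -> None:
+    """Write text to path via a temp file in the same directory, then rename."""
+    directory = os.path.dirname(os.path.abspath(path))
+    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".kg-setup-", suffix=".tmp")
+    try:
+        with os.fdopen(fd, "w", encoding="utf-8") as handle:
+            handle.write(text)
+            handle.flush()
+            os.fsync(handle.fileno())  # make sure the bytes reach disk before the rename
+        os.replace(tmp_path, path)  # readers see either the old file or the new one
+    except BaseException:
+        # Clean up the temp file so a failed write leaves no half-written artifact.
+        if os.path.exists(tmp_path):
+            os.unlink(tmp_path)
+        raise
+
+
+def save_state(path: str, state: dict) -> None:
+    """Persist the state blob as pretty-printed JSON."""
+    atomic_write_text(path, json.dumps(state, indent=2) + "\n")
+```
+
+Calling `save_state(".kg-setup-state.json", state)` after each EXECUTE step would keep errors and warnings on disk even if a later step fails.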
+
+User-facing error format:
+```
+✗ install_tools (codegraph): npm exited with code 1
+  stderr: ...
+  → Fix: `brew install node@20` and re-run skill
+✓ install_tools (gitnexus): already at v1.2.3, skipped
+```
+
+## 7. Testing
+
+### 7.1 Unit tests (pytest)
+
+- `test_merge_claude_md.py` — **highest priority**. Snapshot-based:
+  - no file → creates from template
+  - no section → appends
+  - section with our marker → updates generated lines only, preserves `<!-- user-content -->` block
+  - section without our marker → untouched, warning emitted
+  - 10 consecutive runs → identical output (idempotency)
+- `test_detect_project.py` — tmp_path scenarios: with/without `.git`, various package manifests for language detection, LOC counter on synthetic files
+- `test_state.py` — read/write roundtrip, schema_version migration stubs
+
+### 7.2 Integration test
+
+`tests/integration_test.sh`:
+1. Create temp dir, `git init`, add `app.py` with a trivial function
+2. Invoke skill phases directly (not through Claude) via the scripts
+3. Assert: `.codegraph/` exists, `CLAUDE.md` exists and contains the `kg-setup-v1` marker, state file has `last_run_status` set to `healthy`
+4. Run again → no changes, exit 0
+5. Cleanup
+
+### 7.3 Manual smoke test (on arb-scanner itself)
+
+Before/after comparison:
+- Pre: ask Claude 3 architectural questions about arb-scanner, record token usage
+- Run skill
+- Post: ask same 3 questions, confirm Claude uses `mcp__codegraph__*` tools instead of reading files. Compare tokens.
+
+### 7.4 Not tested automatically
+
+- Actual `npm install` of `gitnexus` and `@colbymchenry/codegraph` (external network, unstable)
+- Obsidian MCP writes against a real vault (tested once manually, then relied upon)
+
+## 8. Install commands reference (verified against both READMEs on 2026-04-22)
+
+```bash
+# CLI tools
+npm install -g gitnexus
+npm install -g @colbymchenry/codegraph
+
+# MCP registration
+claude mcp add gitnexus -- npx -y gitnexus@latest mcp
+claude mcp add codegraph -- codegraph serve --mcp
+
+# Per-project init
+codegraph init -i
+gitnexus analyze .
+```
+
+Requirements: Node.js 18+. No brew/pip install path documented in either README.
+
+## 9. Open decisions recorded here
+
+| # | Decision | User's choice | Rationale |
+|---|---|---|---|
+| 1 | Implementation approach | Approach 2 (skill + bundled scripts) | Deterministic, fast, low-token vs Approach 1; not overkill like Approach 3 |
+| 2 | Install both GitNexus and CodeGraph | Both | User explicitly wanted both; different angles on the same graph |
+| 3 | Scope | local-only | Remote (ClowH1) deferred; user will SSH-run scripts manually if needed |
+| 4 | Rerun behavior | Ask user (choice c) | Most predictable, avoids silent destructive actions |
+| 5 | Skill name | `kg-setup` | Shorter than `setup-knowledge-graph`; matches user's existing short names |
+| 6 | Canonical project name | git remote basename | Derived from `git remote get-url origin`, fallback to cwd basename |
+| 7 | Obsidian folder convention | `{VaultRoot}/{repo_name}/` | Matches existing `arb-scanner/` and `betting-dashboard/` |
+| 8 | CLAUDE.md merge style | α (append section, preserve above) | Don't touch user's hand-written 102-line CLAUDE.md |
+| 9 | Obsidian mode default | minimal, rich on flag or LOC > 5000 | Avoid clutter on small projects; betting-dashboard is rich → justified |
+| 10 | Languages tested | Python (primary), JS (secondary) | arb-scanner is Python; user occasionally uses JS |
+
+## 10. Out of scope for v1
+
+- Remote/SSH setup for ClowH1 prod (separate tool later)
+- Automated cron-based reindexing
+- Chat history → Obsidian export pipeline
+- Graph visualization UI beyond GitNexus's default web view
+- Publishing the skill to a marketplace
+- Support for languages beyond what CodeGraph/GitNexus already support
+- Windows/Linux (macOS only)
+
+## 11. Post-setup housekeeping (not the skill's job, but flagged in the report)
+
+- User should rename `/Users/pavelmalkin/Documents/Scaner` → `.../arb-scanner` after the current Claude Code session ends (to align with the git remote name, Obsidian folder, and kg-setup canonical project name)
+- User can populate `arb-scanner/_index.md` with richer content over time
+- Ownership of the `## Knowledge Graph` section in CLAUDE.md: generated block is replaceable, everything below `<!-- user-content -->` is the user's
--- a/docs/plan.md
+++ b/docs/plan.md