# kg-setup — Knowledge Graph bootstrap skill
**Date:** 2026-04-22
**Status:** Design approved via brainstorming, awaiting user review before plan.
**Author/User:** pavelmalkin
**Skill location:** `~/.claude/skills/kg-setup/`
## 1. Goal
Build a Claude Code skill that bootstraps a 4-layer project memory system in any project on this Mac from a single natural-language trigger. Primary outcome: Claude retains important project context across sessions without re-reading files, and token usage on routine tasks drops measurably.
**Not goals:**
- Replace the user's existing auto-memory system. The skill *integrates* with it, adding a pointer entry.
- Ingest chat history or session transcripts into Obsidian (that's a separate tool).
- Work on remote hosts (ClowH1 prod). Local macOS only for v1.
- Ship a published package (no pypi/brew/homebrew-tap). Skill lives in `~/.claude/skills/`.
## 2. Success criteria
1. On a clean project, running the skill once produces: working `codegraph`+`gitnexus` MCP servers visible to Claude Code, `.codegraph/` index, optional `.gitnexus/` index, updated `CLAUDE.md` with a `## Knowledge Graph` section, an Obsidian `{repo_name}/_index.md` entry, and a pointer line in auto-memory `MEMORY.md`.
2. Re-running on the same project is safe: by default the skill only runs the health check and reports; destructive actions are gated behind explicit user confirmation (rerun option "c", recorded as decision 4 in §9).
3. On a project without some prerequisites (e.g. no Obsidian vault), the skill degrades gracefully: skips that layer with a warning, still completes everything else.
4. After setup, Claude answers 3 smoke-test architectural questions on `arb-scanner` using the graph MCP tools instead of reading files. Token usage for those 3 questions is lower than pre-skill baseline (rough measurement, not a hard SLA).
## 3. Architecture
### 3.1 Four memory layers
| Layer | What it holds | Where | Update cadence | Skill's role |
|---|---|---|---|---|
| **Code graph** | AST-derived deps, callers/callees, call chains | `.codegraph/codegraph.db` + `.gitnexus/` | On-demand refresh | Install tools + run init |
| **Project CLAUDE.md** | Conventions, commands, gotchas | `./CLAUDE.md` | Rarely, by human | Add `## Knowledge Graph` section (idempotent) |
| **Obsidian vault** | Decisions, session logs, research, status | `{VaultRoot}/{repo_name}/` | Frequently, by human | Create `_index.md` pointer file; optional `sessions/`+`knowledge/` subfolders |
| **auto-memory** | Persistent facts between Claude sessions | `~/.claude/projects/{slug}/memory/` | Automatically by Claude | Append one pointer line to `MEMORY.md` |
The skill builds **infrastructure**. Content is the user's responsibility, except a minimal seed `_index.md` in Obsidian.
### 3.2 Skill structure
```
~/.claude/skills/kg-setup/
├── SKILL.md                          # Orchestrator (~150 lines)
├── scripts/
│   ├── check_prereqs.sh              # Detect installed tools, emit JSON
│   ├── install_tools.sh              # npm install both tools (idempotent)
│   ├── detect_project.py             # git remote, LOC, language, vault path
│   ├── register_mcp.sh               # claude mcp add for both servers
│   ├── init_codegraph.sh             # codegraph init -i in cwd
│   ├── init_gitnexus.sh              # gitnexus analyze . in cwd
│   ├── merge_claude_md.py            # idempotent section merge
│   ├── build_obsidian_index.py       # generate _index.md content
│   ├── update_memory_index.py        # append pointer to MEMORY.md
│   ├── health_check.sh               # MCP ping + test query to each graph
│   └── state.py                      # read/write .kg-setup-state.json
├── templates/
│   ├── claude_md_section.md          # "## Knowledge Graph" template
│   ├── obsidian_index_minimal.md     # Default _index.md
│   └── obsidian_index_rich.md        # --rich / auto-rich variant
├── tests/
│   ├── test_merge_claude_md.py
│   ├── test_detect_project.py
│   ├── test_state.py
│   └── integration_test.sh
└── README.md                         # Human-facing docs for the skill itself
```
## 4. Components
### 4.1 `SKILL.md`
Frontmatter:
```yaml
---
name: kg-setup
description: Bootstrap a 4-layer project memory system (CodeGraph + GitNexus + CLAUDE.md + Obsidian + auto-memory). Use when user asks to "настрой граф знаний", "подключи базу знаний", "запомни проект", "setup knowledge graph", "bootstrap project memory", "initialize code graph", "пусть ты помнишь детали проекта".
---
```
Body — 5 phases in explicit markdown:
1. **DETECT** — invoke `check_prereqs.sh` + `detect_project.py`, read `./CLAUDE.md` and `./.kg-setup-state.json`. Aggregate into an in-memory `state` blob.
2. **PLAN** — Claude reads state, constructs action list. If state file says "healthy" and no `--refresh` flag, skip directly to VERIFY.
3. **EXECUTE** — run actions in order: install tools → register MCP → init graphs → merge CLAUDE.md → Obsidian → auto-memory. Each step is atomic; a failure at step N does not invalidate steps 1..N-1.
4. **VERIFY** — run `health_check.sh` and write the summary to the state file.
5. **REPORT** — Claude prints bulleted summary to user: what succeeded, what was skipped, what failed, suggested next steps.
Rerun behavior (decision point — user chose "c", ask):
- If state file exists and shows previous successful setup → Claude asks: "Проект уже настроен. `(r)` health-check only / `(f)` full refresh / `(s)` skip Obsidian only / `(q)` cancel?" (The Russian opening sentence means "The project is already set up.")
- Default to `(r)` if no answer after prompt.
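The mapping from the user's reply to an action can be sketched as below. This is only an illustration: the function name `resolve_rerun_choice` and the action labels are hypothetical, and treating an unrecognised reply the same as no reply is an assumption not stated in the design.

```python
from typing import Optional

# Option letters come from the rerun prompt above; the action labels on the
# right are hypothetical names used only in this sketch.
RERUN_ACTIONS = {
    "r": "health_check_only",
    "f": "full_refresh",
    "s": "skip_obsidian",
    "q": "cancel",
}


def resolve_rerun_choice(answer: Optional[str]) -> str:
    """Map the user's reply to an action, defaulting to (r)."""
    if answer is None:
        return RERUN_ACTIONS["r"]
    # Accept "f", "F", "(f)" and replies with surrounding whitespace.
    key = answer.strip().lower().strip("()")
    # Assumption: an empty or unrecognised reply counts as "no answer".
    return RERUN_ACTIONS.get(key, RERUN_ACTIONS["r"])
```

Defaulting to the health check keeps an unattended rerun non-destructive, which matches the intent of success criterion 2.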
### 4.2 `check_prereqs.sh`
Output schema (JSON to stdout):
```json
{
  "schema_version": 1,
  "env": {
    "node": "v20.11.0",
    "node_major": 20,
    "python": "3.12.1",
    "git": "2.43.0",
    "npm": "10.2.4"
  },
  "tools": {
    "gitnexus_cli": {"installed": false, "version": null, "path": null},
    "codegraph_cli": {"installed": true, "version": "0.x.y", "path": "/opt/homebrew/bin/codegraph"},
    "gitnexus_mcp_registered": false,
    "codegraph_mcp_registered": true
  },
  "obsidian": {
    "mcp_available": true,
    "vault_path_hint": null
  },
  "errors": [],
  "warnings": []
}
```
Node 18+ is the hard prerequisite (CodeGraph requirement). If `node_major < 18` → blocking error.
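A minimal sketch of how a consumer of this JSON might derive the blocking decision. The function name `blocking_errors` and the error wording are assumptions for illustration; only the field names and the `node_major < 18` rule come from the schema above.

```python
import json

MIN_NODE_MAJOR = 18  # hard prerequisite stated above


def blocking_errors(prereqs: dict) -> list[str]:
    """Return blocking errors derived from check_prereqs.sh output."""
    errors = list(prereqs.get("errors", []))
    node_major = prereqs.get("env", {}).get("node_major")
    if node_major is None:
        # Hypothetical wording; the real script may report this differently.
        errors.append("node not found")
    elif node_major < MIN_NODE_MAJOR:
        errors.append(f"node_major {node_major} is below required {MIN_NODE_MAJOR}")
    return errors


# Example input: a trimmed-down payload in the shape of the schema above.
sample = json.loads('{"schema_version": 1, "env": {"node_major": 16}, "errors": []}')
print(blocking_errors(sample))
```

An empty list means the DETECT phase found nothing blocking and the run can proceed to PLAN.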
### 4.3 `install_tools.sh`
Concrete commands (from README reality-check):
```bash
npm install -g gitnexus
npm install -g @colbymchenry/codegraph
```
Idempotent: if `which codegraph` already exits 0, skip; same for gitnexus. Failures → retry once with `--force` flag, then bubble up.
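The skip-then-retry logic above, sketched in Python for illustration (the actual script is shell). The name `ensure_installed`, its return labels, and the injectable `which`/`run` parameters are assumptions made so the logic can be exercised without touching npm.

```python
import shutil
import subprocess
from typing import Callable, Optional, Sequence


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def ensure_installed(
    binary: str,
    package: str,
    which: Callable[[str], Optional[str]] = shutil.which,
    run: Callable[[Sequence[str]], int] = _run,
) -> str:
    """Install `package` globally unless `binary` is already on PATH."""
    if which(binary):
        return "skipped"
    cmd = ["npm", "install", "-g", package]
    if run(cmd) == 0:
        return "installed"
    # One retry with --force, as described above, then bubble up.
    if run(cmd + ["--force"]) == 0:
        return "installed_after_retry"
    raise RuntimeError(f"npm install failed for {package}")
```

A call such as `ensure_installed("codegraph", "@colbymchenry/codegraph")` would then either skip, install, or raise, which maps onto the Recoverable and Blocking rows in §6.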
### 4.4 `register_mcp.sh`
Uses Claude Code CLI:
```bash
claude mcp add gitnexus -- npx -y gitnexus@latest mcp
claude mcp add codegraph -- codegraph serve --mcp
```
Check via `claude mcp list` before adding. If already registered, skip.
### 4.5 `detect_project.py`
Detects:
- `git_remote_name`: `basename(git remote get-url origin, .git)`. If no remote → fallback to `basename(cwd)`, emit warning.
- `primary_lang`: heuristic based on file extensions + common markers (`requirements.txt`/`pyproject.toml` → python, `package.json` → js/ts, etc.)
- `loc`: sum of non-blank, non-comment lines across source files, capped at 100k for speed
- `vault_path`: determined in this priority order: (1) env var `KG_SETUP_VAULT_PATH` if set, (2) Obsidian MCP `get_vault_stats` response, (3) null. Never attempts filesystem scan for vaults. If null, Obsidian layer is skipped gracefully in phase 3.
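Two of the detection rules above can be sketched as pure functions. The function names are hypothetical, and treating an empty `KG_SETUP_VAULT_PATH` as unset is an assumption; the priority order and the remote-then-cwd fallback follow the list above.

```python
import os
from typing import Mapping, Optional


def repo_name_from_remote(remote_url: Optional[str], cwd: str) -> str:
    """Basename of the origin URL without `.git`, else basename of cwd."""
    if remote_url:
        # Covers https://host/org/repo.git and git@host:org/repo.git alike.
        tail = remote_url.rstrip("/").replace(":", "/").rsplit("/", 1)[-1]
        if tail.endswith(".git"):
            tail = tail[: -len(".git")]
        if tail:
            return tail
    # No remote: fall back to the directory name (caller emits the warning).
    return os.path.basename(os.path.normpath(cwd))


def resolve_vault_path(
    env: Mapping[str, str], mcp_vault_hint: Optional[str]
) -> Optional[str]:
    """Priority: (1) KG_SETUP_VAULT_PATH, (2) Obsidian MCP hint, (3) None."""
    # Assumption: an empty env var is treated the same as an unset one.
    return env.get("KG_SETUP_VAULT_PATH") or mcp_vault_hint or None
```

Passing `os.environ` as `env` and the `vault_path_hint` value from `check_prereqs.sh` as `mcp_vault_hint` would be one way to wire this up; a `None` result means the Obsidian layer is skipped.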
### 4.6 `init_codegraph.sh` and `init_gitnexus.sh`
```bash
# codegraph
codegraph init -i # interactive false via env var or preset
# gitnexus
gitnexus analyze .
```
Both check for existing `.codegraph/` / `.gitnexus/` first. If present and refresh-mode is not active, skip with a note. Refresh-mode is triggered in two ways: (a) the user's natural-language intent contains phrases like "обнови граф", "re-index", "refresh graph", which Claude interprets and passes to the scripts as the `--refresh` flag; (b) the user selects `(f) full refresh` in the rerun prompt (see SKILL.md rerun behavior in §4.1).
### 4.7 `merge_claude_md.py`
Rules:
1. If `./CLAUDE.md` missing → create with our section only + note that user should add their own conventions above.
2. If present but no section with heading `## Knowledge Graph` → append section to end.
3. If section exists and contains marker `<!-- generated:kg-setup-v1 -->` → update the generated-content lines (paths, timestamps), preserve anything below `<!-- user-content -->` marker unchanged.
4. If section exists without our marker → do not touch; emit warning "CLAUDE.md has a `## Knowledge Graph` section from another source, skipping merge."
Template stored in `templates/claude_md_section.md`:
```markdown
## Knowledge Graph
<!-- generated:kg-setup-v1 -->
Local graph indices:
- CodeGraph: `.codegraph/codegraph.db` (query via MCP server `codegraph`)
- GitNexus: `.gitnexus/` (query via MCP server `gitnexus`)
Obsidian notes: `{VaultRoot}/{repo_name}/_index.md`
auto-memory: `~/.claude/projects/{slug}/memory/MEMORY.md`
Refresh indices after major code changes:
- `codegraph init -i --refresh`
- `gitnexus analyze . --force`
<!-- /generated -->
<!-- user-content -->
<!-- anything below here is preserved between kg-setup runs -->
```
Pre-write safety: run `git status --porcelain -- CLAUDE.md`. If CLAUDE.md has uncommitted changes → refuse to write, ask user to commit or stash first.
### 4.8 `build_obsidian_index.py`
Generates `_index.md` content. Claude then writes it via Obsidian MCP `write_note`.
Mode selection:
- **rich**: explicit `--rich` flag OR `detect_project.loc > 5000` OR existence of `./tests/` and `./docs/` in project
- **minimal**: otherwise
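The selection rule is small enough to state as a single function. The name `select_index_mode` and its parameter names are hypothetical; the three conditions are the ones listed above.

```python
def select_index_mode(
    rich_flag: bool, loc: int, has_tests_dir: bool, has_docs_dir: bool
) -> str:
    """Return "rich" or "minimal" for the Obsidian _index.md template."""
    # Both ./tests/ and ./docs/ must exist for the directory-based trigger.
    if rich_flag or loc > 5000 or (has_tests_dir and has_docs_dir):
        return "rich"
    return "minimal"
```

Note that the LOC threshold is strict: a project with exactly 5000 lines and no other trigger stays minimal.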
Minimal template:
```markdown
---
tags: [project, kg-index]
---
# {repo_name}
**Repo:** `{project_path}`
**Languages:** {primary_lang}
**LOC:** ~{loc}
**Setup date:** {today}
## Quick links
- [[../arb-scanner/_index]] ← other projects in vault
- Code graph local index: `{project_path}/.codegraph/`
- CLAUDE.md: `{project_path}/CLAUDE.md`
## What lives here
(Add notes on decisions, research, session logs below or in nested folders.)
```
Rich template adds: `sessions/` placeholder, `knowledge/decisions/` placeholder, `knowledge/patterns/` placeholder — creating empty `.gitkeep`-style folder notes (`_folder.md` with a stub header).
### 4.9 `update_memory_index.py`
Appends one line to `~/.claude/projects/{slug}/memory/MEMORY.md`. `{slug}` is the project's canonical Claude Code projects-directory name: project path with `/` replaced by `-` (e.g. `/Users/pavelmalkin/Documents/Scaner` → `-Users-pavelmalkin-Documents-Scaner`). If the memory directory does not exist yet, skip this step: the user's auto-memory harness will create it on first interaction, and the skill can be re-run later to add the pointer.
```
- [Knowledge graph bootstrap for {repo_name}](project_kg_{repo_name}.md) — set up {today}, graph in .codegraph, vault: {repo_name}/
```
And writes the pointed-to file `project_kg_{repo_name}.md` with frontmatter:
```markdown
---
name: Knowledge graph for {repo_name}
description: Pointer to graph indices, Obsidian vault folder, and CLAUDE.md section for {repo_name}
type: project
---
Project `{repo_name}` ({project_path}) has kg-setup applied on {today}.
- CodeGraph MCP: server name `codegraph`, query with `mcp__codegraph__*` tools
- GitNexus MCP: server name `gitnexus`
- Obsidian vault folder: `{repo_name}/`
- State file: `{project_path}/.kg-setup-state.json`
- Reindex: `codegraph init -i --refresh` (or `gitnexus analyze . --force`)
```
Idempotent: checks for existing pointer line before appending.
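A sketch of the slug rule and the idempotent append, under stated assumptions: the function names are hypothetical, and the duplicate check matches on the link target rather than the whole line so that a rerun on a later date does not add a second pointer.

```python
from pathlib import Path


def project_slug(project_path: str) -> str:
    """Claude Code projects-directory name: '/' replaced by '-'."""
    return project_path.replace("/", "-")


def append_pointer(memory_dir: Path, pointer_file: str, pointer_line: str) -> str:
    """Append `pointer_line` to MEMORY.md unless a pointer already exists."""
    if not memory_dir.is_dir():
        # The auto-memory harness creates this directory later; skip for now.
        return "skipped"
    index = memory_dir / "MEMORY.md"
    text = index.read_text(encoding="utf-8") if index.exists() else ""
    # Assumption: a link to the pointer file counts as "already present".
    if f"({pointer_file})" in text:
        return "unchanged"
    sep = "" if text == "" or text.endswith("\n") else "\n"
    index.write_text(text + sep + pointer_line + "\n", encoding="utf-8")
    return "appended"
```

For the example path above, `project_slug("/Users/pavelmalkin/Documents/Scaner")` yields `-Users-pavelmalkin-Documents-Scaner`, so the memory directory would be `~/.claude/projects/-Users-pavelmalkin-Documents-Scaner/memory/`.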
### 4.10 `health_check.sh`
Three checks:
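1. MCP servers visible: `claude mcp list` must show both `codegraph` and `gitnexus`. Grep for each name separately, since a single `grep -E 'codegraph|gitnexus'` also succeeds when only one server is registered.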
2. CodeGraph query: test via `mcp__codegraph__codegraph_status` tool through Claude (not direct — skill instructs Claude to run this as a test).
3. GitNexus query: similar smoke test.
Output: JSON summary. Claude converts to user-readable checklist in REPORT phase.
### 4.11 `state.py`
`.kg-setup-state.json` schema (at project root):
```json
{
  "schema_version": 1,
  "skill_version": "0.1.0",
  "last_run": "2026-04-22T13:00:00Z",
  "last_run_status": "healthy|degraded|incomplete",
  "layers": {
    "code_graph": {"configured": true, "tool": "codegraph", "index_path": ".codegraph/"},
    "gitnexus": {"configured": true, "index_path": ".gitnexus/"},
    "claude_md": {"configured": true, "section_marker": "kg-setup-v1"},
    "obsidian": {"configured": true, "vault_folder": "arb-scanner/", "mode": "minimal"},
    "auto_memory": {"configured": true, "pointer_file": "project_kg_arb-scanner.md"}
  },
  "warnings": [],
  "errors": []
}
```
On successful first run, append `.kg-setup-state.json` to `.gitignore` (check it's not already there).
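The `.gitignore` step can be sketched as an idempotent append. The function name `ensure_gitignored` is hypothetical, and comparing whole stripped lines (rather than evaluating gitignore patterns) is a simplifying assumption.

```python
from pathlib import Path

STATE_FILE = ".kg-setup-state.json"


def ensure_gitignored(project_root: Path, entry: str = STATE_FILE) -> bool:
    """Add `entry` to .gitignore if missing. Returns True when a line was added."""
    gitignore = project_root / ".gitignore"
    text = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    # Simplification: exact line match only, no gitignore pattern matching.
    if entry in (line.strip() for line in text.splitlines()):
        return False
    sep = "" if text == "" or text.endswith("\n") else "\n"
    gitignore.write_text(text + sep + entry + "\n", encoding="utf-8")
    return True
```

A second call on the same project returns `False` and leaves the file untouched, so the step is safe to repeat on every run.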
## 5. Data flow
Reproduced from the brainstorming session output:
```
User trigger
  → Phase 1 DETECT (check_prereqs + detect_project + read state)
  → Phase 2 PLAN (Claude decides action list)
  → Phase 3 EXECUTE:
      one-time: install_tools → register_mcp
      per-project: init_codegraph → init_gitnexus → merge_claude_md
                   → build_obsidian_index → update_memory_index
                   → write state.json after each step
  → Phase 4 VERIFY (health_check)
  → Phase 5 REPORT (Claude writes bulleted summary)
```
## 6. Error handling
| Category | Examples | Behavior |
|---|---|---|
| Blocking | No node/python/git; node < 18 | Stop. Show install command. No disk writes. |
| Recoverable | npm network flake | Retry once. Then escalate to blocking. |
| Degraded | Obsidian vault not found; codegraph refuses a language | Skip layer. Write to `state.warnings`. Continue. |
| User-conflict | CLAUDE.md has `## Knowledge Graph` without our marker; CLAUDE.md has uncommitted changes | Stop that step. Prompt user. |
Atomic writes: tmp-file + rename for every file written. Never leave half-written artifacts.
Git-aware: refuse to write `CLAUDE.md` if user has uncommitted changes in it.
Errors/warnings always persisted to `.kg-setup-state.json`.
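The tmp-file + rename pattern mentioned above, as a minimal sketch. The helper name `atomic_write_text` is hypothetical. It relies on `os.replace`, which is atomic when the temporary file and the target are on the same filesystem, hence the temp file is created next to the target.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, content: str) -> None:
    """Write `content` to `path` via a temp file and an atomic rename."""
    path = Path(path)
    # Same directory as the target, so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        # Never leave a half-written artifact behind.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

Readers of `path` therefore see either the old content or the complete new content, never a partial write.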
User-facing error format:
```
✗ install_tools (codegraph): npm exited with code 1
stderr: ...
→ Fix: `brew install node@20` and re-run skill
✓ install_tools (gitnexus): already at v1.2.3, skipped
```
## 7. Testing
### 7.1 Unit tests (pytest)
- `test_merge_claude_md.py` — **highest priority**. Snapshot-based:
  - no file → creates from template
  - no section → appends
  - section with our marker → updates generated lines only, preserves `<!-- user-content -->` block
  - section without our marker → untouched, warning emitted
  - 10 consecutive runs → identical output (idempotency)
- `test_detect_project.py` — tmp_path scenarios: with/without `.git`, various package manifests for language detection, LOC counter on synthetic files
- `test_state.py` — read/write roundtrip, schema_version migration stubs
### 7.2 Integration test
`tests/integration_test.sh`:
1. Create temp dir, `git init`, add `app.py` with a trivial function
2. Invoke skill phases directly (not through Claude) via the scripts
3. Assert: `.codegraph/` exists, `CLAUDE.md` exists and contains the `kg-setup-v1` marker, and the state file has `last_run_status` set to `healthy`
4. Run again → no changes, exit 0
5. Cleanup
### 7.3 Manual smoke test (on arb-scanner itself)
Before/after comparison:
- Pre: ask Claude 3 architectural questions about arb-scanner, record token usage
- Run skill
- Post: ask same 3 questions, confirm Claude uses `mcp__codegraph__*` tools instead of reading files. Compare tokens.
### 7.4 Not tested automatically
- Actual `npm install` of `gitnexus` and `@colbymchenry/codegraph` (external network, unstable)
- Obsidian MCP writes against a real vault (tested once manually, then relied upon)
## 8. Install commands reference (verified from README 2026-04-22)
```bash
# CLI tools
npm install -g gitnexus
npm install -g @colbymchenry/codegraph
# MCP registration
claude mcp add gitnexus -- npx -y gitnexus@latest mcp
claude mcp add codegraph -- codegraph serve --mcp
# Per-project init
codegraph init -i
gitnexus analyze .
```
Requirements: Node.js 18+. No brew/pip install path documented in either README.
## 9. Decisions recorded here
| # | Decision | User's choice | Rationale |
|---|---|---|---|
| 1 | Implementation approach | Approach 2 (skill + bundled scripts) | Deterministic, fast, low-token vs Approach 1; not overkill like Approach 3 |
| 2 | Install both GitNexus and CodeGraph | Both | User explicitly wanted both; different angles on the same graph |
| 3 | Scope | local-only | Remote (ClowH1) deferred; user will SSH-run scripts manually if needed |
| 4 | Rerun behavior | Ask user (choice c) | Most predictable, avoids silent destructive actions |
| 5 | Skill name | `kg-setup` | Shorter than `setup-knowledge-graph`; matches user's existing short names |
| 6 | Canonical project name | git remote basename | Derived from `git remote get-url origin`, fallback to cwd basename |
| 7 | Obsidian folder convention | `{VaultRoot}/{repo_name}/` | Matches existing `arb-scanner/` and `betting-dashboard/` |
| 8 | CLAUDE.md merge style | α (append section, preserve above) | Don't touch user's hand-written 102-line CLAUDE.md |
| 9 | Obsidian mode default | minimal, rich on flag or LOC > 5000 | Avoid clutter on small projects; betting-dashboard is rich → justified |
| 10 | Languages tested | Python (primary), JS (secondary) | arb-scanner is Python; user occasionally uses JS |
## 10. Out of scope for v1
- Remote/SSH setup for ClowH1 prod (separate tool later)
- Automated cron-based reindexing
- Chat history → Obsidian export pipeline
- Graph visualization UI beyond GitNexus's default web view
- Publishing the skill to a marketplace
- Support for languages beyond what CodeGraph/GitNexus already support
- Windows/Linux (macOS only)
## 11. Post-setup housekeeping (not the skill's job, but flagged in the report)
- User should rename `/Users/pavelmalkin/Documents/Scaner` → `.../arb-scanner` after end of current Claude Code session (to align with git remote name, Obsidian folder, and kg-setup canonical)
- User can populate `arb-scanner/_index.md` with richer content over time
- Ownership of the `## Knowledge Graph` section in CLAUDE.md: generated block is replaceable, everything below `<!-- user-content -->` is the user's