Port the design spec and 17-task implementation plan from arb-scanner (where the idea was born) to this dedicated repo. Paths in the plan adjusted to treat this repo root as the skill root (no skills/kg-setup/ subdirectory). Implementation follows via subagent-driven-development. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kg-setup Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
Goal: Build a Claude Code skill kg-setup that bootstraps a 4-layer project memory system (CodeGraph + GitNexus + CLAUDE.md + Obsidian + auto-memory) in any local project on macOS.
Architecture: A markdown orchestrator (SKILL.md) delegates mechanical work to bundled bash + python scripts in scripts/. Each script has a single responsibility, emits JSON where state matters, is idempotent, and is tested in isolation. A top-level state file .kg-setup-state.json tracks per-project setup progress for safe reruns. Skill developed in its own repo at ~/Projects/kg-setup/, symlinked into ~/.claude/skills/ at the end so Claude Code picks it up.
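For orientation, a state file after a successful run might look like the sketch below (illustrative values; the exact layer keys and their fields are whatever the scripts in later tasks record):

```json
{
  "schema_version": 1,
  "skill_version": "0.1.0",
  "last_run": "2026-04-22T12:00:00+00:00",
  "last_run_status": "healthy",
  "layers": {
    "code_graph": {"configured": true, "tool": "codegraph"}
  },
  "warnings": [],
  "errors": []
}
```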
Tech Stack:
- Bash (macOS/zsh) for system-level scripts (install, MCP register, health check)
- Python 3.12 for logic-heavy scripts (detect_project, merge_claude_md, state, build_obsidian_index, update_memory_index)
- pytest for Python unit tests
- bats-core (or plain bash + `set -e` + `[[ ]]` asserts) for shell integration tests — default to plain bash to avoid extra deps
- External CLIs (not dependencies of the plan, but dependencies of the skill at runtime): `gitnexus` (npm), `@colbymchenry/codegraph` (npm), Node.js 18+, `claude` CLI
File Structure
Will be created at the repo root:
(repo root)/
├── SKILL.md
├── README.md
├── scripts/
│   ├── check_prereqs.sh
│   ├── install_tools.sh
│   ├── detect_project.py
│   ├── register_mcp.sh
│   ├── init_codegraph.sh
│   ├── init_gitnexus.sh
│   ├── merge_claude_md.py
│   ├── build_obsidian_index.py
│   ├── update_memory_index.py
│   ├── health_check.sh
│   └── state.py
├── templates/
│   ├── claude_md_section.md
│   ├── obsidian_index_minimal.md
│   └── obsidian_index_rich.md
└── tests/
    ├── conftest.py
    ├── test_state.py
    ├── test_detect_project.py
    ├── test_merge_claude_md.py
    ├── test_build_obsidian_index.py
    ├── test_update_memory_index.py
    ├── test_check_prereqs.sh
    └── integration_test.sh
Responsibility boundaries:
- `state.py` — single source of truth for `.kg-setup-state.json` I/O; used by every script that reads/writes state
- `detect_project.py` — read-only; pure inspection, emits JSON, no side effects
- `check_prereqs.sh` — read-only inspection of system env
- `install_tools.sh`, `register_mcp.sh`, `init_*.sh` — system-state mutators; must be idempotent
- `merge_claude_md.py`, `build_obsidian_index.py`, `update_memory_index.py` — file writers; must be atomic (tmp + rename)
- `health_check.sh` — read-only verification
- `SKILL.md` — orchestration logic only; no business rules embedded
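The tmp + rename rule for the file writers boils down to this pattern (a minimal sketch; `atomic_write_json` is an illustrative helper name, the real scripts inline the same two calls):

```python
import json
import os
from pathlib import Path

def atomic_write_json(path: Path, payload: dict) -> None:
    """Write JSON so readers never observe a partial file."""
    # Stage the full payload in a sibling .tmp file first.
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(payload, indent=2) + "\n")
    # os.replace renames atomically on POSIX, so a crash mid-write leaves
    # either the old file or the new one -- never a torn mix.
    os.replace(tmp, path)
```

Because the rename replaces the target in one step, a rerun that crashes mid-write can never leave a truncated state file or CLAUDE.md behind.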
Phase A — Foundation: state + detection (Tasks 1-4)
Pure-Python, no external dependencies, fully testable. Lands first so every downstream script has a solid I/O layer.
Task 1: Repo scaffolding
Files:
- Create: `scripts/` (directory)
- Create: `templates/` (directory)
- Create: `tests/` (directory)
- Create: `README.md`
- Create: `tests/conftest.py`
- Create: `.gitignore` (top-level — repo is fresh)

- [ ] **Step 1: Create directories and README**
mkdir -p scripts templates tests
Write README.md:
# kg-setup
Claude Code skill that bootstraps a 4-layer project memory system:
CodeGraph + GitNexus MCP servers, project CLAUDE.md section, Obsidian vault
folder with `_index.md`, and an auto-memory pointer.
## Install
Symlink into `~/.claude/skills/`:
```bash
ln -s "$(pwd)" ~/.claude/skills/kg-setup
```

## Activate

Say to Claude Code: "настрой граф знаний" or "setup knowledge graph".

## Design

See `docs/superpowers/specs/2026-04-22-kg-setup-design.md` in the repo root.
- [ ] **Step 2: Add pytest config to conftest**
Write `tests/conftest.py`:
```python
import sys
from pathlib import Path
SCRIPTS_DIR = Path(__file__).parent.parent / "scripts"
sys.path.insert(0, str(SCRIPTS_DIR))
```
- Step 3: Update .gitignore
Append to .gitignore:
# kg-setup dev
__pycache__/
**/__pycache__/
**/*.pyc
.pytest_cache/
- Step 4: Verify pytest runs (empty)
cd ~/Projects/kg-setup && python3 -m pytest tests/ -v
Expected: no tests ran in ... (exit code 5 is fine — no tests yet).
- Step 5: Commit
git add .gitignore README.md tests/conftest.py
git commit -m "kg-setup: scaffold skill directory structure"
Task 2: state.py — state file I/O
Files:
- Create: `scripts/state.py`
- Create: `tests/test_state.py`

- [ ] **Step 1: Write the failing tests**
Write tests/test_state.py:
import json
from pathlib import Path

from state import State, STATE_FILENAME, SCHEMA_VERSION

def test_state_defaults(tmp_path):
    s = State(tmp_path)
    assert s.schema_version == SCHEMA_VERSION
    assert s.status == "unknown"
    assert s.layers == {}
    assert s.warnings == []
    assert s.errors == []

def test_state_write_creates_file(tmp_path):
    s = State(tmp_path)
    s.status = "healthy"
    s.layers["code_graph"] = {"configured": True, "tool": "codegraph"}
    s.save()
    f = tmp_path / STATE_FILENAME
    assert f.exists()
    data = json.loads(f.read_text())
    assert data["schema_version"] == SCHEMA_VERSION
    assert data["last_run_status"] == "healthy"
    assert data["layers"]["code_graph"]["tool"] == "codegraph"

def test_state_load_roundtrip(tmp_path):
    s1 = State(tmp_path)
    s1.status = "degraded"
    s1.warnings.append({"phase": "obsidian", "message": "vault missing"})
    s1.save()
    s2 = State(tmp_path)
    s2.load()
    assert s2.status == "degraded"
    assert s2.warnings[0]["message"] == "vault missing"

def test_state_load_missing_file_noop(tmp_path):
    s = State(tmp_path)
    s.load()
    assert s.status == "unknown"

def test_state_atomic_write(tmp_path, monkeypatch):
    s = State(tmp_path)
    s.status = "healthy"
    s.save()
    tmp_file = tmp_path / (STATE_FILENAME + ".tmp")
    assert not tmp_file.exists(), "tmp file must be renamed, not left behind"
- Step 2: Run tests to verify they fail
cd ~/Projects/kg-setup && python3 -m pytest tests/test_state.py -v
Expected: FAIL with ModuleNotFoundError: No module named 'state'.
- [ ] **Step 3: Implement `state.py`**
Write scripts/state.py:
"""Read/write .kg-setup-state.json at project root."""
from __future__ import annotations
import json
import os
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Any
SCHEMA_VERSION = 1
SKILL_VERSION = "0.1.0"
STATE_FILENAME = ".kg-setup-state.json"
@dataclass
class State:
project_path: Path
schema_version: int = SCHEMA_VERSION
skill_version: str = SKILL_VERSION
last_run: str = ""
status: str = "unknown"
layers: dict[str, Any] = field(default_factory=dict)
warnings: list[dict] = field(default_factory=list)
errors: list[dict] = field(default_factory=list)
@property
def path(self) -> Path:
return self.project_path / STATE_FILENAME
def load(self) -> None:
if not self.path.exists():
return
data = json.loads(self.path.read_text())
self.schema_version = data.get("schema_version", SCHEMA_VERSION)
self.skill_version = data.get("skill_version", SKILL_VERSION)
self.last_run = data.get("last_run", "")
self.status = data.get("last_run_status", "unknown")
self.layers = data.get("layers", {})
self.warnings = data.get("warnings", [])
self.errors = data.get("errors", [])
def save(self) -> None:
self.last_run = datetime.now(timezone.utc).isoformat(timespec="seconds")
payload = {
"schema_version": self.schema_version,
"skill_version": self.skill_version,
"last_run": self.last_run,
"last_run_status": self.status,
"layers": self.layers,
"warnings": self.warnings,
"errors": self.errors,
}
tmp = self.path.with_suffix(self.path.suffix + ".tmp")
tmp.write_text(json.dumps(payload, indent=2, ensure_ascii=False) + "\n")
os.replace(tmp, self.path)
- Step 4: Run tests to verify PASS
cd ~/Projects/kg-setup && python3 -m pytest tests/test_state.py -v
Expected: 5 passed.
- Step 5: Commit
git add scripts/state.py tests/test_state.py
git commit -m "kg-setup: add state.py with tests"
Task 3: detect_project.py — read-only project inspection
Files:
- Create: `scripts/detect_project.py`
- Create: `tests/test_detect_project.py`

- [ ] **Step 1: Write the failing tests**
Write tests/test_detect_project.py:
import json
import subprocess
from pathlib import Path

from detect_project import (
    detect_git_remote_name,
    detect_primary_language,
    count_loc,
    build_report,
)

def _git_init(path: Path, remote_url: str | None = None) -> None:
    subprocess.run(["git", "init", "-q", "-b", "main"], cwd=path, check=True)
    if remote_url:
        subprocess.run(
            ["git", "remote", "add", "origin", remote_url], cwd=path, check=True
        )

def test_detect_git_remote_name_with_origin(tmp_path):
    _git_init(tmp_path, "https://example.com/team/my-repo.git")
    assert detect_git_remote_name(tmp_path) == "my-repo"

def test_detect_git_remote_name_no_dotgit(tmp_path):
    _git_init(tmp_path, "git@example.com:team/other")
    assert detect_git_remote_name(tmp_path) == "other"

def test_detect_git_remote_name_no_remote(tmp_path):
    _git_init(tmp_path)
    assert detect_git_remote_name(tmp_path) is None

def test_detect_git_remote_name_no_git(tmp_path):
    assert detect_git_remote_name(tmp_path) is None

def test_detect_primary_language_python(tmp_path):
    (tmp_path / "requirements.txt").write_text("flask\n")
    (tmp_path / "app.py").write_text("print(1)\n")
    assert detect_primary_language(tmp_path) == "python"

def test_detect_primary_language_js(tmp_path):
    (tmp_path / "package.json").write_text('{"name": "x"}\n')
    (tmp_path / "index.js").write_text("console.log(1)\n")
    assert detect_primary_language(tmp_path) == "javascript"

def test_detect_primary_language_unknown(tmp_path):
    (tmp_path / "README.md").write_text("# hi\n")
    assert detect_primary_language(tmp_path) == "unknown"

def test_count_loc(tmp_path):
    (tmp_path / "a.py").write_text("x = 1\n\n# comment\ny = 2\n")
    (tmp_path / "b.py").write_text("z = 3\n")
    n = count_loc(tmp_path, [".py"])
    # non-blank, non-comment-only lines: x=1, y=2, z=3 → 3
    assert n == 3

def test_build_report_full(tmp_path):
    _git_init(tmp_path, "git@github.com:foo/bar.git")
    (tmp_path / "main.py").write_text("print(1)\n")
    (tmp_path / "requirements.txt").write_text("")
    report = build_report(tmp_path)
    data = json.loads(report)
    assert data["path"] == str(tmp_path)
    assert data["git_remote_name"] == "bar"
    assert data["primary_lang"] == "python"
    assert data["loc"] >= 1
    assert data["has_claude_md"] is False
- Step 2: Run tests to verify they fail
cd ~/Projects/kg-setup && python3 -m pytest tests/test_detect_project.py -v
Expected: FAIL with ModuleNotFoundError: No module named 'detect_project'.
- [ ] **Step 3: Implement `detect_project.py`**
Write scripts/detect_project.py:
"""Read-only project inspection. Emits JSON to stdout when run as CLI."""
from __future__ import annotations
import json
import subprocess
import sys
from pathlib import Path
SOURCE_EXTENSIONS = {
"python": [".py"],
"javascript": [".js", ".jsx", ".mjs"],
"typescript": [".ts", ".tsx"],
"go": [".go"],
"rust": [".rs"],
"java": [".java"],
}
LANGUAGE_MARKERS = {
"python": ["requirements.txt", "pyproject.toml", "setup.py", "Pipfile"],
"javascript": ["package.json"],
"typescript": ["tsconfig.json"],
"go": ["go.mod"],
"rust": ["Cargo.toml"],
"java": ["pom.xml", "build.gradle"],
}
def detect_git_remote_name(path: Path) -> str | None:
try:
result = subprocess.run(
["git", "remote", "get-url", "origin"],
cwd=path,
capture_output=True,
text=True,
timeout=5,
)
if result.returncode != 0:
return None
url = result.stdout.strip()
if not url:
return None
name = url.rsplit("/", 1)[-1]
if name.endswith(".git"):
name = name[:-4]
return name or None
except (FileNotFoundError, subprocess.TimeoutExpired):
return None
def detect_primary_language(path: Path) -> str:
for lang, markers in LANGUAGE_MARKERS.items():
if any((path / m).exists() for m in markers):
return lang
return "unknown"
def count_loc(path: Path, extensions: list[str], cap: int = 100_000) -> int:
total = 0
for ext in extensions:
for f in path.rglob(f"*{ext}"):
if any(part.startswith(".") for part in f.relative_to(path).parts):
continue
try:
for line in f.read_text(errors="ignore").splitlines():
s = line.strip()
if s and not s.startswith("#") and not s.startswith("//"):
total += 1
if total >= cap:
return total
except OSError:
continue
return total
def build_report(path: Path) -> str:
lang = detect_primary_language(path)
exts = SOURCE_EXTENSIONS.get(lang, [])
report = {
"path": str(path),
"git_remote_name": detect_git_remote_name(path),
"primary_lang": lang,
"loc": count_loc(path, exts) if exts else 0,
"has_claude_md": (path / "CLAUDE.md").exists(),
"has_codegraph_dir": (path / ".codegraph").is_dir(),
"has_gitnexus_dir": (path / ".gitnexus").is_dir(),
"has_state_file": (path / ".kg-setup-state.json").exists(),
}
return json.dumps(report, indent=2, ensure_ascii=False)
if __name__ == "__main__":
target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.cwd()
print(build_report(target))
- Step 4: Run tests to verify PASS
cd ~/Projects/kg-setup && python3 -m pytest tests/test_detect_project.py -v
Expected: 9 passed.
- [ ] **Step 5: Smoke test CLI mode on this repo itself**
python3 scripts/detect_project.py .
Expected: JSON with git_remote_name: "kg-setup", and primary_lang, loc, and has_claude_md reflecting whatever this repo actually contains.
- Step 6: Commit
git add scripts/detect_project.py tests/test_detect_project.py
git commit -m "kg-setup: add detect_project.py with tests"
Task 4: check_prereqs.sh — env inspection
Files:
- Create: `scripts/check_prereqs.sh`
- Create: `tests/test_check_prereqs.sh`

- [ ] **Step 1: Write the shell test**
Write tests/test_check_prereqs.sh:
#!/usr/bin/env bash
set -euo pipefail
SCRIPT="$(dirname "$0")/../scripts/check_prereqs.sh"
output=$("$SCRIPT")
# Must be valid JSON
echo "$output" | python3 -c "import json, sys; json.loads(sys.stdin.read())" \
|| { echo "FAIL: output not valid JSON"; exit 1; }
# Must contain the required top-level keys
for key in schema_version env tools obsidian errors warnings; do
echo "$output" | python3 -c "
import json, sys
data = json.loads(sys.stdin.read())
assert '$key' in data, 'missing key: $key'
" || { echo "FAIL: missing key $key"; exit 1; }
done
# Must report node presence or blocking error
has_node=$(echo "$output" | python3 -c "
import json, sys
data = json.loads(sys.stdin.read())
print(bool(data['env'].get('node')))
")
if [[ "$has_node" != "True" && "$has_node" != "False" ]]; then
echo "FAIL: node field malformed"; exit 1
fi
echo "PASS: check_prereqs.sh"
Make executable:
chmod +x tests/test_check_prereqs.sh
- Step 2: Run test to verify failure
tests/test_check_prereqs.sh
Expected: error that script file does not exist.
- [ ] **Step 3: Implement `check_prereqs.sh`**
Write scripts/check_prereqs.sh:
#!/usr/bin/env bash
# check_prereqs.sh — inspect local env, emit JSON to stdout.
# Read-only. Never mutates state. Never writes files.
set -euo pipefail
json_str() {
# escape string for JSON
python3 -c "import json,sys; print(json.dumps(sys.stdin.read().strip()))"
}
get_version() {
local cmd="$1" flag="${2:---version}"
if command -v "$cmd" >/dev/null 2>&1; then
local out
out=$("$cmd" "$flag" 2>&1 | head -n1 || true)
json_str <<<"$out"
else
echo "null"
fi
}
node_major() {
if command -v node >/dev/null 2>&1; then
node -v 2>/dev/null | sed -E 's/^v([0-9]+).*/\1/'
else
echo "null"
fi
}
tool_entry() {
local name="$1" cli="$2"
local installed=false version="null" path="null"
if command -v "$cli" >/dev/null 2>&1; then
installed=true
local p v
p=$(command -v "$cli")
path=$(json_str <<<"$p")
v=$("$cli" --version 2>/dev/null | head -n1 || true)
if [[ -n "$v" ]]; then
version=$(json_str <<<"$v")
fi
fi
printf ' "%s": {"installed": %s, "version": %s, "path": %s}' \
"$name" "$installed" "$version" "$path"
}
mcp_registered() {
local server_name="$1"
if command -v claude >/dev/null 2>&1; then
if claude mcp list 2>/dev/null | grep -Eq "^${server_name}([[:space:]]|:)"; then
echo "true"
else
echo "false"
fi
else
echo "false"
fi
}
NODE_VERSION=$(get_version node -v)
PYTHON_VERSION=$(get_version python3 --version)
GIT_VERSION=$(get_version git --version)
NPM_VERSION=$(get_version npm --version)
NODE_MAJOR=$(node_major)
GITNEXUS_ENTRY=$(tool_entry "gitnexus_cli" "gitnexus")
CODEGRAPH_ENTRY=$(tool_entry "codegraph_cli" "codegraph")
GITNEXUS_MCP=$(mcp_registered "gitnexus")
CODEGRAPH_MCP=$(mcp_registered "codegraph")
ERRORS="[]"
WARNINGS="[]"
if [[ -z "$NODE_MAJOR" || "$NODE_MAJOR" == "null" ]]; then
ERRORS='[{"code":"no_node","message":"Node.js not found; install Node 18+ via `brew install node`"}]'
elif [[ "$NODE_MAJOR" -lt 18 ]]; then
ERRORS="[{\"code\":\"node_too_old\",\"message\":\"Node $NODE_MAJOR < 18; upgrade via \`brew upgrade node\`\"}]"
fi
if [[ -n "${KG_SETUP_VAULT_PATH:-}" ]]; then
VAULT_HINT="\"$KG_SETUP_VAULT_PATH\""
else
VAULT_HINT="null"
fi
cat <<JSON
{
"schema_version": 1,
"env": {
"node": ${NODE_VERSION},
"node_major": ${NODE_MAJOR:-null},
"python": ${PYTHON_VERSION},
"git": ${GIT_VERSION},
"npm": ${NPM_VERSION}
},
"tools": {
${GITNEXUS_ENTRY},
${CODEGRAPH_ENTRY},
"gitnexus_mcp_registered": ${GITNEXUS_MCP},
"codegraph_mcp_registered": ${CODEGRAPH_MCP}
},
"obsidian": {
"mcp_available": null,
"vault_path_hint": ${VAULT_HINT}
},
"errors": ${ERRORS},
"warnings": ${WARNINGS}
}
JSON
Make executable:
chmod +x scripts/check_prereqs.sh
- Step 4: Run test to verify PASS
tests/test_check_prereqs.sh
Expected: PASS: check_prereqs.sh.
- Step 5: Smoke-run on this system
scripts/check_prereqs.sh | python3 -m json.tool
Expected: valid JSON with actual versions of node/python/git and false for gitnexus/codegraph if you haven't installed them yet.
- Step 6: Commit
git add scripts/check_prereqs.sh tests/test_check_prereqs.sh
git commit -m "kg-setup: add check_prereqs.sh with smoke test"
Phase B — System plumbing: install + MCP register + health (Tasks 5-7)
These mutate system state. Tested via actual smoke runs + idempotency checks, not unit tests.
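The idempotency checks below all follow the same pattern: run the mutator twice and require the second run to be a pure no-op. Sketched generically (the `mutator` function here is a stand-in for install_tools.sh or register_mcp.sh, not part of the skill):

```shell
#!/usr/bin/env bash
set -euo pipefail

marker="$(mktemp -d)/installed"

mutator() {
  # Stand-in for a system-state mutator: does the work once, skips after.
  if [[ -e "$marker" ]]; then
    echo "[skip] already installed"
  else
    : > "$marker"
    echo "[install] done"
  fi
}

first="$(mutator)"
second="$(mutator)"
echo "first:  $first"
echo "second: $second"
# Idempotency contract: the second run must only report [skip] lines.
[[ "$second" == "[skip] already installed" ]] && echo "idempotent: PASS"
```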
Task 5: install_tools.sh — install GitNexus + CodeGraph
Files:
- Create: `scripts/install_tools.sh`

- [ ] **Step 1: Implement `install_tools.sh`**
Write scripts/install_tools.sh:
#!/usr/bin/env bash
# install_tools.sh — install GitNexus + CodeGraph globally via npm.
# Idempotent: skips if already present. Retries once on npm failure.
set -euo pipefail
GITNEXUS_PKG="gitnexus"
CODEGRAPH_PKG="@colbymchenry/codegraph"
require_npm() {
command -v npm >/dev/null 2>&1 || {
echo "ERROR: npm not found. Install Node 18+ first (brew install node)" >&2
exit 2
}
}
install_if_missing() {
local cli_name="$1" pkg="$2"
if command -v "$cli_name" >/dev/null 2>&1; then
echo "[skip] $cli_name already installed: $(command -v "$cli_name")"
return 0
fi
echo "[install] npm install -g $pkg"
if ! npm install -g "$pkg"; then
echo "[retry] npm install -g $pkg (attempt 2)"
npm install -g "$pkg"
fi
command -v "$cli_name" >/dev/null 2>&1 || {
echo "ERROR: $cli_name still not on PATH after install" >&2
return 1
}
echo "[done] $cli_name installed"
}
require_npm
install_if_missing gitnexus "$GITNEXUS_PKG"
install_if_missing codegraph "$CODEGRAPH_PKG"
Make executable:
chmod +x scripts/install_tools.sh
- Step 2: Dry-run verify script syntax
bash -n scripts/install_tools.sh && echo "syntax ok"
Expected: syntax ok.
- Step 3: Manual execution (user approval gate)
Before running: confirm with user that installing gitnexus and @colbymchenry/codegraph globally is OK. Then:
scripts/install_tools.sh
Expected: [install] or [skip] lines for each tool, final state has both on PATH.
Verify:
which gitnexus && which codegraph
- Step 4: Idempotency check
scripts/install_tools.sh
Expected: both report [skip].
- Step 5: Commit
git add scripts/install_tools.sh
git commit -m "kg-setup: add install_tools.sh for gitnexus + codegraph"
Task 6: register_mcp.sh — add MCP servers to Claude Code
Files:
- Create: `scripts/register_mcp.sh`

- [ ] **Step 1: Implement `register_mcp.sh`**
Write scripts/register_mcp.sh:
#!/usr/bin/env bash
# register_mcp.sh — register gitnexus + codegraph MCP servers with Claude Code.
# Idempotent: uses `claude mcp list` to detect existing registration.
set -euo pipefail
require_claude() {
command -v claude >/dev/null 2>&1 || {
echo "ERROR: claude CLI not found. Install Claude Code first." >&2
exit 2
}
}
is_registered() {
local name="$1"
claude mcp list 2>/dev/null | grep -Eq "^${name}([[:space:]]|:)"
}
register_if_missing() {
local name="$1"; shift
if is_registered "$name"; then
echo "[skip] mcp server '$name' already registered"
return 0
fi
echo "[register] claude mcp add $name -- $*"
claude mcp add "$name" -- "$@"
is_registered "$name" || {
echo "ERROR: registration of '$name' did not take effect" >&2
return 1
}
echo "[done] '$name' registered"
}
require_claude
register_if_missing gitnexus npx -y gitnexus@latest mcp
register_if_missing codegraph codegraph serve --mcp
Make executable:
chmod +x scripts/register_mcp.sh
- Step 2: Syntax check
bash -n scripts/register_mcp.sh && echo "syntax ok"
Expected: syntax ok.
- Step 3: Execute and verify (user approval gate)
scripts/register_mcp.sh
claude mcp list
Expected: list contains gitnexus and codegraph rows.
- Step 4: Idempotency check
scripts/register_mcp.sh
Expected: both [skip].
- Step 5: Commit
git add scripts/register_mcp.sh
git commit -m "kg-setup: add register_mcp.sh for gitnexus + codegraph servers"
Task 7: health_check.sh — post-setup verification
Files:
- Create: `scripts/health_check.sh`

- [ ] **Step 1: Implement `health_check.sh`**
Write scripts/health_check.sh:
#!/usr/bin/env bash
# health_check.sh — verify kg-setup artifacts in current project.
# Emits JSON summary to stdout. Exit code 0 = healthy, 1 = degraded.
set -uo pipefail
PROJECT="${1:-$PWD}"
cd "$PROJECT"
check_mcp_listed() {
local name="$1"
if command -v claude >/dev/null 2>&1 \
&& claude mcp list 2>/dev/null | grep -Eq "^${name}([[:space:]]|:)"; then
echo "true"
else
echo "false"
fi
}
check_codegraph_dir() {
[[ -d ".codegraph" ]] && echo "true" || echo "false"
}
check_gitnexus_dir() {
[[ -d ".gitnexus" ]] && echo "true" || echo "false"
}
check_claude_md_section() {
[[ -f "CLAUDE.md" ]] && grep -q "<!-- generated:kg-setup-v1 -->" CLAUDE.md \
&& echo "true" || echo "false"
}
check_state_file() {
[[ -f ".kg-setup-state.json" ]] && echo "true" || echo "false"
}
RESULTS=$(cat <<JSON
{
"project": "$PROJECT",
"checks": {
"codegraph_mcp_listed": $(check_mcp_listed codegraph),
"gitnexus_mcp_listed": $(check_mcp_listed gitnexus),
"codegraph_index": $(check_codegraph_dir),
"gitnexus_index": $(check_gitnexus_dir),
"claude_md_section": $(check_claude_md_section),
"state_file": $(check_state_file)
}
}
JSON
)
echo "$RESULTS"
# Exit code: 0 if all core (codegraph_mcp + codegraph_index + state_file) pass
core_pass=$(echo "$RESULTS" | python3 -c "
import json, sys
d = json.load(sys.stdin)
c = d['checks']
print(c['codegraph_mcp_listed'] and c['codegraph_index'] and c['state_file'])
")
if [[ "$core_pass" == "True" ]]; then
exit 0
else
exit 1
fi
Make executable:
chmod +x scripts/health_check.sh
- Step 2: Syntax check
bash -n scripts/health_check.sh && echo "syntax ok"
Expected: syntax ok.
- Step 3: Smoke run in a clean directory
(cd /tmp && mkdir -p kg-test && cd kg-test && /Users/pavelmalkin/Projects/kg-setup/scripts/health_check.sh) || echo "exit=1 (expected, no setup)"
Expected: JSON with all false, exit 1.
- Step 4: Commit
git add scripts/health_check.sh
git commit -m "kg-setup: add health_check.sh for post-setup verification"
Phase C — Per-project logic (Tasks 8-12)
Task 8: init_codegraph.sh and init_gitnexus.sh
Files:
- Create: `scripts/init_codegraph.sh`
- Create: `scripts/init_gitnexus.sh`

- [ ] **Step 1: Implement `init_codegraph.sh`**
Write scripts/init_codegraph.sh:
#!/usr/bin/env bash
# init_codegraph.sh — run `codegraph init -i` in current project.
# Respects KG_SETUP_REFRESH env var to force re-index.
set -euo pipefail
if ! command -v codegraph >/dev/null 2>&1; then
echo "ERROR: codegraph not on PATH. Run install_tools.sh first." >&2
exit 2
fi
REFRESH="${KG_SETUP_REFRESH:-0}"
if [[ -d ".codegraph" && "$REFRESH" != "1" ]]; then
echo "[skip] .codegraph/ already exists (set KG_SETUP_REFRESH=1 to force reindex)"
exit 0
fi
if [[ -d ".codegraph" && "$REFRESH" == "1" ]]; then
echo "[refresh] removing .codegraph/ before reindex"
rm -rf .codegraph
fi
echo "[init] codegraph init -i"
codegraph init -i
[[ -d ".codegraph" ]] || {
echo "ERROR: codegraph init did not create .codegraph/" >&2
exit 1
}
echo "[done] .codegraph/ initialized"
- [ ] **Step 2: Implement `init_gitnexus.sh`**
Write scripts/init_gitnexus.sh:
#!/usr/bin/env bash
# init_gitnexus.sh — run `gitnexus analyze .` in current project.
set -euo pipefail
if ! command -v gitnexus >/dev/null 2>&1; then
echo "ERROR: gitnexus not on PATH. Run install_tools.sh first." >&2
exit 2
fi
REFRESH="${KG_SETUP_REFRESH:-0}"
if [[ -d ".gitnexus" && "$REFRESH" != "1" ]]; then
echo "[skip] .gitnexus/ already exists (set KG_SETUP_REFRESH=1 to force reindex)"
exit 0
fi
if [[ "$REFRESH" == "1" ]]; then
FORCE_FLAG="--force"
else
FORCE_FLAG=""
fi
echo "[init] gitnexus analyze . $FORCE_FLAG"
gitnexus analyze . $FORCE_FLAG
[[ -d ".gitnexus" ]] || {
echo "ERROR: gitnexus analyze did not create .gitnexus/" >&2
exit 1
}
echo "[done] .gitnexus/ initialized"
Make both executable:
chmod +x scripts/init_codegraph.sh
chmod +x scripts/init_gitnexus.sh
- Step 3: Syntax checks
bash -n scripts/init_codegraph.sh
bash -n scripts/init_gitnexus.sh
echo "syntax ok"
- Step 4: Smoke test — skip path
(cd /tmp && mkdir -p kg-test-init && cd kg-test-init && mkdir .codegraph && \
/Users/pavelmalkin/Projects/kg-setup/scripts/init_codegraph.sh)
Expected: [skip] .codegraph/ already exists.
- Step 5: Commit
git add scripts/init_codegraph.sh scripts/init_gitnexus.sh
git commit -m "kg-setup: add init_codegraph.sh + init_gitnexus.sh"
Task 9: merge_claude_md.py — critical, fully TDD
Files:
- Create: `templates/claude_md_section.md`
- Create: `scripts/merge_claude_md.py`
- Create: `tests/test_merge_claude_md.py`

- [ ] **Step 1: Write the template**
Write templates/claude_md_section.md:
## Knowledge Graph
<!-- generated:kg-setup-v1 -->
Local graph indices:
- CodeGraph: `.codegraph/codegraph.db` (query via MCP server `codegraph`)
- GitNexus: `.gitnexus/` (query via MCP server `gitnexus`)
Obsidian notes: `{vault_root}/{repo_name}/_index.md`
auto-memory: `~/.claude/projects/{slug}/memory/MEMORY.md`
Refresh indices after major code changes:
- `KG_SETUP_REFRESH=1 bash ~/.claude/scripts/init_codegraph.sh`
- `KG_SETUP_REFRESH=1 bash ~/.claude/scripts/init_gitnexus.sh`
<!-- /generated -->
<!-- user-content -->
<!-- Anything below is preserved between kg-setup runs. -->
- Step 2: Write failing tests
Write tests/test_merge_claude_md.py:
from pathlib import Path

from merge_claude_md import merge, MARKER_START, MARKER_END, USER_CONTENT_MARKER

TEMPLATE_VARS = {
    "vault_root": "/Users/u/Obsidian/Vault",
    "repo_name": "myrepo",
    "slug": "-Users-u-myrepo",
}

def test_merge_creates_file_when_missing(tmp_path):
    f = tmp_path / "CLAUDE.md"
    result = merge(f, TEMPLATE_VARS)
    assert result.action == "created"
    text = f.read_text()
    assert "## Knowledge Graph" in text
    assert MARKER_START in text
    assert "/Users/u/Obsidian/Vault/myrepo/_index.md" in text

def test_merge_appends_to_existing_no_section(tmp_path):
    f = tmp_path / "CLAUDE.md"
    f.write_text("# My Project\n\nSome content.\n")
    result = merge(f, TEMPLATE_VARS)
    assert result.action == "appended"
    text = f.read_text()
    assert text.startswith("# My Project")
    assert "## Knowledge Graph" in text
    assert MARKER_START in text

def test_merge_updates_generated_block_preserves_user_content(tmp_path):
    f = tmp_path / "CLAUDE.md"
    initial = f"""# Proj
## Knowledge Graph
{MARKER_START}
old stuff
{MARKER_END}
{USER_CONTENT_MARKER}
My own notes here.
Keep me.
"""
    f.write_text(initial)
    result = merge(f, TEMPLATE_VARS)
    assert result.action == "updated"
    text = f.read_text()
    assert "old stuff" not in text
    assert "My own notes here." in text
    assert "Keep me." in text
    assert "/Users/u/Obsidian/Vault/myrepo/_index.md" in text

def test_merge_skips_section_without_our_marker(tmp_path):
    f = tmp_path / "CLAUDE.md"
    initial = "# Proj\n\n## Knowledge Graph\n\nCustom user section.\n"
    f.write_text(initial)
    result = merge(f, TEMPLATE_VARS)
    assert result.action == "skipped_foreign_section"
    assert result.warning
    assert f.read_text() == initial  # unchanged

def test_merge_idempotent(tmp_path):
    f = tmp_path / "CLAUDE.md"
    merge(f, TEMPLATE_VARS)
    first = f.read_text()
    for _ in range(10):
        merge(f, TEMPLATE_VARS)
    assert f.read_text() == first

def test_merge_atomic_write_no_tmp_leftover(tmp_path):
    f = tmp_path / "CLAUDE.md"
    merge(f, TEMPLATE_VARS)
    assert not (tmp_path / "CLAUDE.md.tmp").exists()
- Step 3: Run tests — verify fail
cd ~/Projects/kg-setup && python3 -m pytest tests/test_merge_claude_md.py -v
Expected: FAIL ModuleNotFoundError.
- [ ] **Step 4: Implement `merge_claude_md.py`**
Write scripts/merge_claude_md.py:
"""Idempotently merge the kg-setup 'Knowledge Graph' section into CLAUDE.md."""
from __future__ import annotations
import os
import re
import sys
from dataclasses import dataclass
from pathlib import Path
MARKER_START = "<!-- generated:kg-setup-v1 -->"
MARKER_END = "<!-- /generated -->"
USER_CONTENT_MARKER = "<!-- user-content -->"
SECTION_HEADING = "## Knowledge Graph"
TEMPLATE_PATH = Path(__file__).parent.parent / "templates" / "claude_md_section.md"
@dataclass
class MergeResult:
action: str # created | appended | updated | skipped_foreign_section
warning: str | None = None
def _render_template(vars: dict) -> str:
tpl = TEMPLATE_PATH.read_text()
return tpl.format(**vars)
def _atomic_write(path: Path, content: str) -> None:
tmp = path.with_suffix(path.suffix + ".tmp")
tmp.write_text(content)
os.replace(tmp, path)
def _has_our_section(text: str) -> bool:
return MARKER_START in text and MARKER_END in text
def _has_foreign_section(text: str) -> bool:
return SECTION_HEADING in text and MARKER_START not in text
def _extract_user_content(text: str) -> str:
idx = text.find(USER_CONTENT_MARKER)
if idx == -1:
return ""
return text[idx + len(USER_CONTENT_MARKER):]
def merge(claude_md: Path, template_vars: dict) -> MergeResult:
rendered = _render_template(template_vars)
if not claude_md.exists():
_atomic_write(claude_md, rendered)
return MergeResult(action="created")
existing = claude_md.read_text()
if _has_our_section(existing):
user_tail = _extract_user_content(existing)
# Rebuild: everything before our section_heading + fresh rendered +
# optionally carry forward the user-content tail (without its marker; template has its own)
before_heading = existing.split(SECTION_HEADING, 1)[0].rstrip() + "\n\n"
# rendered already ends with the user-content marker; we append the extracted tail after it
new_text = before_heading + rendered.rstrip() + user_tail
# Normalize to end with one newline
if not new_text.endswith("\n"):
new_text += "\n"
_atomic_write(claude_md, new_text)
return MergeResult(action="updated")
if _has_foreign_section(existing):
return MergeResult(
action="skipped_foreign_section",
warning=(
"CLAUDE.md already has a '## Knowledge Graph' section without "
"the kg-setup marker. Not overwriting. Remove manually or add "
f"the marker '{MARKER_START}' to allow kg-setup to manage it."
),
)
# Plain append
sep = "" if existing.endswith("\n") else "\n"
new_text = existing + sep + "\n" + rendered
_atomic_write(claude_md, new_text)
return MergeResult(action="appended")
if __name__ == "__main__":
if len(sys.argv) < 4:
print("usage: merge_claude_md.py <claude_md_path> <vault_root> <repo_name> <slug>",
file=sys.stderr)
sys.exit(2)
result = merge(
Path(sys.argv[1]),
{"vault_root": sys.argv[2], "repo_name": sys.argv[3], "slug": sys.argv[4]},
)
print(f"action={result.action}")
if result.warning:
print(f"warning={result.warning}", file=sys.stderr)
- Step 5: Run tests — verify pass
cd ~/Projects/kg-setup && python3 -m pytest tests/test_merge_claude_md.py -v
Expected: 6 passed.
- Step 6: Commit
git add templates/claude_md_section.md \
scripts/merge_claude_md.py \
tests/test_merge_claude_md.py
git commit -m "kg-setup: add merge_claude_md.py (idempotent section merge) with tests"
Task 10: build_obsidian_index.py — generate _index.md content
Files:
- Create: templates/obsidian_index_minimal.md
- Create: templates/obsidian_index_rich.md
- Create: scripts/build_obsidian_index.py
- Create: tests/test_build_obsidian_index.py
- Step 1: Write templates
Write templates/obsidian_index_minimal.md:
---
tags: [project, kg-index]
project: {repo_name}
setup_date: {today}
---
# {repo_name}
**Repo:** `{project_path}`
**Languages:** {primary_lang}
**LOC:** ~{loc}
**Setup date:** {today}
## Quick links
- Code graph local index: `{project_path}/.codegraph/`
- CLAUDE.md: `{project_path}/CLAUDE.md`
- State: `{project_path}/.kg-setup-state.json`
## What lives here
Add notes on decisions, research, session logs below or in nested folders.
Write templates/obsidian_index_rich.md:
---
tags: [project, kg-index, rich]
project: {repo_name}
setup_date: {today}
---
# {repo_name}
**Repo:** `{project_path}`
**Languages:** {primary_lang}
**LOC:** ~{loc}
**Setup date:** {today}
## Quick links
- Code graph local index: `{project_path}/.codegraph/`
- CLAUDE.md: `{project_path}/CLAUDE.md`
- State: `{project_path}/.kg-setup-state.json`
## Sections
- [[{repo_name}/sessions/_index|Sessions]] — chronological work logs
- [[{repo_name}/knowledge/decisions/_index|Decisions]] — recorded design calls
- [[{repo_name}/knowledge/patterns/_index|Patterns]] — reusable patterns and idioms
## What lives here
Top-level: high-signal notes you want always visible. Nested folders: dated
session logs, decisions, and patterns. Use backlinks liberally.
- Step 2: Write failing tests
Write tests/test_build_obsidian_index.py:
from build_obsidian_index import render, choose_mode

VARS = {
    "repo_name": "foo",
    "project_path": "/tmp/foo",
    "primary_lang": "python",
    "loc": 1234,
    "today": "2026-04-22",
}

def test_choose_mode_minimal_small_project():
    assert choose_mode(loc=100, has_tests=False, has_docs=False, explicit=None) == "minimal"

def test_choose_mode_rich_large_project():
    assert choose_mode(loc=6000, has_tests=False, has_docs=False, explicit=None) == "rich"

def test_choose_mode_rich_by_structure():
    assert choose_mode(loc=100, has_tests=True, has_docs=True, explicit=None) == "rich"

def test_choose_mode_explicit_overrides():
    assert choose_mode(loc=100000, has_tests=True, has_docs=True, explicit="minimal") == "minimal"
    assert choose_mode(loc=1, has_tests=False, has_docs=False, explicit="rich") == "rich"

def test_render_minimal_contains_required_fields():
    text = render("minimal", VARS)
    assert "# foo" in text
    assert "/tmp/foo" in text
    assert "python" in text
    assert "1234" in text
    assert "2026-04-22" in text

def test_render_rich_contains_section_links():
    text = render("rich", VARS)
    assert "Sessions" in text
    assert "Decisions" in text
    assert "Patterns" in text

def test_render_unknown_mode_raises():
    try:
        render("weird", VARS)
    except ValueError:
        return
    raise AssertionError("expected ValueError")
- Step 3: Run tests — verify fail
cd ~/Projects/kg-setup && python3 -m pytest tests/test_build_obsidian_index.py -v
Expected: FAIL ModuleNotFoundError.
- Step 4: Implement build_obsidian_index.py
Write scripts/build_obsidian_index.py:
"""Generate Obsidian _index.md content for a project.
Does not write to vault directly — returns rendered text so the caller
(Claude, via Obsidian MCP write_note) performs the actual write.
"""
from __future__ import annotations
import json
import sys
from datetime import date
from pathlib import Path
TEMPLATES = Path(__file__).parent.parent / "templates"
LOC_RICH_THRESHOLD = 5000
def choose_mode(
loc: int, has_tests: bool, has_docs: bool, explicit: str | None
) -> str:
if explicit in ("minimal", "rich"):
return explicit
if loc > LOC_RICH_THRESHOLD:
return "rich"
if has_tests and has_docs:
return "rich"
return "minimal"
def render(mode: str, vars: dict) -> str:
if mode not in ("minimal", "rich"):
raise ValueError(f"unknown mode: {mode}")
tpl = (TEMPLATES / f"obsidian_index_{mode}.md").read_text()
v = dict(vars)
v.setdefault("today", date.today().isoformat())
return tpl.format(**v)
if __name__ == "__main__":
# Input: JSON on stdin with keys matching template vars + optional "mode"
# plus "has_tests" and "has_docs" for mode selection.
inp = json.loads(sys.stdin.read())
mode = choose_mode(
loc=inp.get("loc", 0),
has_tests=bool(inp.get("has_tests", False)),
has_docs=bool(inp.get("has_docs", False)),
explicit=inp.get("mode"),
)
print(render(mode, inp))
- Step 5: Run tests — verify pass
cd ~/Projects/kg-setup && python3 -m pytest tests/test_build_obsidian_index.py -v
Expected: 7 passed.
- Step 6: Commit
git add templates/obsidian_index_*.md \
scripts/build_obsidian_index.py \
tests/test_build_obsidian_index.py
git commit -m "kg-setup: add build_obsidian_index.py + templates with tests"
Task 11: update_memory_index.py — auto-memory pointer
Files:
- Create: scripts/update_memory_index.py
- Create: tests/test_update_memory_index.py
- Step 1: Write failing tests
Write tests/test_update_memory_index.py:
from pathlib import Path
from update_memory_index import (
    path_to_slug,
    memory_dir_for,
    append_pointer,
    write_memory_file,
)

def test_path_to_slug():
    assert path_to_slug("/Users/u/Documents/foo") == "-Users-u-Documents-foo"
    assert path_to_slug("/a/b") == "-a-b"

def test_memory_dir_for(tmp_path):
    # simulate ~/.claude/projects/ layout
    claude_home = tmp_path / ".claude"
    project_path = "/Users/u/x"
    expected = claude_home / "projects" / "-Users-u-x" / "memory"
    assert memory_dir_for(project_path, claude_home=claude_home) == expected

def test_append_pointer_creates_memory_md(tmp_path):
    mem_dir = tmp_path / "memory"
    mem_dir.mkdir()
    (mem_dir / "MEMORY.md").write_text("- [existing](file.md) — hook\n")
    append_pointer(
        memory_dir=mem_dir,
        repo_name="myrepo",
        date_str="2026-04-22",
    )
    text = (mem_dir / "MEMORY.md").read_text()
    assert "myrepo" in text
    assert "2026-04-22" in text
    assert "- [existing]" in text  # previous line preserved

def test_append_pointer_idempotent(tmp_path):
    mem_dir = tmp_path / "memory"
    mem_dir.mkdir()
    (mem_dir / "MEMORY.md").write_text("")
    for _ in range(3):
        append_pointer(memory_dir=mem_dir, repo_name="r", date_str="2026-04-22")
    text = (mem_dir / "MEMORY.md").read_text()
    assert text.count("[Knowledge graph bootstrap for r]") == 1

def test_append_pointer_skips_when_no_memory_dir(tmp_path):
    # memory dir does not exist → skipped, no error
    append_pointer(memory_dir=tmp_path / "nope", repo_name="r", date_str="2026-04-22")
    assert not (tmp_path / "nope").exists()

def test_write_memory_file(tmp_path):
    mem_dir = tmp_path / "memory"
    mem_dir.mkdir()
    write_memory_file(
        memory_dir=mem_dir,
        repo_name="zz",
        project_path="/tmp/zz",
        date_str="2026-04-22",
    )
    f = mem_dir / "project_kg_zz.md"
    assert f.exists()
    text = f.read_text()
    assert "name: Knowledge graph for zz" in text
    assert "type: project" in text
    assert "/tmp/zz" in text
- Step 2: Run tests — verify fail
cd ~/Projects/kg-setup && python3 -m pytest tests/test_update_memory_index.py -v
Expected: FAIL ModuleNotFoundError.
- Step 3: Implement update_memory_index.py
Write scripts/update_memory_index.py:
"""Append kg-setup pointer to user's auto-memory index."""
from __future__ import annotations
import os
import sys
from pathlib import Path
def path_to_slug(project_path: str) -> str:
return project_path.replace("/", "-")
def memory_dir_for(
project_path: str, claude_home: Path | None = None
) -> Path:
home = claude_home or Path.home() / ".claude"
return home / "projects" / path_to_slug(project_path) / "memory"
def append_pointer(memory_dir: Path, repo_name: str, date_str: str) -> str:
"""Append a pointer line to MEMORY.md. Idempotent. Returns action taken."""
if not memory_dir.exists():
return "skipped_no_memory_dir"
memory_md = memory_dir / "MEMORY.md"
line = (
f"- [Knowledge graph bootstrap for {repo_name}]"
f"(project_kg_{repo_name}.md) — set up {date_str}, "
f"graph in .codegraph, vault: {repo_name}/"
)
existing = memory_md.read_text() if memory_md.exists() else ""
if f"[Knowledge graph bootstrap for {repo_name}]" in existing:
return "already_present"
sep = "" if (not existing or existing.endswith("\n")) else "\n"
new_text = existing + sep + line + "\n"
_atomic_write(memory_md, new_text)
return "appended"
def write_memory_file(
memory_dir: Path, repo_name: str, project_path: str, date_str: str
) -> str:
if not memory_dir.exists():
return "skipped_no_memory_dir"
target = memory_dir / f"project_kg_{repo_name}.md"
content = f"""---
name: Knowledge graph for {repo_name}
description: Pointer to graph indices, Obsidian vault folder, and CLAUDE.md section for {repo_name}
type: project
---
Project `{repo_name}` ({project_path}) has kg-setup applied on {date_str}.
- CodeGraph MCP: server name `codegraph`, query with `mcp__codegraph__*` tools
- GitNexus MCP: server name `gitnexus`
- Obsidian vault folder: `{repo_name}/`
- State file: `{project_path}/.kg-setup-state.json`
- Reindex: set `KG_SETUP_REFRESH=1` and re-run init scripts
"""
_atomic_write(target, content)
return "written"
def _atomic_write(path: Path, content: str) -> None:
tmp = path.with_suffix(path.suffix + ".tmp")
tmp.write_text(content)
os.replace(tmp, path)
if __name__ == "__main__":
if len(sys.argv) < 4:
print("usage: update_memory_index.py <project_path> <repo_name> <date_str>",
file=sys.stderr)
sys.exit(2)
project_path, repo_name, date_str = sys.argv[1], sys.argv[2], sys.argv[3]
mem = memory_dir_for(project_path)
print("pointer:", append_pointer(mem, repo_name, date_str))
print("file:", write_memory_file(mem, repo_name, project_path, date_str))
- Step 4: Run tests — verify pass
cd ~/Projects/kg-setup && python3 -m pytest tests/test_update_memory_index.py -v
Expected: 6 passed.
- Step 5: Commit
git add scripts/update_memory_index.py \
tests/test_update_memory_index.py
git commit -m "kg-setup: add update_memory_index.py with tests"
Task 12: Run full test suite
Files:
- None to modify; verification only.
- Step 1: Run all Python tests
cd ~/Projects/kg-setup && python3 -m pytest tests/ -v
Expected: all tests from Tasks 2, 3, 9, 10, 11 pass (28 total).
- Step 2: Run all shell tests
tests/test_check_prereqs.sh
Expected: PASS: check_prereqs.sh.
- Step 3: Commit nothing (verification-only task, no code changes)
If coverage is green, move on. If any test fails, fix it before proceeding to Phase D.
Phase D — Orchestrator + integration (Tasks 13-14)
Task 13: Write SKILL.md orchestrator
Files:
- Create: SKILL.md
- Step 1: Write SKILL.md
Write SKILL.md:
---
name: kg-setup
description: Bootstrap a 4-layer project memory system (CodeGraph + GitNexus MCP servers, project CLAUDE.md section, Obsidian vault folder with _index, auto-memory pointer). Use when user asks to "настрой граф знаний", "подключи базу знаний", "запомни проект", "setup knowledge graph", "bootstrap project memory", "initialize code graph", or "пусть ты помнишь детали проекта".
---
# kg-setup
You bootstrap a 4-layer project memory system in the user's current project directory: a local code knowledge graph (CodeGraph + GitNexus), a `## Knowledge Graph` section in the project's `CLAUDE.md`, a folder in the user's Obsidian vault with a seed `_index.md`, and a pointer entry in the user's auto-memory.
All mechanical work is delegated to scripts in `~/.claude/skills/kg-setup/scripts/` (the skill directory is symlinked into `~/.claude/skills/` — see Task 15 of the plan). Do not re-implement any logic — invoke the scripts.
## Phases
Run these phases in order. Each phase is a single logical unit. Stop and report to the user if a phase produces a blocking error.
### Phase 1 — DETECT
Execute in parallel and collect JSON outputs:
- `bash ~/.claude/skills/kg-setup/scripts/check_prereqs.sh` → `prereqs` JSON
- `python3 ~/.claude/skills/kg-setup/scripts/detect_project.py "$PWD"` → `project` JSON
Then read (best-effort):
- `./CLAUDE.md` (if exists) — first 40 lines only
- `./.kg-setup-state.json` (if exists) — full contents
Form an in-memory `state` dict with these four sections.
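A minimal sketch of that dict's shape (the four top-level keys follow Phase 1; the nested fields are illustrative examples, not a fixed schema):

```python
# Illustrative shape only; nested fields are hypothetical examples.
state = {
    "prereqs": {"errors": [], "warnings": ["obsidian vault not found"]},
    "project": {"repo_name": "foo", "primary_lang": "python", "loc": 1234},
    "claude_md_head": None,   # first 40 lines of ./CLAUDE.md, or None if absent
    "kg_setup_state": None,   # parsed .kg-setup-state.json, or None if absent
}

proceed = not state["prereqs"]["errors"]  # Phase 2 stops on any error
print(proceed)  # → True
```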
### Phase 2 — PLAN
Inspect `state` and construct the action list:
1. If `prereqs.errors` is non-empty (e.g. no Node, Node < 18) → STOP. Report errors to user, do not proceed.
2. If `.kg-setup-state.json` exists and reports `last_run_status=healthy`:
- Ask the user: "Проект уже настроен (последний запуск: {last_run}). Что делать? (r) только health-check / (f) full refresh / (s) только Obsidian layer / (q) отмена". Default to (r) if no response.
- If (q) → exit gracefully
- If (r) → skip to Phase 4
- If (f) → set env `KG_SETUP_REFRESH=1` for this run; proceed through all steps
- If (s) → only run Obsidian + memory steps
3. Otherwise, the full per-project pipeline.
### Phase 3 — EXECUTE
Run in order. After each step, update the state dict; save state after the pipeline finishes (even on partial failure — mark `status=incomplete`).
#### 3a. One-time (only if missing per prereqs):
```
bash ~/.claude/skills/kg-setup/scripts/install_tools.sh
bash ~/.claude/skills/kg-setup/scripts/register_mcp.sh
```
Ask for user confirmation before running `install_tools.sh` on first run. Phrase: "Я установлю глобально два npm-пакета: `gitnexus` и `@colbymchenry/codegraph`. OK?"
#### 3b. Per-project:
```
KG_SETUP_REFRESH=${refresh:-0} bash ~/.claude/skills/kg-setup/scripts/init_codegraph.sh
KG_SETUP_REFRESH=${refresh:-0} bash ~/.claude/skills/kg-setup/scripts/init_gitnexus.sh
```
If CLAUDE.md has uncommitted changes (check via `git status --porcelain CLAUDE.md` — non-empty = dirty) → ASK user to commit/stash before proceeding. Do not modify a dirty CLAUDE.md.
```
python3 ~/.claude/skills/kg-setup/scripts/merge_claude_md.py \
./CLAUDE.md \
"$VAULT_ROOT" \
"$REPO_NAME" \
"$SLUG"
```
#### 3c. Obsidian layer (via Obsidian MCP):
Build `_index.md` content:
```
echo "$OBSIDIAN_INPUT_JSON" | python3 ~/.claude/scripts/build_obsidian_index.py
```
Where `OBSIDIAN_INPUT_JSON` contains: `repo_name`, `project_path`, `primary_lang`, `loc`, `has_tests` (bool: does `./tests/` exist), `has_docs` (bool: does `./docs/` exist), and optionally `mode` ("minimal" or "rich" if user asked explicitly).
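Example payload assembly (placeholder values; in a real run they come from the Phase 1 `project` JSON):

```python
import json
from pathlib import Path

project = Path("/tmp/foo")  # placeholder; real path comes from Phase 1 detection
payload = {
    "repo_name": "foo",
    "project_path": str(project),
    "primary_lang": "python",
    "loc": 1234,
    "has_tests": (project / "tests").is_dir(),
    "has_docs": (project / "docs").is_dir(),
    # "mode": "rich",  # only set when the user asked explicitly
}
OBSIDIAN_INPUT_JSON = json.dumps(payload)
```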
Then:
- Use Obsidian MCP `list_directory` to check if `{repo_name}/` exists in vault. If not, MCP's `write_note` to `{repo_name}/_index.md` will create it.
- Use `write_note` to place the rendered content at `{repo_name}/_index.md`. If the note already exists, do NOT overwrite unless user explicitly said "refresh obsidian" — instead, warn and skip.
#### 3d. Auto-memory layer:
```
python3 ~/.claude/skills/kg-setup/scripts/update_memory_index.py \
"$PROJECT_PATH" \
"$REPO_NAME" \
"$(date -u +%Y-%m-%d)"
```
#### 3e. Write state + gitignore:
- Save state via `state.py` (use python inline: `python3 -c "from pathlib import Path; from state import State; s = State(Path('.')); s.load(); s.status='healthy'; s.layers={...}; s.save()"`). Prefer writing a small helper call rather than inline if the string grows.
- Ensure `.kg-setup-state.json` is in `.gitignore` (append if missing).
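Append-if-missing sketch (`ensure_gitignored` is a hypothetical helper name; inline logic is equally fine):

```python
from pathlib import Path

def ensure_gitignored(repo_root: Path, entry: str = ".kg-setup-state.json") -> str:
    """Hypothetical helper: append entry to .gitignore unless already listed."""
    gi = repo_root / ".gitignore"
    existing = gi.read_text() if gi.exists() else ""
    if entry in existing.splitlines():
        return "already_present"
    sep = "" if (not existing or existing.endswith("\n")) else "\n"
    gi.write_text(existing + sep + entry + "\n")
    return "appended"
```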
### Phase 4 — VERIFY
```
bash ~/.claude/skills/kg-setup/scripts/health_check.sh "$PWD"
```
If exit code is non-zero, mark `state.status=degraded` and include which checks failed in the report.
### Phase 5 — REPORT
Print a user-facing bulleted summary:
```
kg-setup complete for <repo_name>
✓ GitNexus + CodeGraph installed
✓ MCP servers registered (gitnexus, codegraph)
✓ .codegraph/ initialized (python, 4200 LOC)
✓ .gitnexus/ initialized
✓ CLAUDE.md — added "## Knowledge Graph" section
✓ Obsidian: {repo_name}/_index.md created (minimal mode)
✓ auto-memory: pointer added to MEMORY.md
✓ Health check: 6/6 passed
Next: ask me any architectural question about the project — I'll query
the graph via mcp__codegraph__* instead of reading files.
```
For any step that was skipped or failed, use the emoji `⊘` (skipped) or `✗` (failed) with the reason.
## Idempotency contract
- Re-running on a healthy project with default answer "r" must produce zero file writes.
- Re-running with "f" must re-run init scripts (fresh indices) and refresh generated blocks, preserving user-content tails.
- Scripts in `scripts/` must each be individually safe to re-run.
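One way tests could enforce the zero-write guarantee is a content digest over the project tree before and after a rerun (a sketch; not part of the plan's test suite):

```python
import hashlib
from pathlib import Path

def tree_digest(root: Path) -> str:
    """Digest of every file's path + contents under root; an unchanged digest
    across a rerun means no file was modified, added, or removed."""
    h = hashlib.sha256()
    for p in sorted(root.rglob("*")):
        if p.is_file():
            h.update(str(p.relative_to(root)).encode())
            h.update(p.read_bytes())
    return h.hexdigest()
```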
## Error handling
- **Blocking** (no Node, Node < 18, no git) → stop in Phase 2, print command the user should run (`brew install node`, etc.), do not touch disk.
- **Recoverable** (npm flake) → scripts retry once internally; if still failing → surface to user with the stderr and next command.
- **Degraded** (no Obsidian vault, unsupported language for codegraph) → skip that step, continue, mark in state.warnings.
- **User conflict** (CLAUDE.md dirty, foreign `## Knowledge Graph` section, existing Obsidian `_index.md`) → stop the specific step, ask user, proceed with their choice.
- Step 2: Commit
git add SKILL.md
git commit -m "kg-setup: add SKILL.md orchestrator"
Task 14: Integration smoke test
Files:
- Create: tests/integration_test.sh
- Step 1: Write integration test
Write tests/integration_test.sh:
#!/usr/bin/env bash
# integration_test.sh — end-to-end smoke test of per-project scripts
# on a synthetic temp project. Does NOT test install/register/Obsidian
# (those require real external state). Covers: detect + init_codegraph
# (skipped branch) + merge_claude_md + state.
set -euo pipefail
SKILL_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
WORK=$(mktemp -d -t kg-setup-integ-XXXX)
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"
git init -q -b main
git remote add origin git@example.com:test/smoke.git
echo "print('hi')" > app.py
echo "flask" > requirements.txt
git add . && git -c user.email=t@t -c user.name=t commit -q -m "init"
echo "--- detect_project ---"
python3 "$SKILL_ROOT/scripts/detect_project.py" "$WORK"
echo "--- merge_claude_md (create) ---"
python3 "$SKILL_ROOT/scripts/merge_claude_md.py" \
"$WORK/CLAUDE.md" "/vault" "smoke" "-tmp-smoke"
grep -q "## Knowledge Graph" "$WORK/CLAUDE.md"
grep -q "kg-setup-v1" "$WORK/CLAUDE.md"
echo "--- merge_claude_md (idempotent) ---"
cp "$WORK/CLAUDE.md" "$WORK/CLAUDE.md.before"
python3 "$SKILL_ROOT/scripts/merge_claude_md.py" \
"$WORK/CLAUDE.md" "/vault" "smoke" "-tmp-smoke"
diff -q "$WORK/CLAUDE.md.before" "$WORK/CLAUDE.md"
echo "--- state write + read ---"
python3 -c "
import sys
sys.path.insert(0, '$SKILL_ROOT/scripts')
from pathlib import Path
from state import State
s = State(Path('$WORK'))
s.status = 'healthy'
s.layers = {'code_graph': {'configured': True}}
s.save()
s2 = State(Path('$WORK'))
s2.load()
assert s2.status == 'healthy', s2.status
assert s2.layers['code_graph']['configured'] is True
print('state roundtrip OK')
"
echo "--- health_check (expect exit 1, no setup) ---"
if "$SKILL_ROOT/scripts/health_check.sh" "$WORK" >/dev/null; then
echo "FAIL: expected non-zero exit from health_check on unsetup project"
exit 1
fi
echo "ALL INTEGRATION CHECKS PASSED"
Make executable:
chmod +x tests/integration_test.sh
- Step 2: Run it
tests/integration_test.sh
Expected: ALL INTEGRATION CHECKS PASSED.
- Step 3: Commit
git add tests/integration_test.sh
git commit -m "kg-setup: add integration smoke test for per-project pipeline"
Phase E — Install + real-world smoke (Tasks 15-17)
Task 15: Symlink into ~/.claude/skills/
Files:
- External: ~/.claude/skills/kg-setup symlink
- Step 1: Create the symlink
ln -sfn "$(pwd)" "$HOME/.claude/skills/kg-setup"
ls -la "$HOME/.claude/skills/kg-setup"
Expected: symlink pointing to ~/Projects/kg-setup.
- Step 2: Verify Claude Code picks it up
In a fresh Claude Code session (or the current one — the skill list reloads dynamically), check that kg-setup appears in the available skills list. The user verifies manually.
- Step 3: Commit nothing (external action, not repo)
Task 16: End-to-end smoke on a scratch project
Files:
- None in repo; external verification only.
- Step 1: Create a scratch project
mkdir -p /tmp/kg-e2e && cd /tmp/kg-e2e
git init -q -b main
git remote add origin git@example.com:test/kg-e2e.git
echo "def add(a, b): return a + b" > calc.py
echo "flask" > requirements.txt
git add . && git commit -qm "init"
- Step 2: Trigger the skill via Claude Code
In Claude Code (with cwd = /tmp/kg-e2e): say "настрой граф знаний для этого проекта". Claude should pick up kg-setup, run through phases, ask install confirmation, then complete.
- Step 3: Verify artifacts
ls -la /tmp/kg-e2e/
# Expect: .codegraph/, .gitnexus/, CLAUDE.md, .kg-setup-state.json
cat /tmp/kg-e2e/.kg-setup-state.json
# Expect: last_run_status: "healthy"
grep "kg-setup-v1" /tmp/kg-e2e/CLAUDE.md
# Expect: marker present
claude mcp list | grep -E 'codegraph|gitnexus'
# Expect: both rows
- Step 4: Re-run and verify idempotency
Say: "настрой граф знаний" again. Claude should detect existing state, ask (r/f/s/q), default (r), run health check, print report. No file diffs.
- Step 5: Cleanup
rm -rf /tmp/kg-e2e
Task 17: Smoke on arb-scanner itself + final commit
Files:
- Potentially modifies ./CLAUDE.md, .kg-setup-state.json (gitignored), .gitignore. Verification uses the pre-existing project.
- Step 1: Ensure arb-scanner CLAUDE.md is committed
git status --porcelain CLAUDE.md
Expected: empty output (no uncommitted changes). If there are, commit or stash them first — the skill will refuse to write a dirty CLAUDE.md.
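The check Claude performs can be sketched as follows (assumes `git` on PATH; `claude_md_dirty` is a hypothetical name, not a script in this repo):

```python
import subprocess

def claude_md_dirty(repo_root: str) -> bool:
    """Non-empty `git status --porcelain CLAUDE.md` output = uncommitted changes."""
    out = subprocess.run(
        ["git", "status", "--porcelain", "CLAUDE.md"],
        cwd=repo_root, capture_output=True, text=True, check=True,
    ).stdout
    return bool(out.strip())
```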
- Step 2: Trigger the skill on arb-scanner
Say to Claude: "настрой граф знаний для этого проекта".
- Step 3: Verify artifacts
ls -la .codegraph/ .gitnexus/ .kg-setup-state.json
grep "kg-setup-v1" CLAUDE.md
- Step 4: Verify Obsidian side
Via Obsidian MCP list_directory arb-scanner/ — expect _index.md now in the folder alongside the existing 15 notes.
- Step 5: Verify auto-memory pointer
cat ~/.claude/projects/-Users-pavelmalkin-Documents-Scaner/memory/MEMORY.md | tail -5
ls ~/.claude/projects/-Users-pavelmalkin-Documents-Scaner/memory/project_kg_arb-scanner.md
Expected: new pointer line, new memory file.
- Step 6: Manual token-usage smoke
Ask me 3 architectural questions about arb-scanner ("describe matcher.py's arb detection", "which BKs are configured for proxy routing", "how is the scan loop started"). Confirm I use mcp__codegraph__* tool calls instead of reading source files. Note: this is a qualitative check, not a hard SLA.
- Step 7: Commit the resulting CLAUDE.md change
git add CLAUDE.md
git commit -m "CLAUDE.md: add kg-setup Knowledge Graph section"
Note: .codegraph/, .gitnexus/, and .kg-setup-state.json are all gitignored (the skill appends them to .gitignore in Phase 3e if missing).
Self-review checklist (performed before handoff)
- Every spec section maps to ≥1 task: Phase A covers §3.1 layers + §4.11 state; Phase B covers §4.3, §4.4, §4.10, §8; Phase C covers §4.5, §4.6, §4.7, §4.8, §4.9; Phase D covers §4.1 orchestrator + §7.2 integration; Phase E covers §9 install/deploy + §7.3 manual smoke.
- No "TBD", "add error handling", "similar to Task N" placeholders.
- Type/name consistency: `MARKER_START`, `MARKER_END`, `USER_CONTENT_MARKER` used consistently across `merge_claude_md.py` and its tests. `State` class, `STATE_FILENAME`, `SCHEMA_VERSION` consistent across `state.py`/tests. `KG_SETUP_REFRESH` env var naming consistent across scripts and SKILL.md.
- Every code step includes the actual code, not a reference.
- Every test step includes a specific run command + expected output.
- Commit messages match user's existing style (short, imperative, prefixed by component).
- Out-of-scope items (remote/SSH, cron re-index, chat→Obsidian) explicitly skipped in §10 of spec, not snuck back in.
Execution handoff
Plan complete and saved to docs/superpowers/plans/2026-04-22-kg-setup.md. Two execution options:
- Subagent-Driven (recommended) — a fresh subagent is dispatched per task, the main session reviews between tasks. Fast iteration, clean context per task.
- Inline Execution — tasks run in this session with checkpoints for review. More control, heavier on main-session context.
Which approach?