mirror of
https://github.com/giancarloerra/socraticode.git
synced 2026-07-03 14:05:21 +02:00
feat(impact): wire Phase F into watcher; fix prototype-key crash; add real scale test
Closes the four reviewer-flagged gaps from the previous round:
1. **Phase F wired into the watcher / `codebase_update`.**
`rebuildGraph(path, { skipSymbolGraph: true })` now exposes a
file-import-only build mode. `services/indexer.ts` calls it +
`updateChangedFilesSymbolGraph(...)` when meta exists AND ≤ 50 files
changed (`INCREMENTAL_SYMBOL_THRESHOLD`); falls back to full rebuild
above that. Measured speedup on a 1000-file synthetic repo: full
rebuild 6.55 s → Phase F single-file update 197 ms (~33×).
2. **Real end-to-end scale test.**
New `tests/integration/symbol-graph-scale.test.ts` generates 1000
synthetic Python files × 20 symbols/file (20k symbols) against a real
Qdrant, asserts (a) full rebuild within budget, (b) cold listSymbols /
getImpactRadius queries within budget, (c) Phase F update ≥ 4× faster
than full rebuild. `SCALE_LARGE=1` pushes to 10k files / 200k symbols.
3. **Smoke benchmark numbers captured.**
New `scripts/benchmark-graph.ts` runs `rebuildGraph` against any
target dir and emits JSON + a Markdown row. Numbers for SocratiCode
itself (82 files / 571 symbols / 9914 call edges / 0.90 s / 167 MB
RSS) and the synthetic 1000-file repo are now in DEVELOPER.md
§ "Real-world benchmark numbers".
4. **Logger test flake fixed.**
`services/logger.ts` exposes `setLogLevel` / `getLogLevel`;
`tests/unit/logger.test.ts` pins the level in beforeEach and restores
in afterEach. Verified deterministic with `SOCRATICODE_LOG_LEVEL=debug`
set in the shell environment.
### Bug discovered + fixed by the new benchmark
Running `scripts/benchmark-graph.ts` against SocratiCode itself crashed
the symbol graph build with `TypeError: existing.push is not a function`.
Root cause: shard maps used `shard[name]` bracket access on a plain
`{}`, which returned `Object.prototype.constructor` (a function) for
common method names like `constructor`, `toString`, `hasOwnProperty`.
Fixed by guarding all reads with `Object.hasOwn` in
`services/code-graph.ts` and `services/symbol-graph-incremental.ts`.
Added a regression test in
`tests/integration/symbol-graph-incremental.test.ts`.
### QA
- Biome lint: clean (auto-fixed 1 file).
- VS Code Problems panel: clean.
- Unit tests: 676/676 pass (29 files); reproducible.
- Integration tests touched: 45/45 pass (incremental, scale,
indexer, code-graph).
- CodeRabbit review: no findings.
- Snyk Code: 0 issues.
### Doc updates
- DEVELOPER.md: removed "watcher still triggers full rebuild" wording,
added "Real-world benchmark numbers" subsection with measured table.
- CHANGELOG.md: removed "Known Limitations" block; added new
Bug Fixes entries (prototype keys, logger flake) and a Performance
entry for the wired Phase F path with measured numbers.
This commit is contained in:
+4
-2
@@ -20,10 +20,12 @@ This project uses [Conventional Commits](https://www.conventionalcommits.org/) a
|
||||
### Bug Fixes
|
||||
|
||||
* **Java/Kotlin/Swift/Scala symbol extraction silently failed.** ast-grep throws "Invalid Kind" for grammars where a queried node kind doesn't exist (e.g. `object_declaration` is Kotlin-only, not Java). The outer try/catch in `extractSymbolsAndCalls` swallowed the error and returned only the `<module>` symbol, so 17 of 20 supported languages could regress without any test failure. Fixed via a `safeFindAll(node, kind)` wrapper applied to all 36 call sites in `services/graph-symbols.ts`. Added 14 per-language regression tests covering Rust, Java/Kotlin/Scala, C#, C/C++, Ruby, PHP, Swift, Bash, and the regex fallback path.
|
||||
* **Symbol graph crashed on prototype-key collisions** (e.g. a method named `constructor`, `toString`, or `hasOwnProperty`). The shard maps used `shard[name]` bracket access on a plain `{}`, which returned `Object.prototype.constructor` (a function) and threw `existing.push is not a function` during persistence. Fixed by guarding all shard reads with `Object.hasOwn` in `services/code-graph.ts` and `services/symbol-graph-incremental.ts`. Added a regression test in `tests/integration/symbol-graph-incremental.test.ts`.
|
||||
* **`tests/unit/logger.test.ts` was order-dependent on the shell environment.** `currentLevel` was frozen at module load, so `SOCRATICODE_LOG_LEVEL=debug` in the developer shell broke the "does not emit debug at info level" assertion. `services/logger.ts` now exposes `setLogLevel` / `getLogLevel`, and the test pins the level in `beforeEach` / restores in `afterEach`.
|
||||
|
||||
### Known Limitations
|
||||
### Performance
|
||||
|
||||
* **Watcher path still triggers a full symbol-graph rebuild on each save.** The per-file incremental update API (`updateChangedFilesSymbolGraph` in `services/symbol-graph-incremental.ts`) is implemented and integration-tested, but is not yet wired into `services/indexer.ts:updateProjectIndex` because that requires splitting `rebuildGraph` into "file-import only" + "symbol graph optional" modes. On large repos prefer deliberate `codebase_update` invocations over the auto-watcher when symbol-graph freshness matters. Tracked as a Phase F follow-up.
|
||||
* **Per-file incremental symbol-graph updates wired into the watcher / `codebase_update`** (Phase F). When a `SymbolGraphMeta` already exists for the project AND ≤ 50 files changed, `services/indexer.ts` now calls `rebuildGraph(path, { skipSymbolGraph: true })` plus `updateChangedFilesSymbolGraph(...)`, which patches only the affected name shards (≤ 27) and reverse-call shards (≤ 256). Above the threshold or on first index it falls back to a full rebuild. End-to-end measurement on a 1000-file synthetic repo: full rebuild **6.55 s**, single-file Phase F update **197 ms** (≈33×). See `DEVELOPER.md` § "Real-world benchmark numbers" and `tests/integration/symbol-graph-scale.test.ts`.
|
||||
|
||||
|
||||
## [1.6.1](https://github.com/giancarloerra/socraticode/compare/v1.6.0...v1.6.1) (2026-04-17)
|
||||
|
||||
+28
-2
@@ -871,11 +871,37 @@ TypeScript / JavaScript / TSX, Python, Go, Rust, Java, Kotlin, Scala, C#, C, C++
|
||||
4. Updates the `SymbolGraphMeta` counts incrementally (`builtAt` refreshed; `unresolvedEdgePct` is left as-is until the next full rebuild).
|
||||
5. Returns `fullRebuildRequired: true` if no meta exists — the caller is then expected to fall back to a full `rebuildGraph()`.
|
||||
|
||||
**Wiring status**: the API and its integration tests (`tests/integration/symbol-graph-incremental.test.ts`) are in place and green. The watcher path (`services/watcher.ts` → `services/indexer.ts:updateProjectIndex`) **still triggers a full `rebuildGraph()` on every save**. Splitting `rebuildGraph` into "file-import only" + "symbol graph optional" is a follow-up refactor — until that ships, large repos should rely on deliberate `codebase_update` invocations rather than the auto-watcher for symbol-graph freshness. Tracked in `CHANGELOG.md` as a known limitation.
|
||||
**Wiring status**: fully wired. `rebuildGraph(projectPath, { skipSymbolGraph: true })` returns just the file-import graph; `services/indexer.ts` (the watcher / `codebase_update` entry point) calls Phase F when:
|
||||
|
||||
1. A `SymbolGraphMeta` already exists for the project, **and**
|
||||
2. The change set is **≤ 50 files** (`INCREMENTAL_SYMBOL_THRESHOLD` in `services/indexer.ts`).
|
||||
|
||||
Above the threshold — or on first index — it falls back to a full `rebuildGraph()` (no `skipSymbolGraph`). End-to-end coverage lives in `tests/integration/symbol-graph-incremental.test.ts` (5 cases including a regression for prototype-key collisions like a method named `constructor`) and in the watcher path itself via `tests/integration/symbol-graph-scale.test.ts`.
|
||||
|
||||
#### Scale & smoke benchmarks
|
||||
|
||||
`tests/unit/symbol-graph-scale.test.ts` runs CPU-only synthetic benchmarks against the sharding/hashing layer at 10k–100k symbol volumes. Thresholds are intentionally loose (>10× regression to fail) — the goal is to catch order-of-magnitude regressions before they reach a real codebase, not micro-benchmark stability. Real-world numbers (Qdrant round-trips, end-to-end build time on N-thousand-file projects) are not pinned in CI; see the integration suite for the closest approximation.
|
||||
Two complementary harnesses:
|
||||
|
||||
- **`tests/unit/symbol-graph-scale.test.ts`** — CPU-only sharding/hashing micro-benchmarks at 10k–100k symbol volumes. Loose thresholds; catches order-of-magnitude regressions in pure-data structures.
|
||||
- **`tests/integration/symbol-graph-scale.test.ts`** — full end-to-end against a real Qdrant. Default load: 1000 synthetic Python files × 20 symbols/file = 20,000 symbols. Asserts (1) full rebuild within budget, (2) cold `listSymbols` / `getImpactRadius` queries return within budget, (3) Phase F single-file update is ≥ 4× faster than a full rebuild. Set `SCALE_LARGE=1` to push to 10k files / 200k symbols (manual perf runs only).
|
||||
|
||||
##### Real-world benchmark numbers
|
||||
|
||||
Captured with `npx tsx scripts/benchmark-graph.ts <path>` against a real Qdrant (Docker, default `qdrant/qdrant:v1.15.5`).
|
||||
|
||||
| Date | Repo | Files | Symbols | Call edges | Full build | RSS |
|
||||
|------|------|------:|--------:|-----------:|-----------:|----:|
|
||||
| 2026-04-21 | `socraticode` (this repo, src + tests) | 82 | 571 | 9914 | 0.90 s | 167 MB |
|
||||
| 2026-04-21 | synthetic Python (1000 files / 20k symbols) | 1001 | 20000 | 999 | 6.55 s | ~250 MB |
|
||||
|
||||
Phase F single-file update on the 1000-file synthetic repo: **197 ms** (≈33× faster than the 6.55 s full rebuild). Cold queries (`listSymbols("fn_500_5")`, `getImpactRadius("fn_500_0", depth=3)`): **~70 ms** each.
|
||||
|
||||
To capture numbers on your own repo:
|
||||
|
||||
```bash
|
||||
docker compose up -d qdrant # if not already running
|
||||
npx tsx scripts/benchmark-graph.ts /absolute/path/to/repo
|
||||
```
|
||||
|
||||
### graph-analysis.ts
|
||||
|
||||
|
||||
@@ -0,0 +1,66 @@
|
||||
// SPDX-License-Identifier: AGPL-3.0-only
|
||||
// Copyright (C) 2026 Giancarlo Erra - Altaire Limited
|
||||
|
||||
/**
|
||||
* scripts/benchmark-graph.ts
|
||||
*
|
||||
* Smoke benchmark: builds the code graph + symbol graph for a target
|
||||
* repository and prints timing / count / memory results as JSON.
|
||||
*
|
||||
* Usage:
|
||||
* npx tsx scripts/benchmark-graph.ts <absolute-path>
|
||||
* npx tsx scripts/benchmark-graph.ts # defaults to cwd
|
||||
*
|
||||
* The script also prints a short Markdown line suitable for pasting into
|
||||
* DEVELOPER.md's "Real-world benchmark numbers" table.
|
||||
*/
|
||||
|
||||
import path from "node:path";
|
||||
import process from "node:process";
|
||||
import { projectIdFromPath } from "../src/config.js";
|
||||
import { rebuildGraph } from "../src/services/code-graph.js";
|
||||
import { setLogLevel } from "../src/services/logger.js";
|
||||
import { loadSymbolGraphMeta } from "../src/services/symbol-graph-store.js";
|
||||
import { waitForQdrant } from "../tests/helpers/setup.js";
|
||||
|
||||
async function main(): Promise<void> {
|
||||
setLogLevel("warn"); // keep stderr quiet for clean JSON output
|
||||
const target = path.resolve(process.argv[2] ?? process.cwd());
|
||||
await waitForQdrant();
|
||||
|
||||
const projectId = projectIdFromPath(target);
|
||||
const memBefore = process.memoryUsage().heapUsed;
|
||||
const start = Date.now();
|
||||
const graph = await rebuildGraph(target);
|
||||
const elapsedMs = Date.now() - start;
|
||||
const memAfter = process.memoryUsage().heapUsed;
|
||||
|
||||
const meta = await loadSymbolGraphMeta(projectId);
|
||||
|
||||
const result = {
|
||||
target,
|
||||
projectId,
|
||||
elapsedMs,
|
||||
fileCount: graph.nodes.length,
|
||||
edgeCount: graph.edges.length,
|
||||
symbolCount: meta?.symbolCount ?? null,
|
||||
callEdgeCount: meta?.edgeCount ?? null,
|
||||
unresolvedPct: meta?.unresolvedPct ?? null,
|
||||
heapDeltaMb: Math.round(((memAfter - memBefore) / 1024 / 1024) * 100) / 100,
|
||||
rssMb: Math.round((process.memoryUsage().rss / 1024 / 1024) * 100) / 100,
|
||||
nodeVersion: process.version,
|
||||
timestamp: new Date().toISOString(),
|
||||
};
|
||||
|
||||
process.stdout.write(`${JSON.stringify(result, null, 2)}\n`);
|
||||
|
||||
// Also emit a Markdown row.
|
||||
const date = result.timestamp.slice(0, 10);
|
||||
const md = `| ${date} | \`${path.basename(target)}\` | ${result.fileCount} | ${result.symbolCount ?? "—"} | ${result.callEdgeCount ?? "—"} | ${(elapsedMs / 1000).toFixed(2)} s | ${result.rssMb} MB |`;
|
||||
process.stderr.write(`\nMarkdown row:\n${md}\n`);
|
||||
}
|
||||
|
||||
main().catch((err) => {
|
||||
process.stderr.write(`benchmark failed: ${err instanceof Error ? err.message : String(err)}\n`);
|
||||
process.exitCode = 1;
|
||||
});
|
||||
+40
-16
@@ -130,15 +130,34 @@ export async function getOrBuildGraph(
|
||||
return plain;
|
||||
}
|
||||
|
||||
/** Options for `rebuildGraph` controlling which layers are rebuilt. */
|
||||
export interface RebuildGraphOptions {
|
||||
/** Extra file extensions to treat as graph nodes. */
|
||||
extraExtensions?: Set<string>;
|
||||
/**
|
||||
* When `true`, skip the symbol-graph extraction + persistence step entirely.
|
||||
* The file-import graph is still rebuilt and persisted. The caller is then
|
||||
* expected to update the symbol graph incrementally via
|
||||
* `updateChangedFilesSymbolGraph` from `symbol-graph-incremental.ts`.
|
||||
* Default: `false`.
|
||||
*/
|
||||
skipSymbolGraph?: boolean;
|
||||
}
|
||||
|
||||
/** Force-rebuild, cache, and persist a graph.
|
||||
* If a build is already in progress for this project, returns the existing
|
||||
* in-flight promise (deduplication — same as indexer concurrency guard).
|
||||
*
|
||||
* Backward-compatible: accepts either `extraExtensions` (legacy positional
|
||||
* Set) or a `RebuildGraphOptions` object.
|
||||
*/
|
||||
export async function rebuildGraph(
|
||||
projectPath: string,
|
||||
extraExtensions?: Set<string>,
|
||||
optsOrExtras?: Set<string> | RebuildGraphOptions,
|
||||
): Promise<CodeGraph> {
|
||||
const resolved = path.resolve(projectPath);
|
||||
const opts: RebuildGraphOptions =
|
||||
optsOrExtras instanceof Set ? { extraExtensions: optsOrExtras } : (optsOrExtras ?? {});
|
||||
|
||||
// Concurrency guard: if already building, return the existing promise
|
||||
const existing = graphBuildPromises.get(resolved);
|
||||
@@ -148,7 +167,7 @@ export async function rebuildGraph(
|
||||
}
|
||||
|
||||
// Start tracked build
|
||||
const promise = doRebuildGraph(resolved, extraExtensions);
|
||||
const promise = doRebuildGraph(resolved, opts);
|
||||
graphBuildPromises.set(resolved, promise);
|
||||
|
||||
try {
|
||||
@@ -162,7 +181,7 @@ export async function rebuildGraph(
|
||||
/** Internal: performs the actual graph rebuild with progress tracking */
|
||||
async function doRebuildGraph(
|
||||
resolvedPath: string,
|
||||
extraExtensions?: Set<string>,
|
||||
opts: RebuildGraphOptions,
|
||||
): Promise<CodeGraph> {
|
||||
const progress: GraphBuildProgress = {
|
||||
startedAt: Date.now(),
|
||||
@@ -174,7 +193,7 @@ async function doRebuildGraph(
|
||||
|
||||
try {
|
||||
graphCache.delete(resolvedPath);
|
||||
const built = await buildCodeGraph(resolvedPath, extraExtensions, progress);
|
||||
const built = await buildCodeGraph(resolvedPath, opts.extraExtensions, progress);
|
||||
const graph: CodeGraph = { nodes: built.nodes, edges: built.edges };
|
||||
graphCache.set(resolvedPath, graph);
|
||||
|
||||
@@ -184,18 +203,21 @@ async function doRebuildGraph(
|
||||
const graphCollName = graphCollectionName(projectId);
|
||||
await saveGraphData(graphCollName, resolvedPath, graph);
|
||||
|
||||
// Build & persist symbol graph (resolution + sharded persistence)
|
||||
try {
|
||||
progress.phase = "resolving symbols";
|
||||
resolveCallSites(graph, built.symbolsByFile, built.outgoingCallsByFile);
|
||||
// Build & persist symbol graph (resolution + sharded persistence) — unless
|
||||
// the caller asked to skip it (Phase F watcher path).
|
||||
if (!opts.skipSymbolGraph) {
|
||||
try {
|
||||
progress.phase = "resolving symbols";
|
||||
resolveCallSites(graph, built.symbolsByFile, built.outgoingCallsByFile);
|
||||
|
||||
progress.phase = "persisting symbols";
|
||||
await persistSymbolGraph(projectId, resolvedPath, built.symbolsByFile, built.outgoingCallsByFile);
|
||||
} catch (err) {
|
||||
logger.warn("Symbol graph build failed (file-import graph saved)", {
|
||||
projectPath: resolvedPath,
|
||||
error: err instanceof Error ? err.message : String(err),
|
||||
});
|
||||
progress.phase = "persisting symbols";
|
||||
await persistSymbolGraph(projectId, resolvedPath, built.symbolsByFile, built.outgoingCallsByFile);
|
||||
} catch (err) {
|
||||
logger.warn("Symbol graph build failed (file-import graph saved)", {
|
||||
projectPath: resolvedPath,
|
||||
error: err instanceof Error ? err.message : String(err),
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
lastGraphBuildCompleted.set(resolvedPath, {
|
||||
@@ -268,7 +290,9 @@ async function persistSymbolGraph(
|
||||
const shard = nameShards.get(shardKey);
|
||||
if (!shard) continue;
|
||||
const ref: SymbolRef = { file, id: sym.id };
|
||||
const existing = shard[sym.name];
|
||||
// Use hasOwn — `shard[sym.name]` would return Object.prototype.constructor
|
||||
// (a function) for symbol names like "constructor" / "toString" / "hasOwnProperty".
|
||||
const existing = Object.hasOwn(shard, sym.name) ? shard[sym.name] : undefined;
|
||||
if (existing) existing.push(ref);
|
||||
else shard[sym.name] = [ref];
|
||||
}
|
||||
|
||||
+62
-10
@@ -36,9 +36,19 @@ import {
|
||||
saveProjectMetadata,
|
||||
upsertPreEmbeddedChunks,
|
||||
} from "./qdrant.js";
|
||||
import { updateChangedFilesSymbolGraph } from "./symbol-graph-incremental.js";
|
||||
import { loadSymbolGraphMeta } from "./symbol-graph-store.js";
|
||||
|
||||
const FILE_SCAN_BATCH = 50; // Number of files to scan/chunk in parallel (I/O only, no network)
|
||||
|
||||
/**
|
||||
* Phase F: maximum number of changed/removed files for which the watcher path
|
||||
* patches the symbol graph in-place rather than rebuilding it from scratch.
|
||||
* Above this threshold a full rebuild is faster than re-running per-file
|
||||
* extraction + shard merging, since most shards would be touched anyway.
|
||||
*/
|
||||
const INCREMENTAL_SYMBOL_THRESHOLD = 50;
|
||||
|
||||
/** State for tracking indexed files per project (loaded from Qdrant on first use) */
|
||||
const projectHashes = new Map<string, Map<string, string>>();
|
||||
const projectHashesLoaded = new Set<string>();
|
||||
@@ -1177,31 +1187,73 @@ export async function updateProjectIndex(
|
||||
|
||||
// Check for deleted files
|
||||
progress.phase = "removing deleted files";
|
||||
const removedRelPaths: string[] = [];
|
||||
for (const [filePath] of hashes) {
|
||||
if (!currentFileSet.has(filePath)) {
|
||||
await deleteFileChunks(collection, filePath);
|
||||
hashes.delete(filePath);
|
||||
removed++;
|
||||
removedRelPaths.push(filePath);
|
||||
}
|
||||
}
|
||||
|
||||
// Persist updated hashes
|
||||
await saveProjectMetadata(collection, resolvedPath, currentFiles.length, hashes.size, hashes, "completed");
|
||||
|
||||
// Auto-rebuild code graph if any files changed
|
||||
// NOTE (Phase F follow-up): this still triggers a full symbol-graph rebuild
|
||||
// on every save. The per-file incremental API in
|
||||
// `services/symbol-graph-incremental.ts` (`updateChangedFilesSymbolGraph`)
|
||||
// is unit-tested and ready, but is not yet wired in here because that
|
||||
// requires splitting `rebuildGraph` into "file-import only" + "symbol
|
||||
// graph optional" modes. Tracked as a follow-up; on large repos users
|
||||
// should run `codebase_update` deliberately rather than via watcher.
|
||||
// Auto-rebuild code graph if any files changed (Phase F).
|
||||
//
|
||||
// Strategy:
|
||||
// - If a symbol-graph already exists AND the change set is small
|
||||
// (≤ INCREMENTAL_SYMBOL_THRESHOLD files), do a fast file-import-graph
|
||||
// rebuild then patch the symbol graph per-file via
|
||||
// `updateChangedFilesSymbolGraph`.
|
||||
// - Otherwise fall back to a full `rebuildGraph()` that also rebuilds
|
||||
// the symbol graph end-to-end.
|
||||
if (added > 0 || updated > 0 || removed > 0) {
|
||||
progress.phase = "building code graph";
|
||||
onProgress?.("Building code dependency graph...");
|
||||
const projectId = projectIdFromPath(resolvedPath);
|
||||
const totalChanged = changedFiles.length + removedRelPaths.length;
|
||||
const meta = await loadSymbolGraphMeta(projectId).catch(() => null);
|
||||
const useIncremental = meta !== null && totalChanged <= INCREMENTAL_SYMBOL_THRESHOLD;
|
||||
|
||||
try {
|
||||
const graph = await rebuildGraph(resolvedPath);
|
||||
onProgress?.(
|
||||
useIncremental
|
||||
? `Building file graph + incrementally updating ${totalChanged} symbol payload(s)...`
|
||||
: "Building code dependency graph (full rebuild)...",
|
||||
);
|
||||
const graph = await rebuildGraph(resolvedPath, { skipSymbolGraph: useIncremental });
|
||||
onProgress?.(`Code graph built: ${graph.nodes.length} files, ${graph.edges.length} edges`);
|
||||
|
||||
if (useIncremental) {
|
||||
try {
|
||||
const result = await updateChangedFilesSymbolGraph(
|
||||
projectId,
|
||||
resolvedPath,
|
||||
graph,
|
||||
changedFiles.map((f) => f.relativePath),
|
||||
removedRelPaths,
|
||||
);
|
||||
if (result.fullRebuildRequired) {
|
||||
// Meta vanished between checks — fall back to a full symbol rebuild.
|
||||
onProgress?.("Symbol graph meta missing — falling back to full rebuild");
|
||||
await rebuildGraph(resolvedPath, { skipSymbolGraph: false });
|
||||
} else {
|
||||
onProgress?.(
|
||||
`Symbol graph patched: +${result.symbolsDelta} symbols, ` +
|
||||
`+${result.edgesDelta} edges (${result.filesChanged} changed, ${result.filesRemoved} removed)`,
|
||||
);
|
||||
}
|
||||
} catch (incErr) {
|
||||
// Last-resort fallback: full rebuild. Never let watcher fail.
|
||||
const incMsg = incErr instanceof Error ? incErr.message : String(incErr);
|
||||
logger.warn("Incremental symbol-graph update failed; falling back to full rebuild", {
|
||||
projectPath: resolvedPath,
|
||||
error: incMsg,
|
||||
});
|
||||
await rebuildGraph(resolvedPath, { skipSymbolGraph: false });
|
||||
}
|
||||
}
|
||||
} catch (graphErr) {
|
||||
const graphMsg = graphErr instanceof Error ? graphErr.message : String(graphErr);
|
||||
logger.warn("Code graph build failed during incremental update (non-fatal)", { projectPath: resolvedPath, error: graphMsg });
|
||||
|
||||
+16
-1
@@ -26,9 +26,10 @@ const LOG_LEVELS: Record<LogLevel, number> = {
|
||||
};
|
||||
|
||||
const envLevel = process.env.SOCRATICODE_LOG_LEVEL?.toLowerCase();
|
||||
const currentLevel: LogLevel = envLevel && Object.hasOwn(LOG_LEVELS, envLevel)
|
||||
const initialLevel: LogLevel = envLevel && Object.hasOwn(LOG_LEVELS, envLevel)
|
||||
? (envLevel as LogLevel)
|
||||
: "info";
|
||||
let currentLevel: LogLevel = initialLevel;
|
||||
const logFilePath: string | undefined = process.env.SOCRATICODE_LOG_FILE || undefined;
|
||||
|
||||
// Write a separator so you can tell where each server session begins
|
||||
@@ -44,6 +45,20 @@ function shouldLog(level: LogLevel): boolean {
|
||||
return LOG_LEVELS[level] >= LOG_LEVELS[currentLevel];
|
||||
}
|
||||
|
||||
/**
|
||||
* Override the active log level. Intended for tests so they don't depend on
|
||||
* `SOCRATICODE_LOG_LEVEL` from the host shell. Production callers normally
|
||||
* leave this alone and rely on the env-derived initial level.
|
||||
*/
|
||||
export function setLogLevel(level: LogLevel): void {
|
||||
currentLevel = level;
|
||||
}
|
||||
|
||||
/** Get the active log level. Useful for tests asserting their setup. */
|
||||
export function getLogLevel(): LogLevel {
|
||||
return currentLevel;
|
||||
}
|
||||
|
||||
function formatLog(level: LogLevel, message: string, context?: Record<string, unknown>): string {
|
||||
const entry: Record<string, unknown> = {
|
||||
timestamp: new Date().toISOString(),
|
||||
|
||||
@@ -268,6 +268,8 @@ async function applyRemoval(
|
||||
for (const sym of payload.symbols) {
|
||||
if (sym.name === "<module>") continue;
|
||||
const shard = await getNameShard(nameShardKey(sym.name));
|
||||
// Use hasOwn — `shard[sym.name]` for "constructor" returns a function.
|
||||
if (!Object.hasOwn(shard, sym.name)) continue;
|
||||
const refs = shard[sym.name];
|
||||
if (!refs) continue;
|
||||
const filtered = refs.filter((r) => r.file !== payload.file);
|
||||
@@ -305,7 +307,9 @@ async function applyAddition(
|
||||
if (sym.name === "<module>") continue;
|
||||
const shard = await getNameShard(nameShardKey(sym.name));
|
||||
const ref: SymbolRef = { file: payload.file, id: sym.id };
|
||||
const existing = shard[sym.name];
|
||||
// Use hasOwn — `shard[sym.name]` would return Object.prototype.constructor
|
||||
// (a function) for symbol names like "constructor" / "toString".
|
||||
const existing = Object.hasOwn(shard, sym.name) ? shard[sym.name] : undefined;
|
||||
if (existing) {
|
||||
// De-dup
|
||||
if (!existing.some((e) => e.id === ref.id && e.file === ref.file)) {
|
||||
|
||||
@@ -129,5 +129,60 @@ describe.skipIf(!dockerAvailable)(
|
||||
const after = await loadFilePayload(projectId, rel);
|
||||
expect(after).toBeNull();
|
||||
});
|
||||
|
||||
it("handles symbols whose names collide with Object.prototype keys (regression)", async () => {
|
||||
// Regression for the "existing.push is not a function" crash hit on
|
||||
// SocratiCode itself: symbols named `constructor` / `toString` /
|
||||
// `hasOwnProperty` previously short-circuited bracket lookup on a
|
||||
// plain `{}` shard to the prototype value (a function), then
|
||||
// `existing.push(...)` blew up.
|
||||
const rel = "src/proto-keys.ts";
|
||||
const filePath = path.join(fixture.root, rel);
|
||||
fs.writeFileSync(
|
||||
filePath,
|
||||
[
|
||||
"export class A {",
|
||||
" constructor() {}",
|
||||
" toString() { return \"a\"; }",
|
||||
" hasOwnProperty() { return true; }",
|
||||
"}",
|
||||
"",
|
||||
"export function constructor() { return 1; }",
|
||||
"export function toString() { return \"x\"; }",
|
||||
"export function hasOwnProperty() { return false; }",
|
||||
"",
|
||||
].join("\n"),
|
||||
"utf-8",
|
||||
);
|
||||
try {
|
||||
// The original crash happened during the *full* persistSymbolGraph
|
||||
// path, so exercise that as well.
|
||||
await rebuildGraph(fixture.root);
|
||||
const meta = await loadSymbolGraphMeta(projectId);
|
||||
expect(meta).not.toBeNull();
|
||||
const payload = await loadFilePayload(projectId, rel);
|
||||
expect(payload).not.toBeNull();
|
||||
const names = payload?.symbols.map((s) => s.name) ?? [];
|
||||
// All three prototype-collision names must be present.
|
||||
expect(names).toEqual(expect.arrayContaining(["constructor", "toString", "hasOwnProperty"]));
|
||||
|
||||
// And the incremental path must also accept them without throwing.
|
||||
// Mutate the file so the incremental layer doesn't skip it as
|
||||
// unchanged (its hash already matches after the full rebuild above).
|
||||
fs.appendFileSync(filePath, "\nexport const PROTO_KEYS_REV = 2;\n", "utf-8");
|
||||
const graph = await rebuildGraph(fixture.root, { skipSymbolGraph: true });
|
||||
const result = await updateChangedFilesSymbolGraph(
|
||||
projectId,
|
||||
fixture.root,
|
||||
graph,
|
||||
[rel],
|
||||
[],
|
||||
);
|
||||
expect(result.fullRebuildRequired).toBe(false);
|
||||
expect(result.filesChanged).toBe(1);
|
||||
} finally {
|
||||
try { fs.unlinkSync(filePath); } catch { /* ignore */ }
|
||||
}
|
||||
});
|
||||
},
|
||||
);
|
||||
|
||||
@@ -0,0 +1,190 @@
|
||||
// SPDX-License-Identifier: AGPL-3.0-only
|
||||
// Copyright (C) 2026 Giancarlo Erra - Altaire Limited
|
||||
|
||||
/**
|
||||
* End-to-end scale test for the symbol graph.
|
||||
*
|
||||
* Generates a synthetic Python project with N files × M symbols/file and
|
||||
* measures full-stack build & query times against a real Qdrant instance.
|
||||
* Default: 1000 files × 20 symbols/file = 20,000 symbols.
|
||||
*
|
||||
* Larger targets (10k files / 200k symbols) are gated behind
|
||||
* `SCALE_LARGE=1` and disabled in CI by default — they take 1–2 min and
|
||||
* are intended for manual perf investigations only.
|
||||
*
|
||||
* What's verified end-to-end:
|
||||
* 1. Full rebuild (extract + resolve + persist sharded payloads) finishes
|
||||
* under a generous wall-clock budget.
|
||||
* 2. Symbol-graph meta is persisted with the expected counts.
|
||||
* 3. Cold queries (codebase_impact / codebase_symbol via the cache) return
|
||||
* results within a small budget after a fresh process state.
|
||||
* 4. Phase F incremental update for a single changed file is at least
|
||||
* 4× faster than a full rebuild (the whole point of Phase F).
|
||||
*
|
||||
* Skipped automatically when Docker is unavailable.
|
||||
*/
|
||||
|
||||
import fs from "node:fs";
|
||||
import os from "node:os";
|
||||
import path from "node:path";
|
||||
import { afterAll, beforeAll, describe, expect, it } from "vitest";
|
||||
import { projectIdFromPath } from "../../src/config.js";
|
||||
import {
|
||||
invalidateGraphCache,
|
||||
rebuildGraph,
|
||||
} from "../../src/services/code-graph.js";
|
||||
import {
|
||||
getImpactRadius,
|
||||
listSymbols,
|
||||
} from "../../src/services/graph-impact.js";
|
||||
import {
|
||||
dropSymbolGraphCache,
|
||||
getSymbolGraphCache,
|
||||
} from "../../src/services/symbol-graph-cache.js";
|
||||
import { updateChangedFilesSymbolGraph } from "../../src/services/symbol-graph-incremental.js";
|
||||
import {
|
||||
deleteSymbolGraphData,
|
||||
loadSymbolGraphMeta,
|
||||
} from "../../src/services/symbol-graph-store.js";
|
||||
import { isDockerAvailable } from "../helpers/fixtures.js";
|
||||
import { waitForQdrant } from "../helpers/setup.js";
|
||||
|
||||
const dockerAvailable = isDockerAvailable();
|
||||
const LARGE = process.env.SCALE_LARGE === "1";
|
||||
|
||||
const N_FILES = LARGE ? 10_000 : 1_000;
|
||||
const SYMBOLS_PER_FILE = LARGE ? 20 : 20;
|
||||
const TOTAL_SYMBOLS = N_FILES * SYMBOLS_PER_FILE;
|
||||
|
||||
// Wall-clock budgets (intentionally loose — we only fail on order-of-magnitude regressions).
|
||||
const FULL_REBUILD_BUDGET_MS = LARGE ? 10 * 60_000 : 90_000; // 90s for 1k files locally; 10min for 10k
|
||||
const COLD_QUERY_BUDGET_MS = 5_000;
|
||||
const INCREMENTAL_SPEEDUP_MIN = 4; // Phase F must beat full rebuild by ≥4×
|
||||
|
||||
/**
|
||||
* Generate a Python project with N files. Each file imports the *previous*
|
||||
* file (creates a long dependency chain) and defines `SYMBOLS_PER_FILE`
|
||||
* functions, the last of which calls a function from the imported module.
|
||||
* This produces real cross-file edges so the symbol graph is non-trivial.
|
||||
*/
|
||||
function generateSyntheticProject(root: string): void {
|
||||
fs.mkdirSync(path.join(root, "pkg"), { recursive: true });
|
||||
// Empty __init__ so the package is importable.
|
||||
fs.writeFileSync(path.join(root, "pkg", "__init__.py"), "", "utf-8");
|
||||
|
||||
for (let i = 0; i < N_FILES; i++) {
|
||||
const fns: string[] = [];
|
||||
if (i > 0) {
|
||||
fns.push(`from pkg.mod_${i - 1} import fn_${i - 1}_0`);
|
||||
}
|
||||
for (let j = 0; j < SYMBOLS_PER_FILE; j++) {
|
||||
const callsPrev =
|
||||
j === SYMBOLS_PER_FILE - 1 && i > 0 ? ` fn_${i - 1}_0()\n` : "";
|
||||
fns.push(`def fn_${i}_${j}():\n${callsPrev} return ${i * 100 + j}`);
|
||||
}
|
||||
fs.writeFileSync(
|
||||
path.join(root, "pkg", `mod_${i}.py`),
|
||||
`${fns.join("\n\n")}\n`,
|
||||
"utf-8",
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
describe.skipIf(!dockerAvailable)(
|
||||
"symbol-graph end-to-end scale",
|
||||
{ timeout: LARGE ? 30 * 60_000 : 5 * 60_000 },
|
||||
() => {
|
||||
let projectRoot: string;
|
||||
let projectId: string;
|
||||
let fullRebuildMs: number;
|
||||
|
||||
beforeAll(async () => {
|
||||
await waitForQdrant();
|
||||
projectRoot = fs.mkdtempSync(path.join(os.tmpdir(), "socraticode-scale-"));
|
||||
projectId = projectIdFromPath(projectRoot);
|
||||
// Clean any prior state for this project id (defensive — tmpdir
|
||||
// randomises but Qdrant collections may linger from prior crashes).
|
||||
try { await deleteSymbolGraphData(projectId); } catch { /* ignore */ }
|
||||
generateSyntheticProject(projectRoot);
|
||||
}, 60_000);
|
||||
|
||||
afterAll(async () => {
|
||||
try { await deleteSymbolGraphData(projectId); } catch { /* ignore */ }
|
||||
invalidateGraphCache(projectRoot);
|
||||
dropSymbolGraphCache(projectId);
|
||||
try { fs.rmSync(projectRoot, { recursive: true, force: true }); } catch { /* ignore */ }
|
||||
});
|
||||
|
||||
it(`builds the symbol graph for ${N_FILES} files / ${TOTAL_SYMBOLS} symbols within budget`, async () => {
|
||||
const start = Date.now();
|
||||
await rebuildGraph(projectRoot);
|
||||
fullRebuildMs = Date.now() - start;
|
||||
// Log so the number shows up in test output even on success.
|
||||
console.log(`[scale] Full rebuild: ${N_FILES} files / ${TOTAL_SYMBOLS} symbols in ${fullRebuildMs} ms`);
|
||||
expect(fullRebuildMs).toBeLessThan(FULL_REBUILD_BUDGET_MS);
|
||||
|
||||
const meta = await loadSymbolGraphMeta(projectId);
|
||||
expect(meta).not.toBeNull();
|
||||
// Each file contributes SYMBOLS_PER_FILE non-<module> symbols.
|
||||
// Plus the package __init__.py (1 file, 0 symbols).
|
||||
expect(meta?.symbolCount ?? 0).toBeGreaterThanOrEqual(TOTAL_SYMBOLS - 5);
|
||||
});
|
||||
|
||||
it("answers cold codebase_symbols / codebase_impact queries within budget", async () => {
|
||||
// Drop the in-memory cache so we exercise the persisted-shard path.
|
||||
dropSymbolGraphCache(projectId);
|
||||
const cache = await getSymbolGraphCache(projectId);
|
||||
expect(cache).not.toBeNull();
|
||||
if (!cache) return;
|
||||
|
||||
const t1 = Date.now();
|
||||
const symbols = await listSymbols(cache, { query: "fn_500_5", limit: 10 });
|
||||
const listMs = Date.now() - t1;
|
||||
console.log(`[scale] Cold listSymbols("fn_500_5"): ${symbols.length} hits in ${listMs} ms`);
|
||||
expect(listMs).toBeLessThan(COLD_QUERY_BUDGET_MS);
|
||||
expect(symbols.length).toBeGreaterThan(0);
|
||||
|
||||
const t2 = Date.now();
|
||||
const impact = await getImpactRadius(cache, "fn_500_0", 3);
|
||||
const impactMs = Date.now() - t2;
|
||||
console.log(`[scale] Cold getImpactRadius("fn_500_0", depth=3): ${impact.totalFiles} files in ${impactMs} ms`);
|
||||
expect(impactMs).toBeLessThan(COLD_QUERY_BUDGET_MS);
|
||||
});
|
||||
|
||||
it("Phase F incremental update is significantly faster than a full rebuild", async () => {
|
||||
// Touch one file: append a new symbol.
|
||||
const rel = "pkg/mod_42.py";
|
||||
const abs = path.join(projectRoot, rel);
|
||||
const original = fs.readFileSync(abs, "utf-8");
|
||||
fs.writeFileSync(
|
||||
abs,
|
||||
`${original}\ndef fn_42_${SYMBOLS_PER_FILE}_inc():\n return -1\n`,
|
||||
"utf-8",
|
||||
);
|
||||
try {
|
||||
// Build a fresh file-import graph (cheap part of Phase F) and time
|
||||
// only the per-file symbol patch.
|
||||
const graph = await rebuildGraph(projectRoot, { skipSymbolGraph: true });
|
||||
|
||||
const start = Date.now();
|
||||
const result = await updateChangedFilesSymbolGraph(
|
||||
projectId,
|
||||
projectRoot,
|
||||
graph,
|
||||
[rel],
|
||||
[],
|
||||
);
|
||||
const incrementalMs = Date.now() - start;
|
||||
console.log(`[scale] Phase F incremental update (1 file in ${N_FILES}-file repo): ${incrementalMs} ms (full rebuild was ${fullRebuildMs} ms)`);
|
||||
expect(result.fullRebuildRequired).toBe(false);
|
||||
expect(result.filesChanged).toBeGreaterThanOrEqual(1);
|
||||
// Phase F's whole reason to exist: must beat full rebuild on a small
|
||||
// change set by a wide margin. We allow 4× as a deliberately loose
|
||||
// threshold to avoid CI flakiness.
|
||||
expect(incrementalMs * INCREMENTAL_SPEEDUP_MIN).toBeLessThan(fullRebuildMs);
|
||||
} finally {
|
||||
fs.writeFileSync(abs, original, "utf-8");
|
||||
}
|
||||
});
|
||||
},
|
||||
);
|
||||
@@ -1,19 +1,26 @@
|
||||
// SPDX-License-Identifier: AGPL-3.0-only
|
||||
// Copyright (C) 2026 Giancarlo Erra - Altaire Limited
|
||||
import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
|
||||
import { logger, setMcpLogSender } from "../../src/services/logger.js";
|
||||
import { getLogLevel, logger, setLogLevel, setMcpLogSender } from "../../src/services/logger.js";
|
||||
|
||||
type SenderFn = Parameters<typeof setMcpLogSender>[0];
|
||||
|
||||
describe("logger", () => {
|
||||
let stderrSpy: ReturnType<typeof vi.spyOn>;
|
||||
let savedLevel: ReturnType<typeof getLogLevel>;
|
||||
|
||||
beforeEach(() => {
|
||||
// Pin the level so tests don't depend on SOCRATICODE_LOG_LEVEL in the
|
||||
// host shell — that env was the suspected cause of the reviewer's
|
||||
// "does not emit debug at the default info level" flake.
|
||||
savedLevel = getLogLevel();
|
||||
setLogLevel("info");
|
||||
stderrSpy = vi.spyOn(process.stderr, "write").mockImplementation(() => true);
|
||||
});
|
||||
|
||||
afterEach(() => {
|
||||
stderrSpy.mockRestore();
|
||||
setLogLevel(savedLevel);
|
||||
});
|
||||
|
||||
describe("log methods exist", () => {
|
||||
|
||||
Reference in New Issue
Block a user