feat(embeddings): add LiteLLM as a first-class embedding provider

LiteLLM Proxy Server (https://docs.litellm.ai/docs/simple_proxy) exposes an OpenAI-compatible /v1/embeddings endpoint and fans out to 100+ underlying providers (OpenAI, Anthropic, Cohere, Voyage, HuggingFace, Bedrock, Vertex AI, Ollama, ...).

This mirrors the lmstudio strategy (PR #42 + commit bb141a0), with three meaningful differences that justify a dedicated provider rather than a flag on provider-openai:

- Authentication is mandatory. LiteLLM gates /v1/models with the master key or a virtual key, unlike LM Studio (no auth by default) and OpenAI (cloud key). LITELLM_API_KEY is checked at config-load time; the provider also duck-types 401/403 in ensureReady/healthCheck via err.status to surface a distinct "auth rejected" message instead of "proxy unreachable".
- Model aliases come from the proxy's config.yaml, so EMBEDDING_MODEL and EMBEDDING_DIMENSIONS have no sensible defaults. loadEmbeddingConfig fails fast with provider-specific error messages that point at litellm_params.model in the proxy config and at the underlying alias's output dimension.
- Whether dimensions can be forwarded depends on the underlying provider: Matryoshka-aware models (text-embedding-3-*, voyage-3) accept it, while non-Matryoshka backends (BGE, nomic, Cohere v3) reject it. Forwarding is opt-in via LITELLM_SEND_DIMENSIONS=true rather than hardcoded as provider-openai does for text-embedding-3-*, since LiteLLM aliases are user-defined.

The encoding_format=float fix from bb141a0 ports verbatim: the OpenAI SDK 6.x base64-decode path corrupts any backend that returns plain JSON float arrays (many LiteLLM aliases do, including Ollama-routed and tei-wrapped ones).

Files:

- src/services/provider-litellm.ts: new LiteLLMEmbeddingProvider using the same OpenAI-SDK + custom baseURL pattern. Default baseURL http://localhost:4000/v1 (LiteLLM's default port; the /v1 prefix is required). Batch size 256, between OpenAI's 512 and LM Studio's 64, since the practical ceiling depends on whichever provider the alias resolves to. ensureReady distinguishes proxy-unreachable / auth-rejected / alias-not-registered, and lists up to 10 currently registered models in the alias-missing error so the operator can sanity-check their config.yaml without leaving the log.
- src/services/embedding-config.ts: extends the EmbeddingProvider union with "litellm", adds litellmUrl to EmbeddingConfig, adds fail-fast validation for LITELLM_API_KEY + EMBEDDING_MODEL + EMBEDDING_DIMENSIONS (key first, so a virtual-key user fixes the easy problem before touching the proxy config), and updates the "Invalid EMBEDDING_PROVIDER" message and the hasApiKey log expression.
- src/services/embedding-provider.ts: factory case for litellm with a dynamic import, so non-litellm users don't load the OpenAI SDK at startup.
- README.md: dedicated LiteLLM section, MCP host config example, env-var table entries for EMBEDDING_PROVIDER / EMBEDDING_MODEL / EMBEDDING_DIMENSIONS / EMBEDDING_CONTEXT_LENGTH (clarifying which require manual values for litellm), and a new LiteLLM Configuration table.
- tests/unit/embedding-config.test.ts: 9 new cases (model + dim + key required, error ordering, URL default + override, dimensions parsing, EMBEDDING_CONTEXT_LENGTH override for unknown aliases, auto-detection when the alias matches a known model name), plus an updated "full external config" expected object and updated invalid-provider error message.
- tests/unit/embedding-provider.test.ts: factory test for litellm, plus 4 cases against a deliberately closed port (config rejects construction without LITELLM_API_KEY, ensureReady unreachable error format, healthCheck short-circuits on a missing key without a network call, healthCheck reaches the "Not reachable" path without throwing).

Backward compatible. The litellm provider is opt-in via EMBEDDING_PROVIDER=litellm; the existing ollama, openai, google, and lmstudio paths are untouched.

Verified: 64/64 unit tests pass on the touched suites; biome lint clean; tsc --noEmit clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-07-03 14:05:21 +02:00 · 2026-05-08 08:58:51 +02:00
parent a0814005f8
commit 1708510c16
6 changed files with 638 additions and 14 deletions
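The float-array corruption mentioned in the commit message can be reproduced without the SDK. This is a sketch under one assumption, taken from the commit's own description: the SDK's decode step amounts to `Buffer.from(value, "base64")` followed by reinterpreting the bytes as float32. `decodeAsBase64Float32` is a hypothetical stand-in, not an SDK function.

```typescript
import { Buffer } from "node:buffer";

/**
 * Hypothetical stand-in for the SDK's base64 decode path: treat `value` as a
 * base64 string, then reinterpret the decoded bytes as float32 values.
 */
function decodeAsBase64Float32(value: unknown): number[] {
  // When `value` is really a JSON number array, Buffer.from() takes its
  // array-like branch and ignores the "base64" argument. Each element is
  // truncated to an integer byte, so every float in (-1, 1) becomes 0.
  const buf = Buffer.from(value as string, "base64");
  // Copy into a fresh ArrayBuffer so the Float32Array view starts at offset 0.
  const bytes = Uint8Array.from(buf);
  const floatCount = Math.floor(bytes.length / Float32Array.BYTES_PER_ELEMENT);
  return Array.from(new Float32Array(bytes.buffer, 0, floatCount));
}

// Correct input: a real base64 payload round-trips.
const encoded = Buffer.from(new Float32Array([1.5, -2]).buffer).toString("base64");
console.log(decodeAsBase64Float32(encoded)); // [ 1.5, -2 ]

// Corrupting input: a plain float array, as many LiteLLM backends return.
const plain = [0.12, -0.5, 0.33, 0.9, -0.01, 0.7, 0.25, -0.8];
console.log(decodeAsBase64Float32(plain)); // [ 0, 0 ]
```

Eight floats collapse into two zeros: both the values and the vector length are wrong, which is why the provider pins `encoding_format: "float"` instead of relying on the SDK default.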
@@ -242,7 +242,7 @@ On VS Code's 2.45M‑line codebase, SocratiCode answers architectural questions
 - **Hybrid code search** — Built on Qdrant, a purpose-built vector database with HNSW indexing, concurrent read/write, and payload filtering. Each chunk stores both a dense vector and a BM25 sparse vector; the Query API runs both sub-queries in a single round-trip and fuses results with Reciprocal Rank Fusion (RRF). Semantic search handles conceptual queries like "authentication middleware" even when those exact words don't appear in the code. BM25 handles exact identifier and keyword lookups. You get the best of both in every query with no tuning required.
 - **Configurable Qdrant** — Use the built-in Docker Qdrant (default, zero config) or connect to your own instance (self-hosted, remote server, or Qdrant Cloud). Configure via `QDRANT_MODE`, `QDRANT_URL`, and `QDRANT_API_KEY` environment variables.
 - **Configurable Ollama** — Use the built-in Docker Ollama (default, zero config) or point to your own Ollama instance (native install -GPU access-, remote server, etc.). Configure via `OLLAMA_MODE`, `OLLAMA_URL`, `EMBEDDING_MODEL` and `EMBEDDING_DIMENSIONS` environment variables.
-- **Multi-provider embeddings** — Switch between Local Ollama (private, GPU access), Docker Ollama (zero-config), OpenAI (`text-embedding-3-small`, fastest), Google Gemini (`gemini-embedding-001`, free tier), or LM Studio (local OpenAI-compatible server) with a single environment variable. No provider-specific configuration files.
+- **Multi-provider embeddings** — Switch between Local Ollama (private, GPU access), Docker Ollama (zero-config), OpenAI (`text-embedding-3-small`, fastest), Google Gemini (`gemini-embedding-001`, free tier), LM Studio (local OpenAI-compatible server), or LiteLLM (proxy gateway in front of 100+ providers) with a single environment variable. No provider-specific configuration files.
 - **Private & secure** — Everything runs on your machine — your code never leaves your network. The default Docker setup includes Ollama (embeddings) and Qdrant (vector storage) with no external API calls. No API costs, no token limits. Suitable for air-gapped and on-premises environments. Optional cloud providers (OpenAI, Google Gemini, Qdrant Cloud) are available but never required.
 - **AST-aware chunking** — Files are split at function/class boundaries using AST parsing (ast-grep), not arbitrary line counts. This produces higher-quality search results. Falls back to line-based chunking for unsupported languages.
 - **Polyglot code dependency graph** — Static analysis of import/require/use/include statements using ast-grep for 18+ languages. No external tools like dependency-cruiser required. Detects circular dependencies and generates visual Mermaid diagrams.
@@ -717,6 +717,42 @@ or when you want a Mac/Windows-friendly desktop UI for managing GGUF models).
 > Optional: `LMSTUDIO_URL` (default `http://localhost:1234/v1`) for non-default ports;
 > `LMSTUDIO_API_KEY` if you've enabled API key auth in LM Studio.

+#### LiteLLM (proxy gateway, 100+ providers)
+
+[LiteLLM](https://docs.litellm.ai/docs/simple_proxy) Proxy Server exposes an OpenAI-compatible
+`/v1/embeddings` endpoint and fans out to any of 100+ underlying providers (OpenAI, Anthropic,
+Cohere, Voyage, HuggingFace, Bedrock, Vertex AI, Ollama, ...). Use this provider when you want
+**centralised key management** (one virtual key per developer instead of N provider keys spread
+across MCP configs), **fallback / load balancing** between embedding backends, or
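+**provider-agnostic indexes** that survive a backend swap, as long as the alias keeps
+resolving to the same embedding model (vectors from different models are not comparable).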
+
+```json
+{
+  "mcpServers": {
+    "socraticode": {
+      "command": "node",
+      "args": ["/absolute/path/to/socraticode/dist/index.js"],
+      "env": {
+        "EMBEDDING_PROVIDER": "litellm",
+        "LITELLM_API_KEY": "sk-...",
+        "EMBEDDING_MODEL": "text-embedding-3-small",
+        "EMBEDDING_DIMENSIONS": "1536"
+      }
+    }
+  }
+}
+```
+
+> **`LITELLM_API_KEY`, `EMBEDDING_MODEL`, and `EMBEDDING_DIMENSIONS` are all required.**
+> LiteLLM proxies always authenticate (master key or virtual key from `/key/generate`); the
+> alias name and underlying dimension come from your `config.yaml`. SocratiCode fails fast on
+> any missing piece.
+>
+> Optional: `LITELLM_URL` (default `http://localhost:4000/v1`; must include the `/v1`
+> suffix) for non-default ports or remote proxies, and `LITELLM_SEND_DIMENSIONS=true` to
+> forward the OpenAI `dimensions` parameter through the proxy. Enable the latter only for
+> Matryoshka-aware backends such as `text-embedding-3-*` or `voyage-3`; non-Matryoshka
+> backends reject the request.
+
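The `/v1` requirement comes down to how an OpenAI-style client builds request URLs. A minimal sketch, assuming the client appends `/embeddings` to its base URL: without the suffix, requests go to `/embeddings` rather than `/v1/embeddings`. `resolveEmbeddingsUrl` is a hypothetical helper for illustration; SocratiCode does not ship it.

```typescript
/**
 * Hypothetical helper (not part of SocratiCode): compute the embeddings URL an
 * OpenAI-style client would hit, assuming it appends "/embeddings" to the base
 * URL, and report whether the base URL carries the "/v1" suffix.
 */
function resolveEmbeddingsUrl(baseUrl: string): { endpoint: string; hasV1Suffix: boolean } {
  const url = new URL(baseUrl);
  // Drop trailing slashes so "/v1/" and "/v1" are treated the same.
  const path = url.pathname.replace(/\/+$/, "");
  url.pathname = `${path}/embeddings`;
  return { endpoint: url.toString(), hasV1Suffix: path.endsWith("/v1") };
}

console.log(resolveEmbeddingsUrl("http://localhost:4000/v1").endpoint); // http://localhost:4000/v1/embeddings
console.log(resolveEmbeddingsUrl("http://localhost:4000").endpoint); // http://localhost:4000/embeddings
```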
 ### Git Worktrees (shared index across directories)

 If you use [git worktrees](https://git-scm.com/docs/git-worktree) — or any workflow where the same repository lives in multiple directories — each path would normally get its own Qdrant index. This means redundant embedding and storage for what is essentially the same codebase.
@@ -1118,10 +1154,10 @@ The rest of this section documents the variables themselves. Pass them using whi

 | Variable | Default | Description |
 |----------|---------|-------------|
-| `EMBEDDING_PROVIDER` | `ollama` | Embedding backend: `ollama` (local, default), `openai`, `google`, or `lmstudio` |
-| `EMBEDDING_MODEL` | *(per provider)* | Model name. Defaults: `nomic-embed-text` (ollama), `text-embedding-3-small` (openai), `gemini-embedding-001` (google). **Required** for `lmstudio` (no default). |
-| `EMBEDDING_DIMENSIONS` | *(per provider)* | Vector dimensions. Defaults: `768` (ollama), `1536` (openai), `3072` (google). **Required** for `lmstudio` (no default; varies per loaded model). |
-| `EMBEDDING_CONTEXT_LENGTH` | *(auto-detected)* | Model context window in tokens. Auto-detected for known models. Set manually for custom or LM Studio models. |
+| `EMBEDDING_PROVIDER` | `ollama` | Embedding backend: `ollama` (local, default), `openai`, `google`, `lmstudio`, or `litellm` |
+| `EMBEDDING_MODEL` | *(per provider)* | Model name. Defaults: `nomic-embed-text` (ollama), `text-embedding-3-small` (openai), `gemini-embedding-001` (google). **Required** for `lmstudio` and `litellm` (no default). |
+| `EMBEDDING_DIMENSIONS` | *(per provider)* | Vector dimensions. Defaults: `768` (ollama), `1536` (openai), `3072` (google). **Required** for `lmstudio` and `litellm` (no default; varies per loaded model / proxy alias). |
+| `EMBEDDING_CONTEXT_LENGTH` | *(auto-detected)* | Model context window in tokens. Auto-detected for known model names (works for LiteLLM aliases that match the underlying model name). Set manually for custom LM Studio models or arbitrary LiteLLM aliases. |

 ### Ollama Configuration (when `EMBEDDING_PROVIDER=ollama`)

@@ -1147,6 +1183,14 @@ The rest of this section documents the variables themselves. Pass them using whi
 | `LMSTUDIO_URL` | `http://localhost:1234/v1` | Full base URL of LM Studio's OpenAI-compatible Local Server. Override when the server runs on a non-default port or a remote machine (e.g. `http://gpu-rig.local:5678/v1`). Must include the `/v1` suffix. |
 | `LMSTUDIO_API_KEY` | *(none)* | Optional. LM Studio's Local Server has no auth by default; set this only if you've enabled API key auth in the LM Studio UI. |

+### LiteLLM Configuration (when `EMBEDDING_PROVIDER=litellm`)
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `LITELLM_URL` | `http://localhost:4000/v1` | Full base URL of the LiteLLM proxy's OpenAI-compatible endpoint. Override for non-default ports or remote proxies (e.g. `https://litellm.internal:4001/v1`). Must include the `/v1` suffix — LiteLLM exposes `/v1/embeddings` under that prefix. |
+| `LITELLM_API_KEY` | *(none)* | **Required.** Master key (`general_settings.master_key` in the proxy's `config.yaml`) or a virtual key issued via LiteLLM's `/key/generate` endpoint. Unlike LM Studio, LiteLLM always authenticates — `/v1/models` itself is gated. |
+| `LITELLM_SEND_DIMENSIONS` | `false` | Opt-in (`true` / `1` / `yes`). Forwards the OpenAI-style `dimensions` parameter through the proxy. Safe only for Matryoshka-aware backends (`text-embedding-3-*`, `voyage-3`); other backends (BGE, `nomic-embed-text`, Cohere v3) reject the request. Leave unset unless you know your alias resolves to a Matryoshka model. |
+
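A minimal sketch of how the opt-in flag could gate the request body. It assumes the flag is matched case-insensitively against the documented values (`true` / `1` / `yes`); `isOptInEnabled` and `buildEmbeddingRequest` are hypothetical names, not functions from this PR.

```typescript
/** Request body for an OpenAI-compatible /v1/embeddings call (only the fields used here). */
interface EmbeddingRequest {
  model: string;
  input: string[];
  encoding_format: "float";
  dimensions?: number;
}

/** Hypothetical parser: only "true", "1", or "yes" (any case) enable the flag. */
function isOptInEnabled(raw: string | undefined): boolean {
  return /^(true|1|yes)$/i.test((raw ?? "").trim());
}

/** Hypothetical builder: attach `dimensions` only when the operator opted in. */
function buildEmbeddingRequest(
  model: string,
  input: string[],
  dimensions: number,
  sendDimensions: boolean,
): EmbeddingRequest {
  // Always request plain floats (see the encoding_format note in provider-litellm.ts).
  const request: EmbeddingRequest = { model, input, encoding_format: "float" };
  if (sendDimensions) {
    request.dimensions = dimensions; // safe only for Matryoshka-aware backends
  }
  return request;
}

// Stand-in for process.env.LITELLM_SEND_DIMENSIONS.
const rawFlag: string | undefined = "yes";
const request = buildEmbeddingRequest("text-embedding-3-small", ["hello"], 1536, isOptInEnabled(rawFlag));
console.log(JSON.stringify(request));
// {"model":"text-embedding-3-small","input":["hello"],"encoding_format":"float","dimensions":1536}
```

A plain truthiness check such as `!!process.env.LITELLM_SEND_DIMENSIONS` would treat `"false"` or `"0"` as enabled, which is why the sketch matches the documented values explicitly.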
 ### Qdrant Configuration

 | Variable | Default | Description |
@@ -9,6 +9,10 @@
 *   - "google": Use Google Generative AI Embedding API. Requires GOOGLE_API_KEY.
 *   - "lmstudio": Use a local LM Studio server (OpenAI-compatible). Requires
 *                 EMBEDDING_MODEL and EMBEDDING_DIMENSIONS to be set explicitly.
+ *   - "litellm": Use a LiteLLM proxy server (OpenAI-compatible gateway in front of
+ *                100+ underlying providers). Requires LITELLM_API_KEY,
+ *                EMBEDDING_MODEL (must match an alias in the proxy's config.yaml),
+ *                and EMBEDDING_DIMENSIONS (the alias's underlying dim).
 *
 * Ollama-specific:
 *   OLLAMA_MODE:
@@ -34,6 +38,18 @@
 *   LMSTUDIO_API_KEY:      Optional API key. LM Studio's Local Server has no auth by default;
 *                          set this only if you've enabled an API key in LM Studio.
 *
+ * LiteLLM-specific:
+ *   LITELLM_URL:               OpenAI-compatible base URL of the LiteLLM proxy.
+ *                              Default: http://localhost:4000/v1 (the /v1 suffix is required;
+ *                              LiteLLM exposes /v1/embeddings under that prefix).
+ *   LITELLM_API_KEY:           Required. Master key (general_settings.master_key) or a virtual
+ *                              key issued via /key/generate. Unlike LM Studio, the proxy always
+ *                              authenticates.
+ *   LITELLM_SEND_DIMENSIONS:   Opt-in ("true" / "1" / "yes"). Forwards the OpenAI-style
+ *                              `dimensions` parameter to the proxy for Matryoshka-aware
+ *                              underlying models (text-embedding-3-*, voyage-3). Default off
+ *                              because non-Matryoshka backends reject it.
+ *
 * Shared:
 *   EMBEDDING_MODEL:       Model name (default depends on provider; required for lmstudio).
 *   EMBEDDING_DIMENSIONS:  Vector dimensions — must match the model (default depends on
@@ -45,7 +61,7 @@ import { logger } from "./logger.js";

 // ── Types ─────────────────────────────────────────────────────────────────

-export type EmbeddingProvider = "ollama" | "openai" | "google" | "lmstudio";
+export type EmbeddingProvider = "ollama" | "openai" | "google" | "lmstudio" | "litellm";
 export type OllamaMode = "docker" | "external" | "auto";

 export interface EmbeddingConfig {
@@ -57,6 +73,8 @@ export interface EmbeddingConfig {
  ollamaUrl: string;
  /** LM Studio OpenAI-compatible base URL (only relevant when embeddingProvider is "lmstudio"). */
  lmstudioUrl: string;
+  /** LiteLLM proxy OpenAI-compatible base URL (only relevant when embeddingProvider is "litellm"). */
+  litellmUrl: string;
  embeddingModel: string;
  embeddingDimensions: number;
  /** Max context window in tokens. Used for client-side pre-truncation. */
@@ -67,15 +85,17 @@ export interface EmbeddingConfig {
 // ── Provider defaults ─────────────────────────────────────────────────────

 /**
- * lmstudio has empty defaults: LM Studio has no out-of-the-box model — users must load
- * one in the UI and choose dimensions to match. We fail-fast in loadEmbeddingConfig()
- * when the user picks lmstudio without setting EMBEDDING_MODEL / EMBEDDING_DIMENSIONS.
+ * lmstudio and litellm have empty defaults: there's no canonical model — users
+ * pick one (the loaded LM Studio model, or a proxy alias from LiteLLM's
+ * config.yaml). We fail-fast in loadEmbeddingConfig() when those providers are
+ * selected without explicit EMBEDDING_MODEL / EMBEDDING_DIMENSIONS.
 */
 const PROVIDER_DEFAULTS: Record<EmbeddingProvider, { model: string; dimensions: number }> = {
  ollama:   { model: "nomic-embed-text",        dimensions: 768  },
  openai:   { model: "text-embedding-3-small",  dimensions: 1536 },
  google:   { model: "gemini-embedding-001",    dimensions: 3072 },
  lmstudio: { model: "",                        dimensions: 0    },
+  litellm:  { model: "",                        dimensions: 0    },
 };

 // ── Ollama mode defaults ──────────────────────────────────────────────────
@@ -130,10 +150,11 @@ export function loadEmbeddingConfig(): EmbeddingConfig {
    rawProvider !== "ollama" &&
    rawProvider !== "openai" &&
    rawProvider !== "google" &&
-    rawProvider !== "lmstudio"
+    rawProvider !== "lmstudio" &&
+    rawProvider !== "litellm"
  ) {
    throw new Error(
-      `Invalid EMBEDDING_PROVIDER: "${rawProvider}". Must be "ollama", "openai", "google", or "lmstudio".`,
+      `Invalid EMBEDDING_PROVIDER: "${rawProvider}". Must be "ollama", "openai", "google", "lmstudio", or "litellm".`,
    );
  }
  const embeddingProvider: EmbeddingProvider = rawProvider;
@@ -159,6 +180,38 @@ export function loadEmbeddingConfig(): EmbeddingConfig {
    }
  }

+  // LiteLLM proxy aliases are user-defined in config.yaml — there is no canonical
+  // default model name and the underlying dimension depends on which provider the
+  // alias resolves to. Authentication is also mandatory (the proxy enforces it
+  // even for read-only /v1/models). Fail fast on each missing piece so the
+  // operator gets a single, specific error rather than a generic 401 / 404 from
+  // the proxy at first embed().
+  if (embeddingProvider === "litellm") {
+    if (!process.env.LITELLM_API_KEY) {
+      throw new Error(
+        "LITELLM_API_KEY is required when EMBEDDING_PROVIDER=litellm. " +
+        "Set it to the proxy's master key (general_settings.master_key in config.yaml) " +
+        "or to a virtual key issued via LiteLLM's /key/generate endpoint.",
+      );
+    }
+    if (!process.env.EMBEDDING_MODEL) {
+      throw new Error(
+        "EMBEDDING_MODEL is required when EMBEDDING_PROVIDER=litellm. " +
+        "Set it to a model_name from your LiteLLM config.yaml (e.g. EMBEDDING_MODEL=text-embedding-3-small " +
+        "if your proxy aliases that name; LiteLLM rewrites the call to whichever litellm_params.model " +
+        "is configured under that alias).",
+      );
+    }
+    if (!process.env.EMBEDDING_DIMENSIONS) {
+      throw new Error(
+        "EMBEDDING_DIMENSIONS is required when EMBEDDING_PROVIDER=litellm. " +
+        "The proxy alias maps to an underlying model whose dimension we cannot infer — set this to the " +
+        "underlying model's output dim (e.g. 1536 for text-embedding-3-small, 1024 for voyage-2, " +
+        "768 for nomic-embed-text-v1.5).",
+      );
+    }
+  }
+
  // ── Ollama mode (only relevant for ollama provider) ─────────────────
  const rawMode = process.env.OLLAMA_MODE || "auto";
  if (rawMode !== "docker" && rawMode !== "external" && rawMode !== "auto") {
@@ -188,6 +241,7 @@ export function loadEmbeddingConfig(): EmbeddingConfig {
    ollamaMode,
    ollamaUrl: process.env.OLLAMA_URL || modeDefaults.url,
    lmstudioUrl: process.env.LMSTUDIO_URL || "http://localhost:1234/v1",
+    litellmUrl: process.env.LITELLM_URL || "http://localhost:4000/v1",
    embeddingModel,
    embeddingDimensions,
    embeddingContextLength: contextLengthEnv
@@ -213,6 +267,10 @@ export function loadEmbeddingConfig(): EmbeddingConfig {
    ...(embeddingProvider === "lmstudio" ? {
      lmstudioUrl: _config.lmstudioUrl,
    } : {}),
+    ...(embeddingProvider === "litellm" ? {
+      litellmUrl: _config.litellmUrl,
+      sendDimensions: /^(true|1|yes)$/i.test(process.env.LITELLM_SEND_DIMENSIONS ?? ""),
+    } : {}),
    embeddingModel: _config.embeddingModel,
    embeddingDimensions: _config.embeddingDimensions,
    embeddingContextLength: _config.embeddingContextLength || "auto",
@@ -224,7 +282,9 @@ export function loadEmbeddingConfig(): EmbeddingConfig {
          ? process.env.GOOGLE_API_KEY
          : embeddingProvider === "lmstudio"
            ? process.env.LMSTUDIO_API_KEY
-            : undefined),
+            : embeddingProvider === "litellm"
+              ? process.env.LITELLM_API_KEY
+              : undefined),
  });

  return _config;
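The LiteLLM checks in `loadEmbeddingConfig()` run in a fixed order (API key, then model, then dimensions), so a virtual-key user fixes the easy problem first. A minimal sketch of that ordering as a pure function over an env-like record, which makes it easy to unit-test in isolation; `firstMissingLiteLLMSetting` is hypothetical and only reports which variable is missing, not the provider's full error messages.

```typescript
/** Variables the litellm provider requires, in the order loadEmbeddingConfig() checks them. */
const REQUIRED_LITELLM_VARS = ["LITELLM_API_KEY", "EMBEDDING_MODEL", "EMBEDDING_DIMENSIONS"] as const;

type RequiredLiteLLMVar = (typeof REQUIRED_LITELLM_VARS)[number];

/**
 * Hypothetical helper: return the first required variable that is unset or empty,
 * or null when all are present. Empty strings count as missing, matching the
 * `!process.env.X` checks in loadEmbeddingConfig().
 */
function firstMissingLiteLLMSetting(env: Record<string, string | undefined>): RequiredLiteLLMVar | null {
  for (const name of REQUIRED_LITELLM_VARS) {
    if (!env[name]) return name;
  }
  return null;
}

console.log(firstMissingLiteLLMSetting({ EMBEDDING_MODEL: "text-embedding-3-small" })); // LITELLM_API_KEY
console.log(
  firstMissingLiteLLMSetting({ LITELLM_API_KEY: "sk-test", EMBEDDING_MODEL: "text-embedding-3-small" }),
); // EMBEDDING_DIMENSIONS
```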
@@ -12,6 +12,7 @@
 *   - openai   — OpenAI Embeddings API (text-embedding-3-small, etc.)
 *   - google   — Google Generative AI Embedding API (gemini-embedding-001, etc.)
 *   - lmstudio — local LM Studio server via OpenAI-compatible API
+ *   - litellm  — LiteLLM proxy (OpenAI-compatible gateway in front of 100+ providers)
 */

 import type { InfraProgressCallback } from "./docker.js";
@@ -66,9 +67,14 @@ export async function getEmbeddingProvider(onProgress?: InfraProgressCallback):
      _provider = new LMStudioEmbeddingProvider();
      break;
    }
+    case "litellm": {
+      const { LiteLLMEmbeddingProvider } = await import("./provider-litellm.js");
+      _provider = new LiteLLMEmbeddingProvider();
+      break;
+    }
    default:
      throw new Error(
-        `Unknown embedding provider: "${name}". Must be "ollama", "openai", "google", or "lmstudio".`,
+        `Unknown embedding provider: "${name}". Must be "ollama", "openai", "google", "lmstudio", or "litellm".`,
      );
  }

@@ -0,0 +1,303 @@
+// SPDX-License-Identifier: AGPL-3.0-only
+// Copyright (C) 2026 Giancarlo Erra - Altaire Limited
+/**
+ * LiteLLM embedding provider.
+ *
+ * LiteLLM Proxy Server (https://docs.litellm.ai/docs/simple_proxy) exposes an
+ * OpenAI-compatible /v1/embeddings endpoint that fans out to 100+ underlying
+ * model providers (OpenAI, Anthropic, Cohere, Voyage, HuggingFace, Bedrock,
+ * Vertex AI, Ollama, ...). Model aliases are defined in the proxy's config.yaml,
+ * so from this client's perspective LiteLLM looks like an OpenAI server speaking
+ * arbitrary model names.
+ *
+ * This provider is intentionally separate from `provider-openai.ts` and
+ * `provider-lmstudio.ts` because:
+ *   - LiteLLM ALWAYS requires an API key (virtual or master), unlike LM Studio.
+ *   - Default port is 4000 and /v1/models lists proxy-registered aliases, not
+ *     loaded local models.
+ *   - Whether the `dimensions` parameter is honoured depends on the underlying
+ *     provider chosen by the proxy alias, so we make it opt-in via
+ *     LITELLM_SEND_DIMENSIONS instead of hardcoding it like provider-openai
+ *     does for text-embedding-3-*.
+ *   - Health check messaging points at proxy-side issues (master key, alias
+ *     missing in config.yaml) rather than at SaaS quotas or local model loads.
+ *
+ * Required env when using this provider:
+ *   EMBEDDING_PROVIDER=litellm
+ *   LITELLM_API_KEY=<virtual-or-master-key>
+ *   EMBEDDING_MODEL=<alias-from-litellm-config-yaml>
+ *   EMBEDDING_DIMENSIONS=<dim-of-underlying-model>
+ *
+ * Optional env:
+ *   LITELLM_URL=http://localhost:4000/v1   (default; must include /v1 suffix)
+ *   LITELLM_SEND_DIMENSIONS=true           (opt-in; forwards `dimensions` to the
+ *                                           proxy for Matryoshka-aware models like
+ *                                           text-embedding-3-* or voyage-3 routed
+ *                                           via LiteLLM. Default false because
+ *                                           non-Matryoshka backends raise on it.)
+ *   EMBEDDING_CONTEXT_LENGTH=<tokens>      (defaults to 2048 if model unknown)
+ */
+
+import OpenAI from "openai";
+import { getEmbeddingConfig } from "./embedding-config.js";
+import type { EmbeddingHealthStatus, EmbeddingProvider, EmbeddingReadinessResult } from "./embedding-types.js";
+import { logger } from "./logger.js";
+
+// ── Constants ───────────────────────────────────────────────────────────
+
+/**
+ * Conservative batch size — LiteLLM is a proxy in front of an arbitrary backend,
+ * so the practical batch ceiling depends on whichever provider the alias resolves
+ * to (an OpenAI alias tolerates 512+, a self-hosted Ollama alias may OOM at 64+).
+ * 256 sits between the OpenAI (512) and LM Studio (64) defaults and rarely
+ * triggers proxy-level rate limiting on commercial backends.
+ */
+const LITELLM_BATCH_SIZE = 256;
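`embed()` walks its input in fixed-size slices of `LITELLM_BATCH_SIZE`, so 600 texts become requests of 256, 256, and 88. A standalone sketch of that slicing; the `chunk` helper is hypothetical, since the provider inlines the loop.

```typescript
/** Hypothetical helper: split `items` into consecutive slices of at most `size` elements. */
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError(`batch size must be a positive integer, got ${size}`);
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const LITELLM_BATCH_SIZE = 256; // same value as the provider constant
const texts = Array.from({ length: 600 }, (_, i) => `chunk ${i}`);
console.log(chunk(texts, LITELLM_BATCH_SIZE).map((batch) => batch.length)); // [ 256, 256, 88 ]
```

The provider awaits each batch in turn and appends its results, so the returned vectors stay in input order.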
+
+/**
+ * Conservative chars-per-token ratio for code. Same value as provider-openai
+ * and provider-lmstudio; LiteLLM does not retokenize on the proxy hop.
+ */
+const CHARS_PER_TOKEN_ESTIMATE = 3.0;
+
+/**
+ * Fallback context length when EMBEDDING_CONTEXT_LENGTH is unset and the model
+ * alias is not in the known-models table. 2048 is a compromise across the common
+ * embedding backends LiteLLM proxies (Voyage 16k, OpenAI 8191, Cohere 512, BGE 512).
+ * For the larger-context backends, underestimating only triggers extra client-side
+ * truncation; for 512-token backends such as Cohere or BGE it is too high, so set
+ * EMBEDDING_CONTEXT_LENGTH explicitly for those aliases.
+ */
+const DEFAULT_CONTEXT_LENGTH = 2048;
+
+// ── Client management ───────────────────────────────────────────────────
+
+let litellmClient: OpenAI | null = null;
+let litellmBaseUrl: string | null = null;
+let litellmApiKey: string | null = null;
+
+function getClient(): OpenAI {
+  const config = getEmbeddingConfig();
+  const baseUrl = config.litellmUrl;
+  // Read the key from the live env so test harnesses that mutate process.env
+  // between calls observe the change without an explicit reset.
+  const apiKey = process.env.LITELLM_API_KEY;
+  if (!apiKey) {
+    throw new Error(
+      "LITELLM_API_KEY environment variable is required when using the LiteLLM embedding provider. " +
+      "Set it in your MCP config env block. Use either the proxy's master key or a virtual key " +
+      "issued via LiteLLM's /key/generate endpoint.",
+    );
+  }
+  if (!litellmClient || litellmBaseUrl !== baseUrl || litellmApiKey !== apiKey) {
+    litellmClient = new OpenAI({
+      apiKey,
+      baseURL: baseUrl,
+    });
+    litellmBaseUrl = baseUrl;
+    litellmApiKey = apiKey;
+  }
+  return litellmClient;
+}
+
+/** Reset client (for testing or LITELLM_URL / LITELLM_API_KEY hot-swap). */
+export function resetLiteLLMClient(): void {
+  litellmClient = null;
+  litellmBaseUrl = null;
+  litellmApiKey = null;
+}
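`getClient()` reuses one client until either the base URL or the API key changes, and `resetLiteLLMClient()` forces a rebuild. The same invalidation rule, reduced to a hypothetical generic `CachedByKey` helper with a stand-in factory in place of `new OpenAI(...)` so the sketch runs offline:

```typescript
/**
 * Hypothetical helper (not part of this PR): caches one value and rebuilds it
 * whenever any part of the key changes, like getClient() does for (baseUrl, apiKey).
 */
class CachedByKey<K extends unknown[], V> {
  private readonly create: (...key: K) => V;
  private key: K | null = null;
  private value: V | null = null;

  constructor(create: (...key: K) => V) {
    this.create = create;
  }

  get(...key: K): V {
    const cachedKey = this.key;
    const unchanged =
      cachedKey !== null &&
      cachedKey.length === key.length &&
      cachedKey.every((part, i) => Object.is(part, key[i]));
    if (unchanged && this.value !== null) return this.value;
    const value = this.create(...key);
    this.key = key;
    this.value = value;
    return value;
  }

  /** Counterpart of resetLiteLLMClient(): forget the cached value. */
  reset(): void {
    this.key = null;
    this.value = null;
  }
}

// Stand-in factory that counts how many "clients" were built.
let built = 0;
const clients = new CachedByKey((baseUrl: string, apiKey: string) => ({ baseUrl, apiKey, id: ++built }));

const first = clients.get("http://localhost:4000/v1", "sk-a");
const second = clients.get("http://localhost:4000/v1", "sk-a"); // same key: cached instance
const third = clients.get("http://localhost:4000/v1", "sk-b"); // API key changed: rebuilt
console.log(first === second, second === third, built); // true false 2
```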
+
+// ── Pre-truncation ──────────────────────────────────────────────────────
+
+function pretruncateTexts(texts: string[], contextLength: number): string[] {
+  if (contextLength <= 0) return texts;
+  const maxChars = Math.floor(contextLength * CHARS_PER_TOKEN_ESTIMATE);
+  return texts.map((t) => (t.length > maxChars ? t.substring(0, maxChars) : t));
+}
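With `CHARS_PER_TOKEN_ESTIMATE = 3.0`, the per-text cap is `Math.floor(contextLength * 3)` characters: the 2048-token fallback allows 6144 characters, and an 8191-token context allows 24573. The sketch below copies `pretruncateTexts` together with the fallback used in `embed()`; `resolveContextLength` is a hypothetical name for that inline ternary.

```typescript
// Values copied from provider-litellm.ts.
const CHARS_PER_TOKEN_ESTIMATE = 3.0;
const DEFAULT_CONTEXT_LENGTH = 2048;

/** Hypothetical name for the inline fallback in embed(). */
function resolveContextLength(configured: number): number {
  return configured > 0 ? configured : DEFAULT_CONTEXT_LENGTH;
}

/** Same logic as pretruncateTexts(): cap each text at contextLength * chars-per-token characters. */
function pretruncateTexts(texts: string[], contextLength: number): string[] {
  if (contextLength <= 0) return texts;
  const maxChars = Math.floor(contextLength * CHARS_PER_TOKEN_ESTIMATE);
  return texts.map((t) => (t.length > maxChars ? t.substring(0, maxChars) : t));
}

const longChunk = "x".repeat(10_000);
console.log(pretruncateTexts([longChunk], resolveContextLength(0))[0].length); // 6144
console.log(pretruncateTexts([longChunk], resolveContextLength(8191))[0].length); // 10000
```

The second call leaves the text untouched because its 24573-character cap is above the input length.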
+
+// ── Auth-error detection ────────────────────────────────────────────────
+
+/**
+ * The OpenAI SDK surfaces 401/403 from LiteLLM as APIError subclasses with a
+ * `.status` field (AuthenticationError / PermissionDeniedError). We duck-type on
+ * `.status` instead of importing those classes for instanceof checks.
+ */
+function isAuthError(err: unknown): boolean {
+  if (typeof err !== "object" || err === null) return false;
+  const status = (err as { status?: unknown }).status;
+  return status === 401 || status === 403;
+}
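`isAuthError` needs nothing but a numeric `.status`, so the same duck-typing can be tested with plain objects. A sketch that extends it into a three-way split; `classifyProxyError` and its labels are hypothetical, and the sketch assumes that a thrown value without a numeric `.status` (connection refused, DNS failure) means no HTTP response arrived.

```typescript
type ProxyErrorKind = "auth-rejected" | "http-error" | "unreachable";

/** Read a numeric `.status` off an unknown thrown value, if present. */
function statusOf(err: unknown): number | undefined {
  if (typeof err !== "object" || err === null) return undefined;
  const status = (err as { status?: unknown }).status;
  return typeof status === "number" ? status : undefined;
}

/** Hypothetical three-way split: auth failure, other HTTP error, or no response at all. */
function classifyProxyError(err: unknown): ProxyErrorKind {
  const status = statusOf(err);
  if (status === 401 || status === 403) return "auth-rejected";
  if (status !== undefined) return "http-error";
  return "unreachable";
}

console.log(classifyProxyError({ status: 401 })); // auth-rejected
console.log(classifyProxyError({ status: 500 })); // http-error
console.log(classifyProxyError(new Error("connect ECONNREFUSED 127.0.0.1:4000"))); // unreachable
```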
+
+// ── Provider class ──────────────────────────────────────────────────────
+
+export class LiteLLMEmbeddingProvider implements EmbeddingProvider {
+  readonly name = "litellm";
+
+  async ensureReady(): Promise<EmbeddingReadinessResult> {
+    const config = getEmbeddingConfig();
+    // Fail fast with our own message before letting the SDK construct the client.
+    const client = getClient();
+
+    // Step 1 — connectivity + auth. Three failure modes share a single round trip:
+    // proxy unreachable (DNS/connection refused), bad credentials (401/403), or
+    // proxy reachable but mis-configured (500/etc.). Distinguish them so the
+    // operator gets a directly actionable hint.
+    let modelList: Awaited<ReturnType<typeof client.models.list>>;
+    try {
+      modelList = await client.models.list();
+    } catch (err) {
+      const message = err instanceof Error ? err.message : String(err);
+      if (isAuthError(err)) {
+        throw new Error(
+          `LiteLLM rejected the request at ${config.litellmUrl} with an authentication error. ` +
+          "Check that LITELLM_API_KEY matches a valid master or virtual key on the proxy " +
+          "(see LiteLLM's /key/info endpoint or the proxy's general_settings.master_key). " +
+          `Underlying error: ${message}`,
+        );
+      }
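+      // A numeric `.status` means the proxy answered with a non-auth HTTP error
+      // (e.g. 500): report it as reachable but misconfigured, not as unreachable.
+      const status = (err as { status?: unknown } | null)?.status;
+      if (typeof status === "number") {
+        throw new Error(
+          `LiteLLM proxy at ${config.litellmUrl} is reachable but /v1/models returned HTTP ${status}. ` +
+          "Check the proxy logs and its config.yaml. " +
+          `Underlying error: ${message}`,
+        );
+      }
+      throw new Error(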
+        `LiteLLM proxy is not reachable at ${config.litellmUrl}. ` +
+        "Make sure the proxy is running (e.g. `litellm --config config.yaml`). " +
+        "If you've changed the port or are running it remotely, set LITELLM_URL accordingly " +
+        "(e.g. http://litellm.internal:4000/v1) — the URL must include the /v1 suffix. " +
+        `Underlying error: ${message}`,
+      );
+    }
+
+    // Step 2 — alias registered. LiteLLM's /v1/models returns the model_list
+    // entries declared in config.yaml; if the configured EMBEDDING_MODEL is
+    // missing the proxy will return a NotFoundError on every embed() call,
+    // which is opaque under high concurrency. Fail early with a hint that
+    // points at the proxy config rather than at the underlying provider.
+    const modelRegistered = modelList.data.some((m) => m.id === config.embeddingModel);
+    if (!modelRegistered) {
+      const known = modelList.data.map((m) => m.id).slice(0, 10).join(", ");
+      throw new Error(
+        `LiteLLM is reachable at ${config.litellmUrl} but the embedding model ` +
+        `"${config.embeddingModel}" is not registered on the proxy. ` +
+        "Add it to your LiteLLM config.yaml under model_list (with a model_name matching " +
+        "EMBEDDING_MODEL) and restart the proxy — then retry. " +
+        (known ? `Currently registered models: ${known}.` : "The proxy currently has no registered models."),
+      );
+    }
+
+    logger.info("LiteLLM embedding provider ready", {
+      baseUrl: config.litellmUrl,
+      model: config.embeddingModel,
+      sendDimensions: shouldSendDimensions(),
+    });
+    // LiteLLM is user-managed — no containers, no model pulls.
+    return { modelPulled: false, containerStarted: false, imagePulled: false };
+  }
+
+  async embed(texts: string[]): Promise<number[][]> {
+    if (texts.length === 0) return [];
+
+    const config = getEmbeddingConfig();
+    const client = getClient();
+    const contextLength = config.embeddingContextLength > 0
+      ? config.embeddingContextLength
+      : DEFAULT_CONTEXT_LENGTH;
+    const truncated = pretruncateTexts(texts, contextLength);
+
+    if (truncated.length <= LITELLM_BATCH_SIZE) {
+      return this._embedBatch(client, truncated, config.embeddingModel, config.embeddingDimensions);
+    }
+
+    const results: number[][] = [];
+    for (let i = 0; i < truncated.length; i += LITELLM_BATCH_SIZE) {
+      const batch = truncated.slice(i, i + LITELLM_BATCH_SIZE);
+      const embeddings = await this._embedBatch(client, batch, config.embeddingModel, config.embeddingDimensions);
+      results.push(...embeddings);
+    }
+    return results;
+  }
+
+  async embedSingle(text: string): Promise<number[]> {
+    const results = await this.embed([text]);
+    if (results.length === 0) {
+      throw new Error("Embedding failed: no result returned");
+    }
+    return results[0];
+  }
+
+  async healthCheck(): Promise<EmbeddingHealthStatus> {
+    const config = getEmbeddingConfig();
+    const lines: string[] = [];
+    const icon = (ok: boolean) => (ok ? "[OK]" : "[MISSING]");
+
+    const hasKey = !!process.env.LITELLM_API_KEY;
+    lines.push(
+      `${icon(hasKey)} LiteLLM API key: ` +
+      (hasKey ? "Configured" : "Missing — set LITELLM_API_KEY in your MCP config"),
+    );
+    if (!hasKey) {
+      return { available: false, modelReady: false, statusLines: lines };
+    }
+
+    try {
+      const client = getClient();
+      const models = await client.models.list();
+      lines.push(`${icon(true)} LiteLLM: Reachable at ${config.litellmUrl}`);
+
+      const modelRegistered = models.data.some((m) => m.id === config.embeddingModel);
+      lines.push(
+        `${icon(modelRegistered)} Embedding model (${config.embeddingModel}): ` +
+        (modelRegistered
+          ? "Registered on the proxy"
+          : "Not registered — add it to your LiteLLM config.yaml under model_list and restart the proxy"),
+      );
+
+      return { available: true, modelReady: modelRegistered, statusLines: lines };
+    } catch (err) {
+      const message = err instanceof Error ? err.message : String(err);
+      if (isAuthError(err)) {
+        lines.push(`${icon(false)} LiteLLM: Auth rejected at ${config.litellmUrl} (${message})`);
+      } else {
+        lines.push(`${icon(false)} LiteLLM: Not reachable at ${config.litellmUrl} (${message})`);
+      }
+      return { available: false, modelReady: false, statusLines: lines };
+    }
+  }
+
+  private async _embedBatch(
+    client: OpenAI,
+    texts: string[],
+    model: string,
+    dimensions: number,
+  ): Promise<number[][]> {
+    // Forwarding `dimensions` is opt-in (LITELLM_SEND_DIMENSIONS=true). The proxy
+    // forwards it to the underlying provider verbatim — Matryoshka-aware models
+    // (text-embedding-3-*, voyage-3) accept it; others (BGE, nomic-embed,
+    // Cohere v3) reject the request. Default off keeps the provider compatible
+    // with arbitrary aliases.
+    //
+    // `encoding_format: "float"` is REQUIRED. When the caller omits it, the OpenAI
+    // SDK (6.x+) requests `encoding_format: "base64"` and passes every embedding
+    // through toFloat32Array() without checking that it is actually a string.
+    // LiteLLM forwards the original provider's response, which for many backends
+    // (Ollama, BGE-via-tei, custom HF wrappers) is a JSON float array. The decode
+    // path then runs `Buffer.from(<array>, 'base64')`; Node.js ignores the encoding
+    // for array inputs and truncates each float in (-1, 1) to the byte 0, so the
+    // zero buffer is reinterpreted as a Float32Array of zeros. This is the same bug
+    // fixed for LM Studio in commit bb141a0; it reproduces against any LiteLLM alias
+    // whose backend doesn't re-encode to base64. Passing `encoding_format: "float"`
+    // explicitly makes the SDK skip the decode step entirely.
+    const response = await client.embeddings.create({
+      model,
+      input: texts,
+      encoding_format: "float",
+      ...(shouldSendDimensions() ? { dimensions } : {}),
+    });
+    const sorted = response.data.sort((a, b) => a.index - b.index);
+    return sorted.map((d) => d.embedding);
+  }
+}
+
+// ── Helpers ─────────────────────────────────────────────────────────────
+
+function shouldSendDimensions(): boolean {
+  const raw = process.env.LITELLM_SEND_DIMENSIONS;
+  if (!raw) return false;
+  const v = raw.toLowerCase();
+  return v === "true" || v === "1" || v === "yes";
+}
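The failure mode described in the `_embedBatch` comment can be reproduced with nothing but Node's `Buffer`, no SDK or running proxy required. The sketch below is an illustrative stand-in for the SDK's base64 decode step: `decodeBase64Embedding` and `passThroughFloat` are hypothetical names, and the decode body is an assumption about the shape of that step, not code copied from the `openai` package. It shows a real base64 payload round-tripping while a plain JSON float array collapses to zeros.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical stand-in for the SDK's base64 decode step (an assumption about
// its shape, not the openai package's source).
function decodeBase64Embedding(payload: string | number[]): number[] {
  // For a string, "base64" is honoured. For an array, Node ignores the encoding
  // argument and stores each element as a byte (ToUint8), so every float in
  // (-1, 1) becomes 0.
  const bytes = Buffer.from(payload as string, "base64");
  // Copy into a fresh buffer so the Float32Array view is 4-byte aligned, and
  // drop any trailing bytes that do not fill a whole float.
  const usable = bytes.length - (bytes.length % Float32Array.BYTES_PER_ELEMENT);
  const aligned = new Uint8Array(usable);
  aligned.set(bytes.subarray(0, usable));
  return Array.from(new Float32Array(aligned.buffer));
}

// With encoding_format: "float" the SDK hands the array back untouched.
function passThroughFloat(payload: number[]): number[] {
  return payload;
}

// A genuine base64 payload (two float32 values) decodes correctly.
const realBase64 = Buffer.from(new Float32Array([0.5, -0.25]).buffer).toString("base64");
const decodedReal = decodeBase64Embedding(realBase64); // [0.5, -0.25]

// A plain JSON float array, as many non-OpenAI backends return, does not.
const jsonFloats = [0.12, -0.53, 0.91, 0.07, -0.33, 0.48, -0.02, 0.6];
const decodedJson = decodeBase64Embedding(jsonFloats); // [0, 0]
const keptJson = passThroughFloat(jsonFloats); // unchanged, 8 values
```

In this sketch each float is squeezed into one byte, so the corrupted vector is also a quarter of the input length; the exact shape of the real SDK's corrupted output may differ, but the values are zeros either way.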
@@ -24,6 +24,9 @@ describe("embedding-config", () => {
    delete process.env.GOOGLE_API_KEY;
    delete process.env.LMSTUDIO_URL;
    delete process.env.LMSTUDIO_API_KEY;
+    delete process.env.LITELLM_URL;
+    delete process.env.LITELLM_API_KEY;
+    delete process.env.LITELLM_SEND_DIMENSIONS;
  });

  afterEach(() => {
@@ -181,6 +184,7 @@ describe("embedding-config", () => {
        ollamaMode: "external",
        ollamaUrl: "http://remote-gpu:11434",
        lmstudioUrl: "http://localhost:1234/v1",
+        litellmUrl: "http://localhost:4000/v1",
        embeddingModel: "mxbai-embed-large",
        embeddingDimensions: 1024,
        embeddingContextLength: 512,
@@ -218,7 +222,7 @@ describe("embedding-config", () => {
    it("throws for invalid EMBEDDING_PROVIDER", () => {
      process.env.EMBEDDING_PROVIDER = "anthropic";
      expect(() => loadEmbeddingConfig()).toThrow(
-        'Invalid EMBEDDING_PROVIDER: "anthropic". Must be "ollama", "openai", "google", or "lmstudio".',
+        'Invalid EMBEDDING_PROVIDER: "anthropic". Must be "ollama", "openai", "google", "lmstudio", or "litellm".',
      );
    });

@@ -329,4 +333,113 @@ describe("embedding-config", () => {
      expect(config.embeddingContextLength).toBe(32768);
    });
  });
+
+  describe("litellm provider", () => {
+    it("loads when API key, model, and dimensions are all set", () => {
+      process.env.EMBEDDING_PROVIDER = "litellm";
+      process.env.LITELLM_API_KEY = "sk-master-test";
+      process.env.EMBEDDING_MODEL = "text-embedding-3-small";
+      process.env.EMBEDDING_DIMENSIONS = "1536";
+
+      const config = loadEmbeddingConfig();
+      expect(config.embeddingProvider).toBe("litellm");
+      expect(config.embeddingModel).toBe("text-embedding-3-small");
+      expect(config.embeddingDimensions).toBe(1536);
+    });
+
+    it("defaults LITELLM_URL to http://localhost:4000/v1", () => {
+      process.env.EMBEDDING_PROVIDER = "litellm";
+      process.env.LITELLM_API_KEY = "sk-master-test";
+      process.env.EMBEDDING_MODEL = "text-embedding-3-small";
+      process.env.EMBEDDING_DIMENSIONS = "1536";
+
+      const config = loadEmbeddingConfig();
+      expect(config.litellmUrl).toBe("http://localhost:4000/v1");
+    });
+
+    it("respects LITELLM_URL override", () => {
+      process.env.EMBEDDING_PROVIDER = "litellm";
+      process.env.LITELLM_API_KEY = "sk-master-test";
+      process.env.EMBEDDING_MODEL = "voyage-2";
+      process.env.EMBEDDING_DIMENSIONS = "1024";
+      process.env.LITELLM_URL = "https://litellm.internal:4001/v1";
+
+      const config = loadEmbeddingConfig();
+      expect(config.litellmUrl).toBe("https://litellm.internal:4001/v1");
+    });
+
+    it("throws when LITELLM_API_KEY is missing", () => {
+      process.env.EMBEDDING_PROVIDER = "litellm";
+      process.env.EMBEDDING_MODEL = "text-embedding-3-small";
+      process.env.EMBEDDING_DIMENSIONS = "1536";
+
+      expect(() => loadEmbeddingConfig()).toThrow(
+        /LITELLM_API_KEY is required when EMBEDDING_PROVIDER=litellm/,
+      );
+    });
+
+    it("throws when EMBEDDING_MODEL is missing", () => {
+      process.env.EMBEDDING_PROVIDER = "litellm";
+      process.env.LITELLM_API_KEY = "sk-master-test";
+      process.env.EMBEDDING_DIMENSIONS = "1536";
+
+      expect(() => loadEmbeddingConfig()).toThrow(
+        /EMBEDDING_MODEL is required when EMBEDDING_PROVIDER=litellm/,
+      );
+    });
+
+    it("throws when EMBEDDING_DIMENSIONS is missing", () => {
+      process.env.EMBEDDING_PROVIDER = "litellm";
+      process.env.LITELLM_API_KEY = "sk-master-test";
+      process.env.EMBEDDING_MODEL = "text-embedding-3-small";
+
+      expect(() => loadEmbeddingConfig()).toThrow(
+        /EMBEDDING_DIMENSIONS is required when EMBEDDING_PROVIDER=litellm/,
+      );
+    });
+
+    it("checks LITELLM_API_KEY before the model and dimension checks", () => {
+      // All three are missing — the API key error must surface first because it's
+      // the only one a virtual-key user can fix without touching the proxy config.
+      process.env.EMBEDDING_PROVIDER = "litellm";
+
+      expect(() => loadEmbeddingConfig()).toThrow(
+        /LITELLM_API_KEY is required/,
+      );
+    });
+
+    it("includes example dimensions in the error message for discoverability", () => {
+      process.env.EMBEDDING_PROVIDER = "litellm";
+      process.env.LITELLM_API_KEY = "sk-master-test";
+      process.env.EMBEDDING_MODEL = "text-embedding-3-small";
+
+      expect(() => loadEmbeddingConfig()).toThrow(
+        /1536 for text-embedding-3-small/,
+      );
+    });
+
+    it("respects EMBEDDING_CONTEXT_LENGTH override for unknown proxy aliases", () => {
+      process.env.EMBEDDING_PROVIDER = "litellm";
+      process.env.LITELLM_API_KEY = "sk-master-test";
+      process.env.EMBEDDING_MODEL = "team-internal-bge-large";
+      process.env.EMBEDDING_DIMENSIONS = "1024";
+      process.env.EMBEDDING_CONTEXT_LENGTH = "8192";
+
+      const config = loadEmbeddingConfig();
+      expect(config.embeddingContextLength).toBe(8192);
+    });
+
+    it("auto-detects context length when alias matches a well-known model name", () => {
+      // LiteLLM aliases are usually named after the underlying model; the context-length
+      // table lookup uses the alias verbatim, so the operator gets free auto-detection
+      // when they keep names aligned.
+      process.env.EMBEDDING_PROVIDER = "litellm";
+      process.env.LITELLM_API_KEY = "sk-master-test";
+      process.env.EMBEDDING_MODEL = "text-embedding-3-small";
+      process.env.EMBEDDING_DIMENSIONS = "1536";
+
+      const config = loadEmbeddingConfig();
+      expect(config.embeddingContextLength).toBe(8191);
+    });
+  });
 });
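The ordering and context-length tests above pin down how the `litellm` branch of `loadEmbeddingConfig` should behave, but that branch is not part of this excerpt. What follows is a minimal sketch consistent with those tests, not the actual implementation: `loadLiteLLMConfigSketch`, `parsePositiveInt`, `KNOWN_CONTEXT_LENGTHS`, the table entries other than `text-embedding-3-small`, the 512-token fallback, and the override-first precedence are all assumptions. It checks the API key, then the model alias, then the dimensions, and resolves the context length from an explicit override, then a lookup on the alias name verbatim, then a fallback.

```typescript
// Hypothetical sketch of the litellm branch of config loading.
type Env = Record<string, string | undefined>;

interface LiteLLMConfigSketch {
  litellmUrl: string;
  embeddingModel: string;
  embeddingDimensions: number;
  embeddingContextLength: number;
}

// Looked up by alias name verbatim, so aliases named after the underlying
// model get auto-detection for free. Entries are illustrative.
const KNOWN_CONTEXT_LENGTHS: Partial<Record<string, number>> = {
  "text-embedding-3-small": 8191,
  "text-embedding-3-large": 8191,
  "text-embedding-ada-002": 8191,
};

// Assumed fallback for aliases the table does not recognise.
const FALLBACK_CONTEXT_LENGTH = 512;

function parsePositiveInt(raw: string | undefined): number | undefined {
  if (!raw) return undefined;
  const n = Number.parseInt(raw, 10);
  return Number.isInteger(n) && n > 0 ? n : undefined;
}

function loadLiteLLMConfigSketch(env: Env): LiteLLMConfigSketch {
  // 1. API key first, matching the ordering test.
  if (!env.LITELLM_API_KEY) {
    throw new Error("LITELLM_API_KEY is required when EMBEDDING_PROVIDER=litellm");
  }
  // 2. No default model: aliases come from the proxy's config.yaml.
  const model = env.EMBEDDING_MODEL;
  if (!model) {
    throw new Error("EMBEDDING_MODEL is required when EMBEDDING_PROVIDER=litellm");
  }
  // 3. No default dimensions: they depend on what the alias resolves to.
  const dimensions = parsePositiveInt(env.EMBEDDING_DIMENSIONS);
  if (dimensions === undefined) {
    throw new Error(
      "EMBEDDING_DIMENSIONS is required when EMBEDDING_PROVIDER=litellm " +
        "(the alias's output size, e.g. 1536 for text-embedding-3-small)",
    );
  }
  // Context length: explicit override, then table lookup, then fallback.
  const contextLength =
    parsePositiveInt(env.EMBEDDING_CONTEXT_LENGTH) ??
    KNOWN_CONTEXT_LENGTHS[model] ??
    FALLBACK_CONTEXT_LENGTH;
  return {
    litellmUrl: env.LITELLM_URL || "http://localhost:4000/v1",
    embeddingModel: model,
    embeddingDimensions: dimensions,
    embeddingContextLength: contextLength,
  };
}
```

With the environment from the `text-embedding-3-small` tests the sketch returns a context length of 8191 and the default URL; with `team-internal-bge-large` plus `EMBEDDING_CONTEXT_LENGTH=8192` it returns 8192.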
@@ -21,6 +21,9 @@ describe("embedding-provider", () => {
    delete process.env.GOOGLE_API_KEY;
    delete process.env.LMSTUDIO_URL;
    delete process.env.LMSTUDIO_API_KEY;
+    delete process.env.LITELLM_URL;
+    delete process.env.LITELLM_API_KEY;
+    delete process.env.LITELLM_SEND_DIMENSIONS;
  });

  afterEach(() => {
@@ -55,6 +58,15 @@ describe("embedding-provider", () => {
      expect(provider.name).toBe("lmstudio");
    });

+    it("creates LiteLLMEmbeddingProvider when configured", async () => {
+      process.env.EMBEDDING_PROVIDER = "litellm";
+      process.env.LITELLM_API_KEY = "sk-master-test";
+      process.env.EMBEDDING_MODEL = "text-embedding-3-small";
+      process.env.EMBEDDING_DIMENSIONS = "1536";
+      const provider = await getEmbeddingProvider();
+      expect(provider.name).toBe("litellm");
+    });
+
    it("caches provider instance", async () => {
      const p1 = await getEmbeddingProvider();
      const p2 = await getEmbeddingProvider();
@@ -201,3 +213,89 @@ describe("LMStudioEmbeddingProvider", () => {
    expect(provider.name).toBe("lmstudio");
  });
 });
+
+describe("LiteLLMEmbeddingProvider", () => {
+  const originalEnv = { ...process.env };
+
+  beforeEach(() => {
+    resetEmbeddingConfig();
+    resetEmbeddingProvider();
+    delete process.env.EMBEDDING_PROVIDER;
+    delete process.env.EMBEDDING_MODEL;
+    delete process.env.EMBEDDING_DIMENSIONS;
+    delete process.env.EMBEDDING_CONTEXT_LENGTH;
+    delete process.env.LITELLM_URL;
+    delete process.env.LITELLM_API_KEY;
+    delete process.env.LITELLM_SEND_DIMENSIONS;
+  });
+
+  afterEach(() => {
+    resetEmbeddingConfig();
+    resetEmbeddingProvider();
+    process.env = { ...originalEnv };
+  });
+
+  it("config validation rejects construction when LITELLM_API_KEY is missing", async () => {
+    // The API key is checked at config-load time (loadEmbeddingConfig), not at
+    // factory invocation, so the throw surfaces inside getEmbeddingProvider().
+    process.env.EMBEDDING_PROVIDER = "litellm";
+    process.env.EMBEDDING_MODEL = "text-embedding-3-small";
+    process.env.EMBEDDING_DIMENSIONS = "1536";
+    // Intentionally no LITELLM_API_KEY.
+
+    await expect(getEmbeddingProvider()).rejects.toThrow(/LITELLM_API_KEY is required/);
+  });
+
+  it("ensureReady throws an actionable error when the proxy is unreachable", async () => {
+    process.env.EMBEDDING_PROVIDER = "litellm";
+    process.env.LITELLM_API_KEY = "sk-master-test";
+    process.env.EMBEDDING_MODEL = "text-embedding-3-small";
+    process.env.EMBEDDING_DIMENSIONS = "1536";
+    // Closed port → SDK fails fast with a connection error, not an auth error.
+    process.env.LITELLM_URL = "http://127.0.0.1:1/v1";
+
+    const provider = await getEmbeddingProvider();
+    await expect(provider.ensureReady()).rejects.toThrow(
+      /LiteLLM proxy is not reachable at http:\/\/127\.0\.0\.1:1\/v1/,
+    );
+  });
+
+  it("healthCheck reports missing API key without making any network call", async () => {
+    process.env.EMBEDDING_PROVIDER = "litellm";
+    process.env.LITELLM_API_KEY = "sk-master-test";
+    process.env.EMBEDDING_MODEL = "text-embedding-3-small";
+    process.env.EMBEDDING_DIMENSIONS = "1536";
+    // Even with a deliberately closed port, the missing-key path must short-circuit
+    // before the SDK attempts a connection, so the test stays deterministic.
+    process.env.LITELLM_URL = "http://127.0.0.1:1/v1";
+
+    const provider = await getEmbeddingProvider();
+    // Now drop the key for the health-check call only — the provider re-reads
+    // process.env on each invocation (intentional, see provider-litellm.ts).
+    delete process.env.LITELLM_API_KEY;
+
+    const health = await provider.healthCheck();
+    expect(health.available).toBe(false);
+    expect(health.modelReady).toBe(false);
+    expect(health.statusLines.some((l) => l.includes("LiteLLM API key") && l.includes("Missing"))).toBe(true);
+  });
+
+  it("healthCheck reports unreachable proxy without throwing", async () => {
+    process.env.EMBEDDING_PROVIDER = "litellm";
+    process.env.LITELLM_API_KEY = "sk-master-test";
+    process.env.EMBEDDING_MODEL = "text-embedding-3-small";
+    process.env.EMBEDDING_DIMENSIONS = "1536";
+    process.env.LITELLM_URL = "http://127.0.0.1:1/v1";
+
+    const provider = await getEmbeddingProvider();
+    const health = await provider.healthCheck();
+
+    expect(health.available).toBe(false);
+    expect(health.modelReady).toBe(false);
+    expect(
+      health.statusLines.some(
+        (l) => l.includes("LiteLLM") && l.includes("Not reachable"),
+      ),
+    ).toBe(true);
+  });
+});
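`healthCheck()` tells an auth rejection apart from an unreachable proxy via `isAuthError()`, whose definition is not in this excerpt. The commit message says it duck-types `err.status` for 401/403, so here is a hypothetical reconstruction on that basis, paired with a helper that mirrors the two status lines the catch block emits. This `isAuthError`, `describeProxyFailure`, and `FakeHttpError` are illustrative; the real helper may differ.

```typescript
// Hypothetical reconstruction: treat any error-like object whose `status` is
// 401 or 403 as an auth rejection, without relying on SDK error classes.
function isAuthError(err: unknown): boolean {
  if (typeof err !== "object" || err === null || !("status" in err)) return false;
  const status = (err as { status: unknown }).status;
  return status === 401 || status === 403;
}

// Mirrors the two failure lines healthCheck() pushes in its catch block
// ("[MISSING]" is what icon(false) renders).
function describeProxyFailure(err: unknown, url: string): string {
  const message = err instanceof Error ? err.message : String(err);
  return isAuthError(err)
    ? `[MISSING] LiteLLM: Auth rejected at ${url} (${message})`
    : `[MISSING] LiteLLM: Not reachable at ${url} (${message})`;
}

// Illustrative error type carrying an HTTP status, as an API error would.
class FakeHttpError extends Error {
  status: number;
  constructor(message: string, status: number) {
    super(message);
    this.status = status;
  }
}

const proxyUrl = "http://localhost:4000/v1";
const authLine = describeProxyFailure(new FakeHttpError("401 Invalid key", 401), proxyUrl);
const netLine = describeProxyFailure(new Error("Connection error."), proxyUrl);
// authLine: "[MISSING] LiteLLM: Auth rejected at http://localhost:4000/v1 (401 Invalid key)"
// netLine:  "[MISSING] LiteLLM: Not reachable at http://localhost:4000/v1 (Connection error.)"
```

Checking `status` structurally rather than with `instanceof` against an SDK error class keeps the helper independent of which error types the SDK exports, at the cost of also matching any unrelated object that happens to carry a 401/403 `status`.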