fix(lmstudio): force encoding_format=float to avoid SDK base64 decode

The OpenAI SDK (≥6.x, since openai-node#1312) auto-injects
`encoding_format: "base64"` into every embeddings request when the
caller doesn't specify one, then decodes the response with
`toFloat32Array(embedding as unknown as string)` without checking that
the payload is actually a string.

LM Studio's Local Server ignores `encoding_format` and always returns a
JSON array of floats. The SDK then runs `Buffer.from(<array>, 'base64')`.
Node.js silently ignores the encoding argument for array inputs and
truncates each float (|x| < 1) to the byte 0, so a 4096-dim embedding
becomes a 4096-byte zero buffer that gets reinterpreted as a
1024-element Float32Array of zeros.
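
The truncation can be reproduced outside the SDK. The sketch below is
illustrative only: `decodeArrayAsBase64` is a hypothetical helper, not
SDK code, and the input values are made up. It assumes a Node.js
runtime.

```typescript
// Hypothetical helper: feeds a float array into a base64 decode path,
// which is the situation described above. Not the SDK's actual source.
function decodeArrayAsBase64(embedding: number[]): { byteLength: number; floats: number[] } {
  // The cast stands in for an array reaching code that expects a string.
  // Node.js ignores the "base64" argument because the input is not a string.
  const buf = Buffer.from(embedding as unknown as string, "base64");

  // Copy into a fresh ArrayBuffer so the Float32Array view starts at offset 0.
  const usable = buf.length - (buf.length % 4);
  const aligned = buf.buffer.slice(buf.byteOffset, buf.byteOffset + usable);

  // Every 4 bytes become one float32, so N inputs yield N / 4 outputs.
  return { byteLength: buf.length, floats: Array.from(new Float32Array(aligned)) };
}

const result = decodeArrayAsBase64([0.12, -0.53, 0.91, -0.07, 0.33, 0.48, -0.25, 0.64]);
console.log(result.byteLength, result.floats);
```

Eight input floats produce eight zero bytes and therefore two zero
floats, the same 4:1 shrink that turns 4096 values into 1024.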

Net effect: every LM Studio embedding came back as all zeros at a
quarter of the model's true dimension (1024 instead of 4096 here).
Qdrant then rejected the upserts with
`Vector dimension error: expected dim: <model>, got 1024`, and indexing
silently failed with all points skipped.

Fix: pass `encoding_format: "float"` explicitly. The SDK detects the
user-provided value (hasUserProvidedEncodingFormat=true), skips the
decode step, and returns LM Studio's float array as-is.
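
A simplified model of that branch, for illustration only. It is not the
SDK's source: `finishEmbedding` and `base64ToFloats` are hypothetical
names, and only `hasUserProvidedEncodingFormat` is taken from the SDK.
It assumes a Node.js runtime.

```typescript
// Hypothetical model of the decode decision; not the SDK's actual code.
type RawEmbedding = number[] | string;

// Decode a base64 payload into float32 values (assumes little-endian host).
function base64ToFloats(payload: RawEmbedding): number[] {
  const buf = Buffer.from(payload as unknown as string, "base64");
  const usable = buf.length - (buf.length % 4);
  const aligned = buf.buffer.slice(buf.byteOffset, buf.byteOffset + usable);
  return Array.from(new Float32Array(aligned));
}

function finishEmbedding(
  payload: RawEmbedding,
  encodingFormat?: "float" | "base64",
): RawEmbedding {
  const hasUserProvidedEncodingFormat = encodingFormat !== undefined;
  // Caller chose a format: hand the server's payload back untouched.
  if (hasUserProvidedEncodingFormat) return payload;
  // Default path: assume base64 and decode, whatever the payload really is.
  return base64ToFloats(payload);
}

const fromLmStudio = [0.25, -0.5, 0.75, 0.125];
console.log(finishEmbedding(fromLmStudio, "float")); // floats pass through
console.log(finishEmbedding(fromLmStudio)); // decoded as if base64
```

With an explicit format the array survives intact; without one, the
four floats collapse into a single zero.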

Verified with Qwen3-Embedding-8B (4096-dim): all DEBUG_LMSTUDIO_EMBED
log entries now show firstEmbeddingDim=4096, no skipped upserts.
AirMonitor
2026-05-04 12:59:20 +02:00
parent f10786530f
commit bb141a0b3f
+11
@@ -179,9 +179,20 @@ export class LMStudioEmbeddingProvider implements EmbeddingProvider {
// No `dimensions` parameter: LM Studio doesn't implement Matryoshka projection.
// The model returns its native dimension and we trust the user to have set
// EMBEDDING_DIMENSIONS to match.
//
// `encoding_format: "float"` is REQUIRED. The OpenAI SDK (6.x+) defaults to
// `encoding_format: "base64"` for performance, then decodes the response with
// toFloat32Array() without checking its type. LM Studio ignores `encoding_format`
// and always returns a plain JSON array of floats. The SDK's decode path then runs
// `Buffer.from(<array>, 'base64')`. Node.js silently ignores the encoding for
// array inputs and truncates each float (|x| < 1) to the byte 0, so a 4096-dim
// embedding becomes a 4096-byte zero buffer that gets reinterpreted as a
// 1024-element Float32Array of zeros.
// Setting `encoding_format: "float"` makes the SDK skip the decode step entirely
// (see openai-node/src/resources/embeddings.ts: `if (hasUserProvidedEncodingFormat)`).
const response = await client.embeddings.create({
  model,
  input: texts,
  encoding_format: "float",
});
const sorted = response.data.sort((a, b) => a.index - b.index);
return sorted.map((d) => d.embedding);