Batch Embeddings API — 1000x Fewer HTTP Calls

Eric Brasher · February 23, 2026 at 8:45 AM · 6 min read

When you're building agents that process knowledge bases, ingest documents, or manage persistent memory, embeddings are the backbone. Every chunk of text your agent stores or searches gets turned into a vector, and until now, that meant one HTTP request per chunk.

If your agent ingests a 50-page document that splits into 200 chunks, that's 200 HTTP round-trips to the FabrCore host, 200 model configuration lookups, and 200 separate calls to the OpenAI embedding API. For customers processing thousands of records, this added up to real latency and unnecessary load.

Today we're shipping three changes that collapse all of that into a single call.

1. Batch Embeddings Endpoint

A new POST /fabrcoreapi/embeddings/batch endpoint accepts up to 2,048 texts in one request. Each item carries a caller-provided id so you can map results back to your records without relying on array ordering alone.

POST /fabrcoreapi/embeddings/batch
{
  "items": [
    { "id": "chunk-001", "text": "FabrCore uses Orleans for distributed agent hosting..." },
    { "id": "chunk-002", "text": "Agents communicate via message passing..." },
    { "id": "chunk-003", "text": "The memory system supports semantic search..." }
  ]
}

One request in, one response back — with all your vectors.

Response
{
  "results": [
    { "id": "chunk-001", "vector": [0.0123, -0.0456, ...], "dimensions": 1536 },
    { "id": "chunk-002", "vector": [0.0789, -0.0321, ...], "dimensions": 1536 },
    { "id": "chunk-003", "vector": [0.0654, -0.0987, ...], "dimensions": 1536 }
  ]
}
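Because each result echoes the caller-provided id, you can join vectors back to your source records without relying on array position. A minimal sketch in JavaScript (`indexByChunkId` is a hypothetical helper, not part of the SDK; the response shape matches the example above):

```javascript
// Build an id → vector lookup from a batch response, so results
// can be joined back to source records by id rather than by index.
function indexByChunkId(response) {
  const byId = new Map();
  for (const { id, vector, dimensions } of response.results) {
    byId.set(id, { vector, dimensions });
  }
  return byId;
}

const lookup = indexByChunkId({
  results: [
    { id: 'chunk-001', vector: [0.0123, -0.0456], dimensions: 2 },
    { id: 'chunk-002', vector: [0.0789, -0.0321], dimensions: 2 }
  ]
});
// lookup.get('chunk-002').vector → [0.0789, -0.0321]
```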

Validation rules (all return 400 Bad Request):

Condition                   Error Message
Items is null or empty      "Items list must not be empty."
Any item has empty id       "Item at index {i} has an empty Id."
Any item has empty text     "Item at index {i} (Id='{id}') has empty Text."
More than 2,048 items       "Batch size {n} exceeds maximum of 2048."
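The 2,048-item cap means very large ingestion jobs need client-side chunking before they hit the endpoint. A rough sketch of one way to do it (`toBatches` is a hypothetical helper, not part of the SDK):

```javascript
// Split a list of items into batches that respect the
// server's 2,048-item maximum per request.
const MAX_BATCH_SIZE = 2048;

function toBatches(items, size = MAX_BATCH_SIZE) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 5,000 chunks → three requests of 2,048 + 2,048 + 904 items.
const items = Array.from({ length: 5000 }, (_, i) => ({
  id: `chunk-${i}`,
  text: 'some chunk text'
}));
const batches = toBatches(items);
```

Each resulting batch can then be POSTed to the endpoint as its own request body.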

2. Singleton Embedding Client

Previously, the embedding service was registered as Transient. Every single request created a new instance, which meant re-fetching the model configuration and API key from scratch each time. We've moved to Singleton registration with thread-safe lazy initialization. The cached embedding client now lives for the lifetime of the application, eliminating redundant lookups.
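The same initialize-once, reuse-everywhere pattern applies if you build your own client-side wrapper. A rough JavaScript sketch of the idea (a memoized async initializer standing in for .NET's thread-safe lazy initialization; `loadModelConfig` is a hypothetical stand-in for the model config and API key lookup):

```javascript
// Cache the initialized client behind a single promise so that
// concurrent callers share one initialization instead of each
// re-fetching configuration from scratch.
let clientPromise = null;
let initCount = 0;

async function loadModelConfig() {
  // Hypothetical config + API key lookup; runs at most once.
  initCount += 1;
  return { model: 'text-embedding-3-small', apiKey: 'sk-...' };
}

function getEmbeddingClient() {
  if (clientPromise === null) {
    clientPromise = loadModelConfig().then(config => ({ config }));
  }
  return clientPromise; // every caller awaits the same promise
}
```

Ten concurrent calls to `getEmbeddingClient()` still trigger exactly one `loadModelConfig()`.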

3. Native Batch in Memory Tools

The memory_write tool — used by agents to persist knowledge — previously embedded chunks in a sequential loop. It now calls the batch API internally, meaning a single memory_write invocation that produces 20 chunks makes one embedding call instead of 20.
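Conceptually, the change inside memory_write looks like this (illustrative JavaScript, not the actual tool implementation; `embedOne` and `embedBatch` are hypothetical callables standing in for the single and batch embedding calls):

```javascript
// Before: one embedding request per chunk — N round-trips.
async function embedSequentially(chunks, embedOne) {
  const vectors = [];
  for (const chunk of chunks) {
    vectors.push(await embedOne(chunk.text));
  }
  return vectors;
}

// After: one batch request covering every chunk — 1 round-trip.
async function embedBatched(chunks, embedBatch) {
  const results = await embedBatch(
    chunks.map(c => ({ id: c.id, text: c.text }))
  );
  return results.map(r => r.vector);
}
```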

The Numbers

For a workload that embeds 1,000 text chunks:

Metric                            Before    After
HTTP calls to FabrCore API        1,000     1
Model config + API key lookups    1,000     1
Calls to OpenAI embedding API     1,000     1

This isn't a micro-optimization — it's a fundamental reduction in round-trips that directly translates to faster document ingestion, lower latency memory writes, and reduced load on both your FabrCore host and your embedding provider.

Get Started

If you're using IFabrCoreHostApiClient, the new method is:

C# — IFabrCoreHostApiClient
var client = serviceProvider
    .GetRequiredService<IFabrCoreHostApiClient>();

var items = documents.Select((doc, i) => new BatchEmbeddingItem
{
    Id = doc.Id,
    Text = doc.Content
}).ToList();

var response = await client.GetBatchEmbeddingsAsync(items);

foreach (var result in response.Results)
{
    Console.WriteLine($"{result.Id}: {result.Dimensions} dimensions");
}

If you're calling the REST API directly:

JavaScript / HTTP
const response = await fetch('/fabrcoreapi/embeddings/batch', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    items: [
      { id: 'doc-1', text: 'First document' },
      { id: 'doc-2', text: 'Second document' }
    ]
  })
});

const { results } = await response.json();
// results[0].id === 'doc-1', results[0].vector === [...], etc.

The SDK's IEmbeddings interface also has a new batch method:

IEmbeddings — New Method
Task<IReadOnlyList<Embedding<float>>> GetBatchEmbeddings(
    IReadOnlyList<string> texts);

New Types

Four new DTOs in FabrCore.Client:

Type                        Properties
BatchEmbeddingItem          string Id, string Text
BatchEmbeddingRequest       List<BatchEmbeddingItem> Items
BatchEmbeddingResultItem    string Id, float[] Vector, int Dimensions
BatchEmbeddingResponse      List<BatchEmbeddingResultItem> Results

Backward Compatible

The existing single-text POST /fabrapi/Embeddings endpoint is completely unchanged. Existing clients, tools, and integrations continue to work as-is. The batch endpoint is additive — adopt it when you're ready.

Agents using the memory_write tool get the batch improvement automatically — no code changes needed on the agent side.


Built with FabrCore on .NET 10.


Eric Brasher

Builder of FabrCore and OpenCaddis.