Batch Embeddings API — 1000x Fewer HTTP Calls
When you're building agents that process knowledge bases, ingest documents, or manage persistent memory, embeddings are the backbone. Every chunk of text your agent stores or searches gets turned into a vector, and until now, that meant one HTTP request per chunk.
If your agent ingests a 50-page document that splits into 200 chunks, that's 200 HTTP round-trips to the FabrCore host, 200 model configuration lookups, and 200 separate calls to the OpenAI embedding API. For customers processing thousands of records, this added up to real latency and unnecessary load.
Today we're shipping three changes that collapse all of that into a single call.
1. Batch Embeddings Endpoint
A new POST /fabrcoreapi/embeddings/batch endpoint accepts up to 2,048 texts in one request. Each item carries a caller-provided id so you can map results back to your records without relying on array ordering alone.
```json
{
  "items": [
    { "id": "chunk-001", "text": "FabrCore uses Orleans for distributed agent hosting..." },
    { "id": "chunk-002", "text": "Agents communicate via message passing..." },
    { "id": "chunk-003", "text": "The memory system supports semantic search..." }
  ]
}
```
One request in, one response back — with all your vectors.
```json
{
  "results": [
    { "id": "chunk-001", "vector": [0.0123, -0.0456, ...], "dimensions": 1536 },
    { "id": "chunk-002", "vector": [0.0789, -0.0321, ...], "dimensions": 1536 },
    { "id": "chunk-003", "vector": [0.0654, -0.0987, ...], "dimensions": 1536 }
  ]
}
```
Validation rules (all return 400 Bad Request):
| Condition | Error Message |
|---|---|
| Items is null or empty | "Items list must not be empty." |
| Any item has empty id | "Item at index {i} has an empty Id." |
| Any item has empty text | "Item at index {i} (Id='{id}') has empty Text." |
| More than 2,048 items | "Batch size {n} exceeds maximum of 2048." |
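These limits can also be enforced client-side before anything goes over the wire. A minimal pre-check mirroring the server's rules (a sketch, not part of the SDK) might look like:

```javascript
// Pre-validate a batch locally so malformed or oversized requests fail fast,
// using the same conditions and messages the server's 400 responses use.
function validateBatch(items) {
  const MAX_BATCH = 2048;
  if (!items || items.length === 0) {
    throw new Error('Items list must not be empty.');
  }
  if (items.length > MAX_BATCH) {
    throw new Error(`Batch size ${items.length} exceeds maximum of ${MAX_BATCH}.`);
  }
  items.forEach((item, i) => {
    if (!item.id) throw new Error(`Item at index ${i} has an empty Id.`);
    if (!item.text) throw new Error(`Item at index ${i} (Id='${item.id}') has empty Text.`);
  });
  return items;
}
```

Catching these locally saves a round-trip that is guaranteed to fail anyway.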
2. Singleton Embedding Client
Previously, the embedding service was registered as Transient. Every single request created a new instance, which meant re-fetching the model configuration and API key from scratch each time. We've moved to Singleton registration with thread-safe lazy initialization. The cached embedding client now lives for the lifetime of the application, eliminating redundant lookups.
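The pattern in miniature, sketched in JavaScript rather than the actual C# registration (the function names and callbacks here are hypothetical): caching the initialization promise itself means the expensive setup runs once, even when the first callers arrive concurrently.

```javascript
// Lazy singleton sketch: the expensive setup (model config + API key lookup)
// runs on first use only; every later caller shares the cached promise.
let clientPromise = null;

function getEmbeddingClient(loadModelConfig, createEmbeddingClient) {
  if (clientPromise === null) {
    // Storing the promise (not the resolved client) ensures concurrent
    // first callers trigger exactly one initialization.
    clientPromise = loadModelConfig().then(createEmbeddingClient);
  }
  return clientPromise;
}
```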
3. Native Batch in Memory Tools
The memory_write tool — used by agents to persist knowledge — previously embedded chunks in a sequential loop. It now calls the batch API internally, meaning a single memory_write invocation that produces 20 chunks makes one embedding call instead of 20.
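A rough sketch of that change (the names here are illustrative stand-ins, not the tool's real internals): collect every chunk first, embed once, then map vectors back by id.

```javascript
// Instead of one embedding call per chunk, gather all chunks into a single
// batch request and rejoin the vectors by id afterward.
async function embedChunks(chunks, getBatchEmbeddings) {
  const items = chunks.map((text, i) => ({ id: `chunk-${i}`, text }));
  // One call for the whole write, regardless of chunk count.
  const { results } = await getBatchEmbeddings(items);
  // Map results back by id rather than relying on array position alone.
  const byId = new Map(results.map(r => [r.id, r.vector]));
  return chunks.map((text, i) => ({ text, vector: byId.get(`chunk-${i}`) }));
}
```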
The Numbers
For a workload that embeds 1,000 text chunks:
| Metric | Before | After |
|---|---|---|
| HTTP calls to FabrCore API | 1,000 | 1 |
| Model config + API key lookups | 1,000 | 1 |
| Calls to OpenAI embedding API | 1,000 | 1 |
This isn't a micro-optimization — it's a fundamental reduction in round-trips that directly translates to faster document ingestion, lower-latency memory writes, and reduced load on both your FabrCore host and your embedding provider.
Get Started
If you're using IFabrCoreHostApiClient, the new method is:
```csharp
var client = serviceProvider
    .GetRequiredService<IFabrCoreHostApiClient>();

var items = documents.Select(doc => new BatchEmbeddingItem
{
    Id = doc.Id,
    Text = doc.Content
}).ToList();

var response = await client.GetBatchEmbeddingsAsync(items);

foreach (var result in response.Results)
{
    Console.WriteLine($"{result.Id}: {result.Dimensions} dimensions");
}
```
If you're calling the REST API directly:
```javascript
const response = await fetch('/fabrcoreapi/embeddings/batch', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    items: [
      { id: 'doc-1', text: 'First document' },
      { id: 'doc-2', text: 'Second document' }
    ]
  })
});

const { results } = await response.json();
// results[0].id === 'doc-1', results[0].vector === [...], etc.
```
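When a batch violates one of the validation rules, the endpoint answers 400 Bad Request rather than returning results, so it's worth checking the response status before parsing. A small wrapper (a sketch with hypothetical names, assuming the response shapes shown above) could be:

```javascript
// Wrap the batch endpoint: post items, surface 400s as errors, return vectors.
// fetchImpl is injectable so the wrapper can be exercised without a server.
async function postBatch(items, fetchImpl = fetch) {
  const response = await fetchImpl('/fabrcoreapi/embeddings/batch', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ items })
  });
  if (!response.ok) {
    // Validation failures (empty items, blank ids, >2048 texts) land here.
    throw new Error(`Batch embedding failed: ${response.status}`);
  }
  const { results } = await response.json();
  return results;
}
```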
The SDK's IEmbeddings interface also has a new batch method. Because this overload takes plain strings rather than id-carrying items, results map back to the input texts by position:

```csharp
Task<IReadOnlyList<Embedding<float>>> GetBatchEmbeddings(
    IReadOnlyList<string> texts);
```
New Types
Four new DTOs in FabrCore.Client:
| Type | Properties |
|---|---|
| `BatchEmbeddingItem` | `string Id`, `string Text` |
| `BatchEmbeddingRequest` | `List<BatchEmbeddingItem> Items` |
| `BatchEmbeddingResultItem` | `string Id`, `float[] Vector`, `int Dimensions` |
| `BatchEmbeddingResponse` | `List<BatchEmbeddingResultItem> Results` |
Backward Compatible
The existing single-text POST /fabrapi/Embeddings endpoint is completely unchanged. Existing clients, tools, and integrations continue to work as-is. The batch endpoint is additive — adopt it when you're ready.
Agents using the memory_write tool get the batch improvement automatically — no code changes needed on the agent side.
Built with FabrCore on .NET 10.
Builder of FabrCore and OpenCaddis.