Observability for AI Agents — Monitoring Every Message and LLM Call in FabrCore
When an agent starts producing unexpected answers, the first thing you need is visibility. What messages did it receive? What did it send to the LLM? How many tokens did it burn? FabrCore's agent monitoring system captures all of this — messages, events, and individual LLM calls — in three independent buffers you can query, subscribe to, and build dashboards on top of.
Enabling the Monitor
Monitoring is opt-in. Enable it in AddFabrCoreServer with a single call. By default, LLM call capture records metadata only (model, tokens, duration). To capture the actual prompts and responses, set CapturePayloads = true:
// Metadata-only (default) — safe for production
builder.AddFabrCoreServer(options =>
{
    options.UseInMemoryAgentMessageMonitor();
});

// Full payload capture with redaction
builder.AddFabrCoreServer(options =>
{
    options.UseInMemoryAgentMessageMonitor(capture =>
    {
        capture.CapturePayloads = true;
        capture.MaxPayloadChars = 4_000;
        capture.MaxToolArgsChars = 2_000;
        capture.MaxBufferedCalls = 1_000;
        capture.Redact = s =>
            Regex.Replace(s, "sk-[A-Za-z0-9]+", "***");
    });
});
When monitoring is not enabled, a no-op NullAgentMessageMonitor is registered. Agents always have a valid dependency — no null checks needed.
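That null-object design means agent-side code can take a hard dependency on IAgentMessageMonitor. A minimal sketch (the agent class here is hypothetical, shown only to illustrate the injection pattern):

```csharp
public class BillingAgent // hypothetical agent class
{
    private readonly IAgentMessageMonitor _monitor;

    public BillingAgent(IAgentMessageMonitor monitor)
    {
        // Resolves to the real monitor when monitoring is enabled,
        // or to NullAgentMessageMonitor otherwise -- never null.
        _monitor = monitor;
    }
}
```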
The LlmCaptureOptions control what gets stored:
| Property | Default | Description |
|---|---|---|
| Enabled | true | Master switch for LLM call recording |
| CapturePayloads | false | Store actual prompts, responses, and tool args |
| MaxPayloadChars | 8000 | Per-field character cap on captured text |
| MaxToolArgsChars | 4000 | Character cap on tool call arguments |
| Redact | null | Optional Func<string, string> for PII/secret scrubbing |
| MaxBufferedCalls | 2000 | FIFO cap on the LLM call buffer |
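Since Redact is a single delegate, several scrubbing passes can be chained inside it. A sketch (the patterns below are illustrative, not an endorsed secret-detection set; presumably the delegate runs before text is stored, per the table's description):

```csharp
// Chain multiple scrubbing passes into one Redact delegate.
// These patterns are examples only -- tune them to your own secret formats.
Func<string, string> redact = s =>
{
    s = Regex.Replace(s, "sk-[A-Za-z0-9]+", "***");            // API-key shapes
    s = Regex.Replace(s, @"\b\d{3}-\d{2}-\d{4}\b", "***");     // SSN-shaped digits
    s = Regex.Replace(s, @"[\w.+-]+@[\w-]+\.[\w.]+", "***");   // email addresses
    return s;
};

capture.Redact = redact;
```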
Three Independent Data Tracks
The monitor maintains three separate buffers, each with its own query method and notification event. This means a burst of events cannot evict your message history, and a flood of LLM calls cannot push out captured events.
Messages — MonitoredMessage
Every inbound request and outbound response that flows through an agent grain is captured as a MonitoredMessage. Outbound responses include LlmUsageInfo with aggregated token counts. Messages routed through OnMessageBusy (when the agent is already processing) are flagged with BusyRouted = true.
var monitor = serviceProvider.GetRequiredService<IAgentMessageMonitor>();

// All messages, most recent first
var all = await monitor.GetMessagesAsync();

// Messages for a specific agent, limited to 50
var filtered = await monitor.GetMessagesAsync(
    agentHandle: "user1:my-agent", limit: 50);

// Filter out system messages
var chatMessages = all.Where(m =>
    !SystemMessageTypes.IsSystemMessage(m.MessageType));
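Because each record carries its BusyRouted flag, the captured list can answer quick operational questions directly. A short sketch building on the query above (the MessageDirection enum value used for filtering is an assumed name; the article only confirms the Direction and BusyRouted properties):

```csharp
// Count requests that arrived while the agent was mid-turn.
var busyRouted = all.Count(m => m.BusyRouted);
Console.WriteLine($"{busyRouted} message(s) were routed via OnMessageBusy");

// Outbound responses only -- these are the records carrying LlmUsageInfo.
// MessageDirection.Outbound is an assumed enum value, shown for illustration.
var responses = all.Where(m => m.Direction == MessageDirection.Outbound);
```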
Events — MonitoredEvent
Fire-and-forget events that reach an agent's OnEvent handler are captured as MonitoredEvent records. These include the event type, source handle, namespace, and channel — but intentionally exclude the Data and BinaryData payloads to keep the buffer readable.
// Events delivered to a specific agent
var agentEvents = await monitor.GetEventsAsync(
    agentHandle: "user1:my-agent");

// Last 10 events across all agents
var recentEvents = await monitor.GetEventsAsync(limit: 10);
LLM Calls — MonitoredLlmCall
Every individual LLM request/response pair is recorded as a MonitoredLlmCall. Each record carries the model used, input/output token counts, duration, streaming flag, and an OriginContext that tells you exactly where the call came from:
var recentCalls = await monitor.GetLlmCallsAsync(limit: 20);
foreach (var call in recentCalls)
{
    Console.WriteLine(
        $"[{call.Timestamp:HH:mm:ss}] {call.AgentHandle} " +
        $"{call.OriginContext} model={call.Model} " +
        $"tokens={call.InputTokens}in/{call.OutputTokens}out " +
        $"dur={call.DurationMs}ms");

    if (call.ErrorMessage is not null)
        Console.Error.WriteLine(
            $"  LLM error: {call.ErrorMessage}");
}
The OriginContext values include OnMessage:<id>, OnEvent:<type>, Timer:<name>, Reminder:<name>, Compaction, and Background. When the origin is OnMessage, the ParentMessageId field correlates the LLM call back to the specific MonitoredMessage that triggered it.
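That correlation can be done with a simple in-memory join over the two query results. A sketch (it assumes MonitoredMessage exposes an Id property that ParentMessageId points at; the article names the field on the call side only):

```csharp
var messages = await monitor.GetMessagesAsync();
var calls = await monitor.GetLlmCallsAsync();

// Group LLM calls under the message that triggered them.
var byMessage = calls
    .Where(c => c.ParentMessageId is not null)
    .GroupBy(c => c.ParentMessageId);

foreach (var group in byMessage)
{
    // MonitoredMessage.Id is an assumed property name here.
    var msg = messages.FirstOrDefault(m => m.Id == group.Key);
    Console.WriteLine(
        $"{msg?.MessageType}: {group.Count()} LLM call(s), " +
        $"{group.Sum(c => c.InputTokens + c.OutputTokens)} tokens");
}
```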
Real-Time Notifications
Each data track has its own notification event. Subscribe to push updates to a dashboard, SignalR hub, or logging pipeline without polling:
// Message notifications
monitor.OnMessageRecorded += message =>
{
    Console.WriteLine(
        $"[{message.Direction}] {message.FromHandle} " +
        $"-> {message.ToHandle}: {message.MessageType}");
};

// Event notifications
monitor.OnEventRecorded += evt =>
{
    Console.WriteLine(
        $"Event: {evt.Type} from {evt.Source}");
};

// LLM call notifications
monitor.OnLlmCallRecorded += call =>
{
    Console.WriteLine(
        $"LLM: {call.AgentHandle} {call.Model} " +
        $"{call.InputTokens}in/{call.OutputTokens}out");
};
Notifications fire after the operation completes, on a fire-and-forget path, so a slow subscriber never blocks an agent's response. Still, wrap your own subscriber logic in a try/catch: the framework catches subscriber exceptions internally, which means an unhandled failure in your handler is swallowed silently rather than logged or surfaced.
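One robust pattern is to keep the handler itself trivial and hand each record off to a bounded channel that a background consumer drains, so heavy work never runs on the notification path. A sketch using System.Threading.Channels:

```csharp
using System.Threading.Channels;

// Bounded queue between the monitor callback and a background consumer.
var channel = Channel.CreateBounded<MonitoredLlmCall>(
    new BoundedChannelOptions(1_000)
    {
        // Drop the oldest entry under pressure instead of blocking.
        FullMode = BoundedChannelFullMode.DropOldest
    });

monitor.OnLlmCallRecorded += call =>
{
    try
    {
        channel.Writer.TryWrite(call); // non-blocking; may drop when full
    }
    catch
    {
        // Never let a subscriber exception escape the handler.
    }
};
```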
Building a Custom Monitor Provider
The in-memory monitor is great for development and dashboards, but production systems often need durable storage. Implement IAgentMessageMonitor to back the monitor with a database, message queue, or external analytics service:
public class SqlAgentMessageMonitor : IAgentMessageMonitor
{
    private readonly IDbConnection _db; // injected via constructor (elided)

    public event Action<MonitoredMessage>? OnMessageRecorded;
    public event Action<MonitoredEvent>? OnEventRecorded;
    public event Action<MonitoredLlmCall>? OnLlmCallRecorded;

    public LlmCaptureOptions LlmCaptureOptions { get; } = new();

    public async Task RecordMessageAsync(MonitoredMessage message)
    {
        await _db.ExecuteAsync(
            "INSERT INTO MonitoredMessages ...", message);
        try { OnMessageRecorded?.Invoke(message); }
        catch { /* never let subscribers propagate */ }
    }

    public async Task RecordLlmCallAsync(MonitoredLlmCall call)
    {
        if (!LlmCaptureOptions.Enabled) return;
        await _db.ExecuteAsync(
            "INSERT INTO MonitoredLlmCalls ...", call);
        try { OnLlmCallRecorded?.Invoke(call); }
        catch { /* never let subscribers propagate */ }
    }

    // RecordEventAsync, the query methods, the token summary
    // methods, and ClearAsync are elided for brevity.
}
Register the custom provider:
builder.AddFabrCoreServer(options =>
{
    options.UseAgentMessageMonitor<SqlAgentMessageMonitor>();
});
The interface requires implementing RecordMessageAsync, RecordEventAsync, RecordLlmCallAsync, their corresponding query methods, GetAgentTokenSummaryAsync, GetAllAgentTokenSummariesAsync, and ClearAsync. Each record method should fire its notification event after persisting, wrapped in a try/catch.
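The token summary methods round out the read side. A usage sketch (the article names these methods but not their signatures; the agent-handle parameter and the shape of the returned summaries are assumptions):

```csharp
// Per-agent token totals -- useful for cost dashboards.
// Assumes the method takes an agent handle; not confirmed by the article.
var summary = await monitor.GetAgentTokenSummaryAsync("user1:my-agent");

// Summaries for every agent at once.
var allSummaries = await monitor.GetAllAgentTokenSummariesAsync();

// Reset all three buffers, e.g. between integration test runs.
await monitor.ClearAsync();
```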
Tuning the In-Memory Buffer
The message and event buffers each hold 5,000 entries by default; the LLM call buffer defaults to 2,000 (lower because captured payloads can be large). When a limit is reached, the oldest entries are evicted first (FIFO). To customize these limits, register an instance directly:
builder.Services.AddSingleton<IAgentMessageMonitor>(sp =>
    new InMemoryAgentMessageMonitor(
        sp.GetRequiredService<ILogger<InMemoryAgentMessageMonitor>>(),
        llmCaptureOptions: new LlmCaptureOptions
        {
            Enabled = true,
            CapturePayloads = true,
            MaxPayloadChars = 4_000,
            MaxBufferedCalls = 5_000
        },
        maxMessages: 10_000));
Each buffer is bounded independently, so tuning one limit has no effect on the others.
Built with FabrCore on .NET 10.