# Know What Your Agents Cost — Built-in LLM Token Tracking in FabrCore
AI agents that call LLMs can run up costs quickly — especially when tools trigger multiple rounds of inference, compaction summarizes long histories, or background tasks generate reports on a schedule. FabrCore captures token usage on every single LLM call an agent makes, regardless of where that call originates, and rolls the data up into per-agent summaries you can query at any time.
## TokenTrackingChatClient — The Recording Layer
At the heart of FabrCore's cost tracking is TokenTrackingChatClient, a DelegatingChatClient that wraps every chat client an agent uses. When FabrCoreAgentProxy.GetChatClient creates a chat client, the tracking wrapper is automatically inserted. Every call to GetResponseAsync or GetStreamingResponseAsync passes through it.
For each LLM round-trip, the tracker records:
| Metric | Description |
|---|---|
| InputTokens | Prompt tokens consumed |
| OutputTokens | Completion tokens generated |
| ReasoningTokens | Reasoning tokens, when the model reports them |
| CachedInputTokens | Tokens served from the provider's prompt cache |
| Model | Model identifier used for the call |
| DurationMs | Wall-clock duration of the LLM call, in milliseconds |
TokenTrackingChatClient checks LlmCaptureOptions.Enabled on every call; when monitoring is disabled, it short-circuits the capture path without allocating, so there is no performance penalty for leaving the wrapper in the pipeline while monitoring is off.
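The decorator pattern behind this is easy to sketch. The stand-in types below are simplified illustrations, not FabrCore's API (the real TokenTrackingChatClient builds on Microsoft.Extensions.AI's DelegatingChatClient), but the shape is the same: forward the call to the wrapped client, accumulate usage on the way back, and skip capture entirely when disabled.

```csharp
using System;
using System.Threading.Tasks;

// Simulated inner client: returns (inputTokens, outputTokens) per call.
// A real client would call the provider; here prompt length stands in.
Task<(int In, int Out)> InnerClient(string prompt)
    => Task.FromResult((prompt.Length, 5));

// Decorator state, standing in for the tracker's accumulator.
int totalIn = 0, totalOut = 0;
bool captureEnabled = true;   // stands in for LlmCaptureOptions.Enabled

async Task<(int In, int Out)> TrackedClient(string prompt)
{
    var usage = await InnerClient(prompt);   // forward to the wrapped client
    if (captureEnabled)                      // short-circuit when monitoring is off
    {
        totalIn += usage.In;
        totalOut += usage.Out;
    }
    return usage;
}

await TrackedClient("hello world");
await TrackedClient("hi");
Console.WriteLine($"{totalIn} in / {totalOut} out");  // 13 in / 10 out
```

The caller is unaware of the wrapper: it sees the same response it would get from the inner client, which is why the tracker can be inserted unconditionally.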
## Scoped Tracking with LlmUsageScope
Token counts need to be attributed to the right agent and the right message. FabrCore uses LlmUsageScope — an AsyncLocal scope that FabrCoreAgentProxy.InternalOnMessage sets automatically before your OnMessage code runs. The scope carries the agent handle, parent message ID, trace ID, and an origin tag like OnMessage:<id>.
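AsyncLocal&lt;T&gt; is what makes this ambient attribution work: a value set before an await still flows into awaited calls, and concurrent flows each see their own copy. Below is a minimal, self-contained illustration of that mechanism, not FabrCore's actual LlmUsageScope implementation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// The flow-scoped slot; LlmUsageScope rests on the same mechanism.
var currentOrigin = new AsyncLocal<string?>();

async Task<string?> SimulatedLlmCall()
{
    await Task.Yield();              // cross an await boundary
    return currentOrigin.Value;      // still sees the caller's scope value
}

async Task<string?> HandleMessage(string messageId)
{
    // Set before "user code" runs, as InternalOnMessage does.
    currentOrigin.Value = $"OnMessage:{messageId}";
    return await SimulatedLlmCall();
}

// Two concurrent flows each keep their own scope value.
var results = await Task.WhenAll(HandleMessage("m1"), HandleMessage("m2"));
Console.WriteLine(string.Join(", ", results)); // OnMessage:m1, OnMessage:m2
```

Because the value flows with the execution context rather than the thread, attribution survives thread-pool hops, tool invocations, and nested awaits without any explicit parameter passing.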
Every LLM call made inside that scope — whether from direct inference, tool execution, or nested agent calls — is tagged with the scope's metadata. When the response is sent back, the accumulated LlmUsageInfo is stamped on the outbound MonitoredMessage:
```csharp
if (message.LlmUsage is { } usage)
{
    Console.WriteLine($"LLM: {usage.LlmCalls} calls, " +
        $"{usage.InputTokens}in/{usage.OutputTokens}out tokens, " +
        $"model={usage.Model}");
}
```
The LlmUsageInfo object aggregates all calls within a single OnMessage execution — if your agent makes three LLM calls to handle one user message (inference, tool call, follow-up), the usage reflects the total.
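As a concrete example, the aggregation for a three-call OnMessage execution looks like this; plain tuples stand in for the per-call records, and the token figures are illustrative:

```csharp
using System;
using System.Linq;

// Per-call usage, one entry per LLM round-trip in a single OnMessage run.
var calls = new[]
{
    (Input: 900,  Output: 40),   // initial inference
    (Input: 1100, Output: 25),   // tool-call round-trip
    (Input: 1300, Output: 200),  // follow-up answer
};

// One execution rolls every call into a single total, as LlmUsageInfo
// does before it is stamped on the outbound message.
var llmCalls = calls.Length;
var inputTokens = calls.Sum(c => c.Input);
var outputTokens = calls.Sum(c => c.Output);
Console.WriteLine($"{llmCalls} calls, {inputTokens}in/{outputTokens}out");
// 3 calls, 3300in/265out
```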
## LlmCallContext — Tracking Outside OnMessage
Not all LLM calls happen inside OnMessage. Agents can call LLMs from timer callbacks, event handlers, compaction, or background tasks. For these paths, FabrCore provides LlmCallContext — another AsyncLocal that overrides the origin tag.
The framework automatically wraps several non-message paths: OnEvent dispatch is tagged as OnEvent:<type>, timer and reminder ticks as Timer:<name> or Reminder:<name>, and compaction as Compaction. For custom background work, wrap the code yourself:
```csharp
public class ReportingAgent : FabrCoreAgentProxy
{
    private async Task GenerateDailyReport()
    {
        // No OnMessage scope here — wrap manually
        using (LlmCallContext.Begin(
            agentHandle: fabrcoreAgentHost.GetHandle(),
            originContext: "Background:DailyReport"))
        {
            var chatClient = await GetChatClient("OpenAIProd");
            var response = await chatClient.GetResponseAsync(new[]
            {
                new ChatMessage(ChatRole.System,
                    "Summarize yesterday's activity."),
            });
            // MonitoredLlmCall will carry
            // OriginContext = "Background:DailyReport"
        }
    }
}
```
The three-tier attribution fallback ensures every LLM call is tagged: LlmUsageScope first (from OnMessage), then LlmCallContext (from event/timer/background wrappers), and finally the constructor-captured handle as a last resort (tagged as Background).
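The fallback chain reduces to a null-coalescing lookup. The sketch below shows the precedence only; it is not the framework's actual resolution code, and the tag strings are examples:

```csharp
using System;

// Precedence: LlmUsageScope first, then LlmCallContext,
// then the constructor-captured handle's default "Background" tag.
string ResolveOrigin(string? usageScopeOrigin, string? callContextOrigin)
    => usageScopeOrigin ?? callContextOrigin ?? "Background";

Console.WriteLine(ResolveOrigin("OnMessage:42", null));     // OnMessage:42
Console.WriteLine(ResolveOrigin(null, "Timer:heartbeat"));  // Timer:heartbeat
Console.WriteLine(ResolveOrigin(null, null));               // Background
```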
## Querying Per-Agent Token Summaries
FabrCore accumulates token totals into AgentTokenSummary objects that you can query through the monitor. This gives you a running total of every token an agent has consumed across all its LLM calls:
```csharp
var monitor = serviceProvider.GetRequiredService<IAgentMessageMonitor>();

// Single agent summary
var summary = await monitor.GetAgentTokenSummaryAsync("user1:my-agent");
if (summary != null)
{
    Console.WriteLine($"Input tokens: {summary.TotalInputTokens}");
    Console.WriteLine($"Output tokens: {summary.TotalOutputTokens}");
    Console.WriteLine($"Total calls: {summary.TotalLlmCalls}");
}

// All agents at once
var allSummaries = await monitor.GetAllAgentTokenSummariesAsync();
foreach (var s in allSummaries)
{
    Console.WriteLine($"{s.AgentHandle}: " +
        $"{s.TotalInputTokens + s.TotalOutputTokens} total tokens");
}
```
Combined with the per-call MonitoredLlmCall records (covered in our observability post), you get both the granular detail and the high-level rollup needed to track costs across your entire agent fleet.
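With the summary totals in hand, a dollar estimate is one step of arithmetic. The per-million-token prices below are hypothetical placeholders; substitute your provider's actual rates.

```csharp
using System;

// Hypothetical prices per million tokens (not real provider rates).
const decimal inputPricePerM = 2.50m;
const decimal outputPricePerM = 10.00m;

decimal EstimateCost(long inputTokens, long outputTokens)
    => inputTokens * inputPricePerM / 1_000_000m
     + outputTokens * outputPricePerM / 1_000_000m;

// e.g. an agent summary showing 4M input and 250K output tokens:
Console.WriteLine($"${EstimateCost(4_000_000, 250_000):F2}");  // $12.50
```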
Built with FabrCore on .NET 10.
Builder of FabrCore and OpenCaddis.