How FabrCore Keeps Agents Talking Forever — Automatic Chat Compaction

Eric Brasher · March 19, 2026, 3:30 PM · 6 min read

Long-running conversations eventually hit a wall: the context window fills up, and the next LLM call either fails or silently drops older messages. FabrCore solves this with automatic chat compaction — after every OnMessage, the framework checks whether stored chat history has exceeded a configurable token threshold. If it has, older messages are summarized via an LLM call and replaced with a compact summary. The agent keeps working without interruption, and no application code is required.

How Compaction Works

Compaction runs automatically after every OnMessage completes. The CompactionService estimates the token count of the stored chat history and compares it against the threshold. If the history exceeds the threshold, the service:

  1. Keeps the most recent N messages (controlled by CompactionKeepLastN)
  2. Sends the older messages to the LLM for summarization
  3. Replaces those older messages with the returned summary

The agent's next LLM call then sees a shorter history that still preserves the conversation's earlier context.
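The shape of a single compaction pass can be sketched in a few lines. This is a language-agnostic illustration in Python, not FabrCore code; `compact` and `summarize` are hypothetical names, with `summarize` standing in for the framework's LLM summarization call:

```python
def compact(history, keep_last_n, summarize):
    """One compaction pass: summarize everything except the newest messages.

    history: list of messages, oldest first.
    summarize: callable turning a list of older messages into one summary
               (a stand-in for the LLM summarization call).
    """
    if len(history) <= keep_last_n:
        return history  # nothing old enough to fold into a summary
    older, recent = history[:-keep_last_n], history[-keep_last_n:]
    # The summary replaces the older messages; recent turns stay verbatim.
    return [summarize(older)] + recent

# Example with a trivial stand-in summarizer:
compacted = compact(
    ["m1", "m2", "m3", "m4", "m5"],
    keep_last_n=2,
    summarize=lambda msgs: f"[summary of {len(msgs)} messages]",
)
# compacted == ["[summary of 3 messages]", "m4", "m5"]
```

The key property is that the result is strictly shorter than the input whenever more than `keep_last_n` messages exist, so repeated passes keep the history bounded.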

The threshold is expressed as a fraction of the total context window. With the default settings — a 25,000-token context window and a 0.75 threshold — compaction triggers when chat history exceeds roughly 18,750 tokens. The last 20 messages are always preserved verbatim so the agent has full fidelity on recent conversation turns.
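The trigger point is simply the product of the two settings (a sketch of the arithmetic only, not a FabrCore API):

```python
context_window_tokens = 25_000   # default ContextWindowTokens
compaction_threshold = 0.75      # default CompactionThreshold

# Compaction triggers once estimated history tokens exceed this value.
trigger_tokens = int(context_window_tokens * compaction_threshold)
# trigger_tokens == 18750
```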

Configuring Compaction

Settings resolve in a three-tier hierarchy: framework defaults, then model-level settings in fabrcore.json, then agent-level overrides in AgentConfiguration.Args. This lets you set reasonable defaults for your model and fine-tune specific agents that need different behavior.
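The resolution order behaves like successive dictionary merges in which later tiers win. This is an illustrative Python sketch of that merge, not FabrCore internals; the function name is hypothetical:

```python
def resolve_compaction_settings(framework_defaults, model_settings, agent_overrides):
    """Later tiers win: framework defaults < fabrcore.json model entry < agent Args."""
    resolved = dict(framework_defaults)
    resolved.update(model_settings)   # model-level settings from fabrcore.json
    resolved.update(agent_overrides)  # agent-level Args (underscore prefix stripped)
    return resolved

resolved = resolve_compaction_settings(
    {"CompactionThreshold": 0.75, "CompactionKeepLastN": 20},  # framework defaults
    {"CompactionThreshold": 0.70},                             # model entry
    {"CompactionKeepLastN": 30},                               # agent override
)
# resolved == {"CompactionThreshold": 0.70, "CompactionKeepLastN": 30}
```

Any setting not specified at a lower tier simply falls through to the tier above it.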

Model-Level Settings (fabrcore.json)

Each model entry in fabrcore.json can include compaction settings:

Field                 Default   Description
ContextWindowTokens   25000     Total context window size in tokens
CompactionEnabled     true      Enable or disable compaction
CompactionKeepLastN   20        Number of recent messages to keep verbatim
CompactionThreshold   0.75      Trigger at this fraction of the context window
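A model entry carrying these fields might look like the following. This is a hedged sketch: the four compaction fields come from the table above, but the surrounding "Models" shape of fabrcore.json is an assumption for illustration:

```json
{
  "Models": {
    "default": {
      "ContextWindowTokens": 25000,
      "CompactionEnabled": true,
      "CompactionKeepLastN": 20,
      "CompactionThreshold": 0.75
    }
  }
}
```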

Agent-Level Overrides

Individual agents can override any compaction setting through their Args dictionary, using underscore-prefixed keys:

C# — Agent-Level Compaction Overrides
var agentConfig = new AgentConfiguration
{
    Handle = "long-running-agent",
    AgentType = "long-running-agent",
    Models = "default",
    SystemPrompt = "You are a long-running support agent.",
    Args = new Dictionary<string, string>
    {
        ["_CompactionEnabled"] = "true",
        ["_CompactionMaxContextTokens"] = "50000",
        ["_CompactionKeepLastN"] = "30",
        ["_CompactionThreshold"] = "0.80"
    }
};

This is particularly useful for agents backed by models with large context windows. An agent using a 128K-token model might set a higher _CompactionMaxContextTokens and a lower threshold to make full use of the available space before compacting.

Custom Compaction with OnCompaction

The default compaction behavior works well for most agents, but sometimes you need control over the summarization strategy — a different prompt, a cheaper model for summarization, or domain-specific logic that preserves certain message types. Override OnCompaction in your agent:

C# — Overriding OnCompaction
public override async Task<CompactionResult?> OnCompaction(
    FabrCoreChatHistoryProvider chatHistoryProvider,
    CompactionConfig compactionConfig,
    int estimatedTokens = 0)
{
    // Option 1: Use the default CompactionService
    return await base.OnCompaction(
        chatHistoryProvider, compactionConfig, estimatedTokens);

    // Option 2: Custom logic — use a cheaper model,
    // different prompt, or domain-specific strategy

    // Option 3: Return null to skip compaction this cycle
}

The OnCompaction override receives the FabrCoreChatHistoryProvider (which gives access to the stored chat history), a CompactionConfig with the resolved settings, and the estimated token count that triggered compaction. Return a CompactionResult to apply the compacted history, or null to skip compaction for this cycle.

The default implementation delegates to CompactionService, which sends the older messages to the configured LLM with a summarization prompt. If a CompactionModel is configured, compaction uses that model instead of the agent's primary model — useful for routing summarization to a cheaper, faster model while keeping the primary model for user-facing responses.

Monitoring Compaction and Stale Detection

Compaction LLM calls are tracked by the agent monitoring system (when enabled) with OriginContext = "Compaction". This means you can see exactly how many tokens your agents spend on summarization versus user-facing inference, and spot agents that are compacting too frequently.

FabrCore also includes stale detection for the primary OnMessage handler. If an agent's OnMessage has been running for more than 5 minutes — potentially stuck on a deadlocked tool or an unresponsive LLM call — the grain treats it as stale. New messages are then no longer routed to OnMessageBusy; instead, each is processed as a fresh primary message. This prevents a stuck agent from blocking all subsequent requests indefinitely.
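The stale check itself is just wall-clock bookkeeping around the primary handler. A minimal sketch of that routing decision, in illustrative Python with the 5-minute cutoff described above — none of these names are FabrCore APIs:

```python
import time

STALE_AFTER_SECONDS = 5 * 60  # treat OnMessage as stuck after 5 minutes


class PrimaryHandlerTracker:
    """Tracks when the primary OnMessage handler started, so an incoming
    message can either queue behind it ('busy') or be promoted to a
    fresh primary message ('primary')."""

    def __init__(self):
        self.started_at = None  # None means no primary handler is running

    def begin(self):
        self.started_at = time.monotonic()

    def finish(self):
        self.started_at = None

    def is_stale(self, now=None):
        if self.started_at is None:
            return False
        now = time.monotonic() if now is None else now
        return (now - self.started_at) > STALE_AFTER_SECONDS

    def route(self, now=None):
        """Route as 'primary' if the handler is idle or stale, else 'busy'."""
        if self.started_at is None or self.is_stale(now):
            return "primary"
        return "busy"
```

With this shape, a handler that never calls `finish` stops shadowing new work after the cutoff: `route` starts returning `"primary"` again on its own.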

Between compaction keeping conversations within budget and stale detection preventing runaway processing, FabrCore agents are designed to run continuously without manual intervention — handling conversations that span hours, days, or indefinitely.


Built with FabrCore on .NET 10.


Eric Brasher

Builder of FabrCore and OpenCaddis.