Working Group Foundations

Working Group is an LLM system designed to solve deep, real-world problems through human-in-the-loop, hierarchical agent coordination. It does this by progressively reducing uncertainty about the problem and its requirements, and about how to construct an effective solution. The system is structured to incorporate outside information automatically as needed, and to systematically partition the cumulative work context so that individual LLM calls see only what’s relevant at that moment and location. It does this by dynamically organizing work into a hierarchy of “pods” that maintain their own state, and by leveraging the division of labor inherent in the pods’ bi-layer structure. Within each pod, a multi-participant deliberation and planning process operates as the base layer, overseen by an introspective “cortex” layer that observes and performs actions in response to detected patterns/conditions.

Solving problems by reducing uncertainty about how to construct solutions
Architecture
How it's used
Scaling LLM problem-solving with systematic attention management
Roadmap

Solving problems by reducing uncertainty about how to construct solutions

Working Group's core drive is to reduce uncertainty about how to construct a target artifact. As it runs, a work trace evolves capturing the steps of this process; the trace then becomes input to a model that generates the target artifact. Since the trace captures the progressive clarification of the objective and path to solution, it holds exactly the information the model needs to construct the solution artifact.

The three main sources of uncertainty are:

uncertainty about requirements for the artifact: Goal Uncertainty
uncertainty about external world/system states which the requirements depend on: World Uncertainty
uncertainty about aspects of how to coherently formulate an artifact that satisfies all requirements: Solution Uncertainty

Goal uncertainty is reduced by asking questions of the goal-giver.

World uncertainty is reduced by obtaining information from external systems, e.g. web or local file system.

Solution uncertainty is reduced through a cycle of proposing and verifying formulation candidates, refining formulations in response to small errors and taking alternate approaches in response to estimated non-viability. Knowledge of prior/related work is incorporated via e.g. web search.

Architecture

The Working Group system is comprised of a supervised, dynamic pod hierarchy. Pods are stateful, isolated working units that can be spawned synchronously or asynchronously, and return artifacts as results. Pods are constructed with an objective describing an artifact it should construct, a number of rounds to run for, and a set of agent configs (defining names, system prompts, tools, etc).

The base layer activity of a pod is controlled by a scheduler that decides which agent to run at every tick of the system. The base layer is overseen by a secondary system called a cortex which gets scheduled after every agent turn. The cortex's behavior is determined by the set of "sentinels" it's configured with. Each cortex sentinel observes the base layer for a single pattern; when the pattern is detected it reacts by executing an action that affects the pod's operation in some way (e.g. pausing to clarify objective, injecting a message into the base layer, updating an artifact, spawning a sub-pod).

The drive to reduce uncertainty (Goal, World, and Solution) comes out of the prompts which back the agents and the cortex sentinels; the 'drive' is actualized by mechanisms which allow pods to halt and ask questions of the user or parent pods, or to retrieve external information. Agents are prompted with an understanding of the system in which they operate; their system prompts communicate the pod objective and supply behavioral/perspectival diversity. An agent's primary input at each turn is the work trace so far, and its output (a proposed idea, a critique, the result of a tool call, etc.) is appended to the work trace after each turn. When the pod hits its round limit, or meets its objective (by deeming itself no longer uncertain about how to generate the target artifact), a cortex sentinel uses the work trace to generate the pod's target artifact.

Work traces may look a bit like a discussion (hence the name 'working group'), but in its ideal form should resemble a "solution uncertainty reduction narrative". This 'narrative' is essentially a composite chain-of-thought with contributions from diversely configured LLMs, injected outside/system info, and user input.

An architecture diagram for Working Group

Architecture of the Working Group software

How it's used

Working Group is being written as a Rust terminal utility. You run it by supplying a root pod config that defines an objective, agent roster, and cortex sentinel set; it comes with built-in defaults so you could, e.g., just supply an objective and it would fill in the rest. If you use the --headless flag it simply emits a final artifact to stdout upon completion (optionally pausing to ask user questions along the way). But it also comes with a TUI resembling a multi-agent chatroom where users can manually step through/select agent turns, speak within any pod, cancel pods, etc.

Working Group TUI interface showing multi-agent interaction

Pre-alpha screenshot of the working group TUI

Scaling LLM Problem-Solving with Systematic Attention Management

LLMs have limited attention. As their input length increases, the attention paid to the content of the input becomes increasingly diffuse and performance degrades. When asked to perform one of a large number of possible actions, the attention given to considering implications of each possibility becomes diffuse and performance degrades.

Any aspect of the system requiring something like intelligence (judgement, tool-use, solution formulation) is backed by an LLM step, where some context is computed and arranged to be the input to an LLM call. The attentional burden of that input must be managed systematically to prevent LLM performance degradation.

The core architecture is motivated by this need of systematically distributing the problem context and decision complexity among components so that each may operate with sufficient attention to perform at the maximum capability of its backing LLM.

This segmentation can be seen in the context isolation of individual 'pods' which solve sub-problems/construct sub-artifacts, and in the division of labor between the base layer activity of a pod and the 'cortex' layer which oversees it and performs actions on its behalf. The base layer can focus on the meat of the problem itself, while the cortex brings in supplementary information, modulates system parameters, drafts and stores candidate artifacts and so on. The cortex itself is divided into a number of Sentinels, which each observe the base layer for only a single aspect of the pod's operation

Roadmap

Most of the foundational pieces of the Working Group software have already been built; below is a rough roadmap of what remains to be done. The screenshot gives a sense of the size and organization of the codebase.

Sample of the state of the Working Group codebase

Core runtime & Pod supervision

✓ Working Group supervisor — pod creation/destruction, global event bus, Command/Event routing.
✓ Checkpointing — autosave + restore from checkpoints.
□ Hierarchical pods — spawn child pods (set up for, but not fully wired); add results to parent work trace.

Pod Engine

✓ Turn loop & phases — run next Participant in queue, re-schedule Cortex; state machine: Init/Idle/Waiting/Running.
✓ Command/Event system — fully controllable by Commands; emits Events for any important state change.
✓ Manual stepping — debugger-like stepping through Participant turns.
✓ Overrides & cancellation — preempt running turns if needed; clean shutdown.
✓ Round lifecycle — RoundStarted/Ended, max_rounds, exit_on_complete.

Cortex & Sentinels

✓ Cortex — holds a collection of LLM-backed "Sentinels" with access to PodState that return EngineCommands.
✓ Cortex runner — batched Sentinel execution (unordered + ordered) and outgoing EngineCommand reconciliation.
✓ Sentinel definition by config — partially complete, but construction pathway not ideal.
✓ Scheduling Sentinel — intelligent (LLM-backed; round-robin fallback) next Participant selector.
□ Pause-for-input Sentinel — issue AwaitUserInput Command to Engine in response to Goal/World uncertainty.
□ Artifact constructor Sentinel — use work trace to construct/update artifact at appropriate times.
□ Delegation Sentinel — spawn sub-pods when sizeable sub-problems become apparent.
□ Evidence/World Sentinel — perform web/filesystem queries when missing info detected.

Participants & prompting

✓ LLM Participant definition by config — select openrouter models, define system prompts etc.
✓ LLM Participant behavior — streaming LLM responses; PASS_TOKEN handling; end-of-turn Command.
✓ Centralized prompt builders — preamble → sys prompt → PodState subset; variant for Sentinels.
□ Tool-augmented Participants — function-calling / tool schema; results fused into trace.

Work trace & artifacts

✓ Work trace building — captures solution uncertainty reduction "narrative".
□ First-class artifacts — artifacts as store entries, versions, tags.
□ Artifact lifecycle events — create/update/finalize; persistence and UI surfaces.

Services & integrations

✓ LLM backend abstraction — access to streaming calls to openrouter via ServiceHub.
□ World interfaces — web search/fetch, filesystem connection (e.g. Claude Code SDK).

UI / clients

✓ Headless mode — no chat view; runtime events optionally printed; stdout artifact emission set up for.
✓ TUI — chatroom view; manual stepping; basic controls.
□ HTTP — for e.g. web UI.