Author: Scott Senatore, Semantic Infrastructure Lab
Date: 2025-12-04
Type: Founder's Note / Blog Post
Status: Draft


The Static Prompt Problem

You're building an AI agent system. You want your agent to use tools effectively - run commands, call APIs, query databases. So you do the obvious thing: you write examples into the system prompt.

SYSTEM_PROMPT = """
You are an AI assistant with access to these tools:

reveal <file> - Show code structure
Example: reveal app.py
Example: reveal src/utils.py get_config

search <pattern> - Find files
Example: search "def main"
Example: search "*.py" | grep "import"

... (50 more tools with examples)
"""

This seems reasonable. Give the agent examples up front, and it will know how to use the tools.

But it's fundamentally broken.


Why Static Prompts Fail

Problem 1: Examples Go Stale

You ship reveal v0.13.0 with your examples. Three months later, reveal v0.15.0 adds:
- reveal help:// - Self-documenting system
- reveal --agent-help-full - Comprehensive workflows
- reveal --check - Code quality scanning
- reveal 'ast://src?complexity>10' - AST queries

Your agent still thinks it's using v0.13.0. It never discovers these features because they're not in the static prompt.

Problem 2: Prompt Bloat

You have 50 tools. Each tool has 5 example patterns. That's 250 examples in your system prompt.

At ~100 tokens per example, that's 25,000 tokens of examples loaded into every conversation before the user even says hello.

Cost: $25 per 1M tokens = $0.625 per conversation just for examples that might not even be used.
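
For the skeptical, here is that arithmetic as a quick sketch; the per-example token count and the price are illustrative assumptions, not measurements:

# Back-of-the-envelope cost of static examples (illustrative numbers only)
TOOLS = 50
EXAMPLES_PER_TOOL = 5
TOKENS_PER_EXAMPLE = 100       # assumed average
USD_PER_MILLION_INPUT = 25.0   # assumed premium-model input rate

example_tokens = TOOLS * EXAMPLES_PER_TOOL * TOKENS_PER_EXAMPLE    # 25,000
cost = example_tokens / 1_000_000 * USD_PER_MILLION_INPUT          # 0.625
print(f"{example_tokens} tokens of examples, ${cost:.3f} per conversation")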

Problem 3: No Context Adaptation

User asks: "Find all complex functions in the codebase."

Your static examples show:

reveal app.py
reveal src/utils.py get_config

But the right pattern for this query is:

reveal 'ast://src?complexity>10' --format=json

This isn't in your examples. Because you wrote examples for basic usage, not advanced queries. Now you need to add MORE examples, which makes Problem 2 worse.


The Insight: Dynamic Documentation = Multi-Shot Learning

Here's the key realization: What if the agent could request examples on-demand?

Instead of:

[System Prompt with 25K tokens of examples]
User: "Find complex functions"
Agent: [tries to match static examples]

Do this:

[Minimal system prompt: "Request --agent-help before using tools"]
User: "Find complex functions"
Agent: reveal --agent-help-full
[Gets fresh, comprehensive examples]
Agent: [uses correct pattern from latest docs]

This is multi-shot learning.
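
Concretely, the 25K-token example block collapses into a one-line instruction. A sketch of what that minimal prompt might look like (the exact wording is illustrative, not prescriptive):

SYSTEM_PROMPT = """
You are an AI assistant with access to command-line tools.
Before using a tool for the first time, run `<tool> --agent-help`
(or `<tool> --agent-help-full` for workflows) and follow its examples.
Do not request the same help twice in one session.
"""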

The ML Parallel

In machine learning:
- Zero-shot: No examples, just task description
- One-shot: Single example provided
- Few-shot: Handful of examples (2-5 typically)

For AI agents:
- Static prompt: Fixed examples, loaded once
- Dynamic help: Request examples when needed
- Multi-shot: Unlimited fresh examples on-demand

Dynamic documentation is the agent equivalent of multi-shot learning.
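
In code, the pattern is a "fetch help on first use, then cache it" loop. The sketch below is illustrative rather than any particular framework's API; agent_help, plan_tool_call, and the llm callable are assumed names:

import subprocess

HELP_CACHE: dict[str, str] = {}   # help output rarely changes mid-session

def agent_help(tool: str) -> str:
    """Fetch a tool's --agent-help-full once, then reuse the cached text."""
    if tool not in HELP_CACHE:
        result = subprocess.run([tool, "--agent-help-full"],
                                capture_output=True, text=True)
        HELP_CACHE[tool] = result.stdout
    return HELP_CACHE[tool]

def plan_tool_call(llm, user_request: str, tool: str) -> str:
    """Show the model fresh examples *before* it writes the command."""
    docs = agent_help(tool)   # multi-shot examples, fetched on demand
    return llm(f"Tool documentation:\n{docs}\n\n"
               f"Task: {user_request}\nWrite the exact command to run.")

The help text enters the context only for tools the session actually touches, which is where the token savings come from.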


The Pattern: --agent-help as a Standard

What Makes Good Agent Help?

SIL's --agent-help specification:

  1. Purpose - What does this tool do? (1-2 sentences)
  2. Syntax - How do you invoke it? (basic form)
  3. Examples - Real usage patterns (REQUIRED - not optional!)
  4. Workflows - Common task compositions
  5. Pro Tips - Advanced usage, gotchas, when to use what

Example from reveal:

$ reveal --agent-help-full

## Core Purpose
Token-efficient code exploration. See structure before reading entire files.
Reduces token usage 10-150x for code analysis tasks.

## Basic Usage
reveal <file>                    # Structure overview (50 tokens vs 7500)
reveal <file> <function>         # Extract specific element
reveal <file> --check            # Code quality scan

## Advanced Examples

### Progressive Disclosure
reveal app.py --head 10         # First 10 elements (unknown file)
reveal app.py --range 20-30     # Elements 20-30 (large file)
reveal app.py --outline         # Hierarchical view

### Code Quality Queries
reveal src/ --check --select E,W  # Errors + warnings only
reveal 'ast://src?complexity>10'  # Find complex functions
reveal 'ast://app.py?lines>50'    # Long functions

### Pipeline Composition
git diff --name-only | reveal --stdin
find src/ -name "*.py" | reveal --stdin --check
reveal 'ast://src/' --format=json | jq '.results[] | .name'

## Workflows

### Unknown Codebase Exploration
1. reveal src/ --head 5              # Get initial structure
2. reveal 'ast://src?complexity>10'  # Find complex areas
3. reveal src/core.py main           # Extract key function
4. reveal src/core.py --check        # Quality check

### Refactoring Candidates
1. reveal 'ast://src?lines>100'              # Long functions
2. reveal 'ast://src?complexity>8'           # Complex functions
3. Intersect results → prioritize refactoring

## Pro Tips
- Use --head/--range for large files (token efficient)
- --format=json enables pipeline composition
- --check integrates 24 quality rules (flake8 subset)
- ast:// queries support >, <, >=, <=, == operators
- Multiple filters combine with & (AND logic)

## Related Commands
reveal help://              # List all help topics
reveal help://ast           # AST query deep dive
reveal --list-supported     # Supported file types

This is what the agent sees. Fresh, comprehensive, with real examples.


Real-World Evidence: This Works

Case Study 1: TIA Command Discovery

Before --help discipline:
- Agent tried wrong commands: tia session find (doesn't exist)
- Agent guessed wrong syntax: reveal ast://src?complex>10 (should be complexity>10)
- Agent missed features: tia project show <name> was never discovered
- Agent wasted tokens on trial and error across multiple attempts

After --help requirement:
- Agent checks: tia session --help → sees the search subcommand
- Agent reads: reveal help://ast → learns the correct operators
- Agent discovers: tia project --help → finds the show command
- Agent gets it right the first time, with no tokens wasted on guessing

Measured impact: 20-40% token reduction in command-heavy sessions.

Case Study 2: Reveal Evolution (v0.13 → v0.15)

Static prompt approach:

# Agent's knowledge (frozen at v0.13)
reveal app.py              # Only knows basic usage
reveal app.py function     # Only knows extraction

Dynamic help approach:

# Agent requests fresh docs (v0.15)
$ reveal --agent-help-full

# Discovers NEW features (added in v0.14-v0.15):
reveal help://                    # Self-documenting (v0.15)
reveal --agent-help-full          # This command! (v0.15)
reveal 'ast://src?complexity>10'  # AST queries (v0.15)
reveal --check                    # Quality scans (v0.14)
reveal --stdin                    # Pipeline mode (v0.14)

The agent automatically learns new features as tools evolve.

Case Study 3: Scout Research Agent

Scout uses Groqqy (an agent framework) with 20+ tool functions, each with complex usage patterns.

Approach: Every tool provides --agent-help equivalent (structured docstrings with examples).

Pattern:

@tool
def reveal_structure(path: str) -> str:
    """
    Token-efficient code structure exploration.

    Examples:
      reveal_structure("src/app.py")
      reveal_structure("src/")

    Advanced:
      Use for: Unknown files, large files, token budget constraints
      Avoid: When you need full implementation details

    Returns: Structure overview (~50 tokens vs ~7500 for full file)
    """
    # Implementation...

Result: Scout's multi-phase research orchestrator completes complex analysis with 75% automation. When a phase fails, reading tool help reveals correct usage patterns.


Why This Matters for Agent Systems

1. Tools Evolve Faster Than Prompts

Software reality: Tools ship updates weekly (bug fixes, features, breaking changes).

Prompt reality: System prompts update quarterly (manual human process).

Gap: Your agent is always operating on stale information unless it can request fresh docs.

2. Prompt Token Budgets Are Precious

Current LLM economics:
- Input: $3-15 per 1M tokens (depending on model)
- Output: $15-75 per 1M tokens

Static approach: 25K tokens of examples in every prompt (regardless of which tools are used).

Dynamic approach: ~200 tokens to request help, ~2K tokens for relevant help.

Savings: 90%+ reduction when only 1-2 tools are used per session.

3. Context-Adaptive Learning

Static examples can't predict use cases.

Example: You ship reveal with basic examples. User wants to:
- Find all functions with cyclomatic complexity > 10
- Filter by lines of code
- Output as JSON for pipeline processing
- Composition with jq

Your static examples don't cover this. But reveal help://ast does, because it's comprehensive documentation designed for discovery.

Agent help enables exploration - not just execution.

4. Self-Documenting Systems Scale

As your tool ecosystem grows:
- 5 tools × 5 examples = 25 examples (manageable in prompt)
- 50 tools × 5 examples = 250 examples (prompt bloat)
- 500 tools × 5 examples = 2500 examples (impossible)

Static prompts don't scale. Dynamic help does.


The Agent Help Standard (SIL Spec)

Requirements for --agent-help

All SIL-compliant tools MUST provide:

<tool> --agent-help              # Agent-optimized quick reference
<tool> --agent-help-full         # Comprehensive guide with workflows

Content requirements:
1. Purpose statement (what/why in 1-2 sentences)
2. Basic syntax (minimal invocation pattern)
3. Real examples (3-5 common use cases) ← REQUIRED
4. Advanced examples (2-3 power-user patterns)
5. Workflow examples (task-oriented compositions)
6. Pro tips (gotchas, when to use, when not to use)
7. Related commands (what to use next)

Why examples are REQUIRED:
- Syntax alone is ambiguous: tool <path> [options] (what's valid?)
- Examples disambiguate: tool src/ --recursive --format=json
- Workflows show composition: git diff | tool --stdin | jq
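
One way to keep tools honest about these requirements is a small conformance check in CI. This is a sketch of the idea, not part of the SIL spec; the section names simply mirror the content requirements above:

import subprocess

REQUIRED_SECTIONS = ["Purpose", "Usage", "Examples", "Workflows",
                     "Pro Tips", "Related"]

def check_agent_help(tool: str) -> list[str]:
    """Return the required sections missing from a tool's --agent-help-full."""
    out = subprocess.run([tool, "--agent-help-full"],
                         capture_output=True, text=True).stdout
    return [s for s in REQUIRED_SECTIONS if s.lower() not in out.lower()]

missing = check_agent_help("reveal")
if missing:
    print(f"--agent-help-full is missing sections: {missing}")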

Implementation Patterns

Command-line tools (Bash/Python):

import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument('--agent-help', action='store_true',
                    help='Show agent-optimized help')
parser.add_argument('--agent-help-full', action='store_true',
                    help='Show comprehensive agent guide')
args = parser.parse_args()

if args.agent_help:
    print(load_agent_help_quick())   # your loader for the quick reference
    sys.exit(0)

if args.agent_help_full:
    print(load_agent_help_full())    # your loader for the full guide
    sys.exit(0)

Self-documenting systems (reveal's approach):

# Help as a first-class URI scheme
tool help://                    # List all topics
tool help://topic               # Specific topic deep-dive
tool help://adapters            # Category overview
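
Under the hood, a help:// scheme can be a thin dispatcher over a topic table. The sketch below is illustrative and is not reveal's actual implementation; the topic bodies are placeholders:

HELP_TOPICS = {            # topic -> help text (placeholder content)
    "ast": "AST query deep dive ...",
    "adapters": "Adapter overview ...",
}

def handle_help_uri(target: str) -> str:
    """Resolve help:// and help://<topic> to documentation text."""
    topic = target.removeprefix("help://")
    if not topic:                       # bare help:// lists all topics
        return "Available topics:\n" + "\n".join(sorted(HELP_TOPICS))
    return HELP_TOPICS.get(topic, f"Unknown topic '{topic}'. Try help://")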

Structured tool definitions (Groqqy/agent frameworks):

@tool(
    name="reveal_structure",
    description="Token-efficient code exploration",
    agent_help="""
    Purpose: See code structure before reading full file

    Examples:
      reveal_structure("app.py")        # Basic usage
      reveal_structure("src/")          # Directory scan

    Use when: Unknown file, large file, token budget tight
    Avoid when: Need full implementation details
    """
)
def reveal_structure(path: str) -> str:
    ...  # Implementation
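
If your framework's decorator doesn't accept an agent_help field, attaching the text as metadata takes only a few lines. A minimal sketch of the idea (the registry and attribute names are assumptions, not Groqqy's API):

TOOL_REGISTRY = {}

def tool(name: str, description: str = "", agent_help: str = ""):
    """Register a function and attach agent-facing help as metadata."""
    def decorate(fn):
        fn.agent_help = agent_help or (fn.__doc__ or "")
        TOOL_REGISTRY[name] = {"fn": fn, "description": description,
                               "agent_help": fn.agent_help}
        return fn
    return decorate

def get_agent_help(name: str) -> str:
    """Serve a registered tool's help to the agent on demand."""
    return TOOL_REGISTRY[name]["agent_help"]

Falling back to the docstring keeps help and implementation in one place, so the documentation evolves with the code.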

Adoption Checklist

If you're building agent systems, adopt this pattern:

For Tool Developers

  • [ ] Add --agent-help flag to all tools
  • [ ] Include 5+ real examples (not just syntax)
  • [ ] Add workflow examples (composition patterns)
  • [ ] Document pro tips (gotchas, edge cases)
  • [ ] Keep help fresh (update with features)

For Agent Developers

  • [ ] Update system prompt: "Request --agent-help before using unfamiliar tools"
  • [ ] Remove static examples (or minimize to core 3-5 tools)
  • [ ] Measure token savings (compare before/after)
  • [ ] Track help request patterns (which tools need better docs?)
  • [ ] Iterate on prompt clarity ("always check help" vs "read docs first")

For LLM Providers

  • [ ] Add --agent-help to model documentation standards
  • [ ] Provide tool developers with template/spec
  • [ ] Measure help request rates (good metric for agentic usage)
  • [ ] Optimize tokenization for help output (structured content)

Measuring Success

How do you know this is working?

Quantitative Metrics

Token efficiency:

Token_savings = (Static_prompt_tokens - Dynamic_help_tokens) / Static_prompt_tokens

Example:
Static: 25,000 tokens (50 tools × 500 tokens each)
Dynamic: 2,500 tokens (5 help requests × 500 tokens)
Savings: 90%

Success rate:

Tool_usage_success = Correct_invocations / Total_invocations

Before --agent-help: 65% (lots of trial-and-error)
After --agent-help: 92% (gets it right first time)

Feature discovery:

Feature_utilization = Advanced_features_used / Advanced_features_available

Static prompt: 20% (only features with examples)
Dynamic help: 75% (discovers through exploration)
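
All three metrics are cheap to compute from session logs. A minimal sketch, assuming you already track prompt tokens and tool invocations per session:

def token_savings(static_prompt_tokens: int, dynamic_help_tokens: int) -> float:
    return (static_prompt_tokens - dynamic_help_tokens) / static_prompt_tokens

def tool_usage_success(correct_invocations: int, total_invocations: int) -> float:
    return correct_invocations / total_invocations

def feature_utilization(features_used: int, features_available: int) -> float:
    return features_used / features_available

print(token_savings(25_000, 2_500))   # 0.9 -> 90% savings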

Qualitative Indicators

Good signs:
- Agent requests help before first use ✅
- Agent discovers advanced features (not just basic examples) ✅
- Agent composes tools in novel ways (learns from workflow examples) ✅
- Tool updates automatically propagate to agent behavior ✅

Bad signs:
- Agent tries commands without checking help ❌
- Agent guesses syntax and fails ❌
- Agent never discovers advanced features ❌
- Agent behavior doesn't change when tools update ❌


Common Objections (And Rebuttals)

"But calling --help adds latency!"

Reality check:
- Help request: ~100ms (local command)
- LLM round-trip: 500-2000ms (network + generation)
- Trial-and-error (no help): 3-5 round-trips = 1.5-10 seconds

Math: 100ms upfront << 1.5-10 seconds of guessing wrong.

Also: Cache help output (tools don't change mid-session).

"My static examples are really good!"

That's great! But:
- How often do you update them? (Tools change weekly, prompts change quarterly)
- How comprehensive are they? (Can't cover every use case in 5 examples)
- What's the token cost? (25K static vs 2K dynamic)
- What happens when you add Tool #51? (Prompt bloat)

Static examples are great for the 3-5 most critical tools. Everything else should be dynamic.

"Agents should just figure it out"

This is like saying: "Developers should just figure out APIs without documentation."

Would you use a library with no docs? No examples? No API reference?

Tools without help = unusable for agents.

"Won't agents abuse help requests?"

Possible, but unlikely if prompt is clear:
- "Request --agent-help BEFORE FIRST USE of a tool"
- "Cache help output - don't request again in same session"
- "Only request help if unfamiliar or syntax unclear"

In practice: Agents are conservative (token-conscious). They request help once per tool, then cache it.


The Future: Self-Documenting Everything

Imagine a world where:

Every CLI tool has --agent-help:

git --agent-help
docker --agent-help
kubectl --agent-help
npm --agent-help

Every API has agent-friendly docs:

curl api.example.com/agent-docs

Every LLM tool has structured help:

@tool(agent_help="...")
def my_function():
    pass

Agents would:
- Discover tools through exploration (not static lists)
- Learn usage patterns on-demand (not preloaded examples)
- Adapt to tool updates automatically (fresh docs every time)
- Compose tools creatively (workflow examples inspire novel combinations)

This is the vision: Self-documenting infrastructure where agents learn through exploration, not memorization.


Call to Action

If you're building tools for agents:
1. Add --agent-help to your tools TODAY
2. Include real examples (not just syntax)
3. Keep it fresh (update with every release)

If you're building agent systems:
1. Update your system prompt: "Request help before using tools"
2. Remove static examples (or minimize to core tools)
3. Measure token savings and success rates

If you're an LLM researcher:
1. Study the dynamic help pattern (this is multi-shot learning for agents)
2. Build benchmarks comparing static vs dynamic documentation
3. Contribute to agent help standards


Conclusion: Knowledge On-Demand

The insight:
Static prompts are one-shot learning. Dynamic documentation is multi-shot learning.

The pattern:
Teach agents to request --agent-help before using tools.

The evidence:
20-90% token savings, higher success rates, automatic feature discovery.

The future:
Self-documenting infrastructure where agents learn through exploration.


This changes everything.

Static prompts were the right solution in 2022 when we had 4K context windows and no tool calling.

In 2025, with 200K+ context windows and native tool support, dynamic documentation is obviously better.

It's time to move from one-shot to multi-shot agent learning.


Author: Scott Senatore
Organization: Semantic Infrastructure Lab
Contact: [TBD]
License: CC BY 4.0

Related Work:
- Semantic Feedback Loops (SIL canonical doc)
- Multi-Agent Protocol Principles (SIL canonical doc)
- Reveal --agent-help implementation (reference implementation)
- TIA command help system (production deployment)


Appendix: Template for Agent Help

Use this template for your tools:

## <Tool Name> - Agent Help

### Purpose
[1-2 sentence description of what this tool does and why it exists]

### Basic Usage
<tool> <required_args> [optional_flags]

### Examples

#### Common Use Cases
<tool> example1              # Description
<tool> example2 --flag       # Description
<tool> example3 input.txt    # Description

#### Advanced Patterns
<tool> complex_example --advanced --flags=value
<tool> pipeline | another_tool | third_tool

#### Error Handling
<tool> --validate input      # Check before processing
<tool> --dry-run             # Preview without executing

### Workflows

#### Task: [Common Task Name]
1. <tool> step1
2. <tool> step2
3. <tool> step3
Result: [What you achieve]

#### Task: [Another Common Task]
1. <tool> different_approach
2. <tool> next_step
Result: [What you achieve]

### Pro Tips
- Use FLAG when CONDITION (saves time/tokens/complexity)
- Avoid PATTERN in SITUATION (common mistake)
- Combine with TOOL for BENEFIT (composition pattern)
- Check OUTPUT for SIGNAL (debugging tip)

### Related Commands
<related_tool1> - [When to use instead]
<related_tool2> - [When to use after]
<related_tool3> - [When to use with]

### Version
This help is for <tool> v<version>
Updated: <date>

Fill in the template. Ship with your tool. Change the game.