The Agency Paradox: Who is the Agent in Command?
Version: 1.0
Date: December 2025
Author: Stephen Sweeney
Type: Manifesto
Audience: Engineering Leaders, AI Teams, Safety-Critical Developers
Introduction
As AI systems begin generating increasingly large portions of the engineering workload, teams risk losing architectural coherence, safety, and long-term maintainability. This manifesto defines the discipline required to keep humans firmly in command of AI-generated systems. It reframes the role of the engineer from coder to steward—one who governs autonomy with intention, authority, and structured oversight.
The accelerating shift toward multi-agent development workflows in 2024–2025 makes this question of command urgent, not theoretical.
The Central Question
In the era of autonomous software, the term agent has become ambiguous. We now call automated systems “AI agents,” yet rarely ask: Who is truly in command?
If the AI becomes the de facto “agent in command,” the human is reduced to a passenger, carried by the system rather than directing it.
My role is to reverse that dynamic.
Autonomy without oversight is not intelligence—it is risk.
Reframing the Engineering Role
From Coder to Steward
The traditional software engineer writes code, reviews pull requests, and maintains systems. As AI takes over more of the implementation work, a new role emerges:
The Engineering Steward
I provide engineering governance and safety oversight in AI-driven software development. My responsibility is not to “babysit” the AI, but to enforce professional engineering standards the AI cannot self-impose.
Pilot in Command
I operate as the Pilot in Command:
- The AI may handle the labor
- I retain full situational awareness
- I maintain authority over all decisions
- I bear responsibility for architecture, safety, and scope
My primary responsibility is risk management, not speed.
The AI executes. I am accountable.
This accountability mirrors the principles of safety-critical engineering, where automation may assist but never replace the human responsible for system integrity and operational safety.
Why These Pillars Exist
Before defining the disciplines of human command, it is important to recognize the primary failure modes of autonomous code generation.
Common Failure Modes
Without structured oversight, AI systems tend to introduce:
Silent Regressions
- Breaking changes without test coverage
- Subtle bugs that pass CI but fail in production
- Edge cases ignored during refactoring
Architectural Drift
- Inconsistent patterns across codebase
- Ad-hoc solutions to recurring problems
- Coupling increases over time
Dependency Sprawl
- Unnecessary libraries added
- Version conflicts introduced
- Security vulnerabilities imported
Non-Deterministic Behavior
- Race conditions in concurrent code
- Untested error paths
- Inconsistent state management
Scope Overreach
- Changes beyond specified task
- Refactoring unrelated code
- “Improvements” that break assumptions
These failure modes compound rapidly at AI speed, making human stewardship not optional, but essential.
The Three Pillars of Human Command
To prevent the incoherent abstractions, silent regressions, and unbounded changes common in AI-generated code, I uphold three non-negotiable pillars of stewardship.
Pillar 1: Sovereignty Over Scope
(The Command Discipline)
Without explicit sovereignty, agentic workflows quickly collapse into improvisational changes, architectural drift, and regression-prone refactors. I maintain command by strictly defining the boundaries of execution.
Practices
Atomic Definitions
- Break work into small, atomic tasks suitable for clean, reviewable PRs
- Each task has clear input, output, and success criteria
- No task should touch more than one architectural layer
Engineering Contracts
- Treat prompts as binding engineering contracts—precise, testable, and non-optional
- Not casual instructions, but specifications
- Include acceptance criteria, constraints, and examples
Intentional Evolution
- Ensure the system evolves according to a deliberate plan
- Not AI-driven improvisation
- Architecture decisions are human decisions
Anti-Patterns to Prevent
- ❌ “Improve the authentication system” (too vague)
- ❌ “Fix all the bugs in the payment flow” (unbounded)
- ❌ “Refactor for better performance” (no criteria)
- ✅ “Add rate limiting to login endpoint: max 5 attempts per minute per IP, return 429 with Retry-After header”
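A well-bounded specification like the one above can be implemented and reviewed in isolation. The following is a minimal sketch of such a rate limiter, assuming a hypothetical in-memory store and a simplified decision type rather than any particular web framework; it is an illustration of a reviewable, atomic change, not a prescribed implementation.

```swift
import Foundation

/// Minimal in-memory rate limiter for the login endpoint:
/// at most 5 attempts per minute per IP, otherwise reject with Retry-After.
/// Hypothetical types; a real service would sit behind its web framework.
struct RateLimitDecision {
    let allowed: Bool
    let retryAfterSeconds: Int?   // populated only when the request is rejected
}

final class LoginRateLimiter {
    private let maxAttempts: Int
    private let window: TimeInterval
    private var attempts: [String: [Date]] = [:]   // IP -> attempt timestamps
    private let lock = NSLock()

    init(maxAttempts: Int = 5, window: TimeInterval = 60) {
        self.maxAttempts = maxAttempts
        self.window = window
    }

    func check(ip: String, now: Date = Date()) -> RateLimitDecision {
        lock.lock()
        defer { lock.unlock() }

        // Keep only attempts inside the sliding window.
        let recent = (attempts[ip] ?? []).filter { now.timeIntervalSince($0) < window }

        guard recent.count < maxAttempts else {
            // The oldest attempt in the window determines when capacity frees up.
            let oldest = recent.min() ?? now
            let retryAfter = Int(window - now.timeIntervalSince(oldest)) + 1
            attempts[ip] = recent
            return RateLimitDecision(allowed: false, retryAfterSeconds: retryAfter)
        }

        attempts[ip] = recent + [now]
        return RateLimitDecision(allowed: true, retryAfterSeconds: nil)
    }
}
```

The caller translates a rejected decision into an HTTP 429 with a Retry-After header. Because the spec fixed the limit, the window, and the response, the resulting diff can be verified line by line against it.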
Pillar 2: The Verification Loop
(The Process Discipline)
Speed without verification is merely accelerated technical debt. I convert AI-generated speed into sustainable velocity by enforcing a rigid loop of:
Spec → Test → Implementation → Verification → Diff Review
The Loop in Detail
1. Specification
- Written acceptance criteria
- Edge cases identified
- Success metrics defined
- Failure modes considered
2. Test-Driven Authority
- Require unit tests before implementation
- Edge-case coverage documented
- CI checks must pass
- Integration tests for cross-boundary changes
3. Implementation Review
- Verify implementation matches spec
- Check for scope creep
- Validate error handling
- Confirm no hidden dependencies
4. Traceability
- Demand explainability for every architectural modification
- Maintain readable audit trail for critical systems
- Document decision rationale
- Link to requirements
5. Safe Reversibility
- Ensure all changes can be rolled back cleanly
- Feature flags for risky changes
- Database migrations are reversible
- No breaking changes without deprecation path
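To make the loop concrete, here is what “tests before implementation” might look like for the rate-limiting task above, written against the hypothetical LoginRateLimiter sketched earlier. The tests encode the acceptance criteria; this is a sketch of the discipline, not a mandated test suite.

```swift
import XCTest

/// Written against the spec before the implementation exists:
/// the tests encode the acceptance criteria the implementation must satisfy.
final class LoginRateLimiterTests: XCTestCase {
    func testAllowsUpToFiveAttemptsPerMinute() {
        let limiter = LoginRateLimiter()
        let start = Date()
        for i in 0..<5 {
            let decision = limiter.check(ip: "203.0.113.7", now: start.addingTimeInterval(Double(i)))
            XCTAssertTrue(decision.allowed, "Attempt \(i + 1) should be allowed")
        }
    }

    func testRejectsSixthAttemptWithRetryAfter() {
        let limiter = LoginRateLimiter()
        let start = Date()
        for i in 0..<5 {
            _ = limiter.check(ip: "203.0.113.7", now: start.addingTimeInterval(Double(i)))
        }
        let decision = limiter.check(ip: "203.0.113.7", now: start.addingTimeInterval(5))
        XCTAssertFalse(decision.allowed)
        XCTAssertNotNil(decision.retryAfterSeconds, "Rejected requests must carry Retry-After")
    }

    func testWindowResetsAfterSixtySeconds() {
        let limiter = LoginRateLimiter()
        let start = Date()
        for i in 0..<5 {
            _ = limiter.check(ip: "203.0.113.7", now: start.addingTimeInterval(Double(i)))
        }
        // 65 seconds after the first attempt, all five have expired from the window.
        let decision = limiter.check(ip: "203.0.113.7", now: start.addingTimeInterval(65))
        XCTAssertTrue(decision.allowed)
    }
}
```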
Verification Checklist
Every AI-generated change must answer:
- Does this match the specification exactly?
- Are there tests for all paths (including errors)?
- Can this be deployed independently?
- Can this be rolled back safely?
- Are breaking changes documented?
- Are dependencies justified and minimal?
- Is error handling comprehensive?
- Are performance implications understood?
Pillar 3: Risk Management
(The Safety Discipline)
Modern agent systems accelerate not only output but also failure. Risk grows at the rate of automation unless constrained by human discipline.
Guarding the Boundaries
Interface Protection
- Public APIs must remain stable
- Breaking changes require explicit approval
- Deprecation follows defined timeline
- Versioning strategy enforced
Invariant Preservation
- System invariants documented
- Validation enforced at boundaries
- State machines remain consistent
- Constraints are type-enforced where possible
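One way to make “type-enforced where possible” tangible: instead of validating an invariant at every call site, encode it in a type that cannot represent an invalid state. A minimal Swift sketch with hypothetical domain types:

```swift
/// Invariant: a retry budget is always between 0 and 10.
/// The failable initializer is the only way to construct one,
/// so any value that reaches business logic is already valid.
struct RetryBudget {
    let value: Int

    init?(_ value: Int) {
        guard (0...10).contains(value) else { return nil }
        self.value = value
    }
}

/// Invariant: an order moves only forward through its lifecycle.
/// Illegal transitions are unrepresentable; they simply return nil.
enum OrderState {
    case placed, paid, shipped, delivered

    func advanced() -> OrderState? {
        switch self {
        case .placed:    return .paid
        case .paid:      return .shipped
        case .shipped:   return .delivered
        case .delivered: return nil   // terminal state, nothing to advance to
        }
    }
}
```

Because the invariants live in the types, an AI-generated change that violates them fails to compile rather than failing in production.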
Module Borders
- Clear ownership of modules
- Cross-module changes require justification
- Abstraction boundaries respected
- Dependencies flow in one direction
Stopping Deviation
Situational Awareness
- Continuous monitoring of AI output quality
- Pattern recognition for drift
- Early intervention before compounding
- Feedback loops to improve prompts
Intervention Criteria
- Stop immediately if scope exceeded
- Pause if architectural principles violated
- Redirect if safety implications unclear
- Abort if rollback path not clear
Standards Elevation
Culture of Discipline
- Explainability is non-negotiable
- Bounded autonomy by design
- Disciplined evolution over rapid iteration
- Quality gates that cannot be bypassed
Continuous Improvement
- Learn from near-misses
- Document failure modes
- Refine specifications based on outcomes
- Share lessons across team
The Philosophy
This is not traditional engineering.
This is not “prompt engineering.”
This is Human-in-Command Software Architecture.
What This Means
Traditional Engineering:
- Human writes code
- Human reviews code
- Human maintains code
Prompt Engineering:
- Human describes desired outcome
- AI generates code
- Human accepts or rejects
Human-in-Command:
- Human defines authority boundaries
- AI proposes within constraints
- Human validates and authorizes
- System remains deterministic and auditable
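The distinction is architectural as well as procedural. A minimal sketch of a proposal-and-authorization gate, using hypothetical types (nothing here is a prescribed API), shows where the human decision sits in the flow:

```swift
import Foundation

/// A change proposed by the AI: bounded, described, and reversible by construction.
/// Hypothetical types used only to illustrate the command flow.
struct ChangeProposal {
    let id: UUID
    let summary: String
    let diff: String
    let rollbackPlan: String
}

enum Verdict {
    case authorized(by: String)
    case rejected(by: String, reason: String)
}

/// The gate: nothing the AI proposes is applied without an explicit human verdict,
/// and every verdict is appended to an audit trail.
final class CommandGate {
    private(set) var auditTrail: [(proposal: ChangeProposal, verdict: Verdict, at: Date)] = []

    func review(_ proposal: ChangeProposal,
                decide: (ChangeProposal) -> Verdict,
                apply: (ChangeProposal) -> Void) {
        let verdict = decide(proposal)          // the human decision, never inferred
        auditTrail.append((proposal: proposal, verdict: verdict, at: Date()))
        if case .authorized = verdict {
            apply(proposal)                     // execution happens only after authorization
        }
    }
}
```

The shape matters more than the code: proposals are data, authorization is explicit, and the audit trail is a byproduct of the gate rather than an afterthought.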
The Distinction
In traditional engineering, the human is the worker.
In prompt engineering, the human is the requester.
In Human-in-Command, the human is the governor.
The Outcome
Under Human-in-Command discipline, AI becomes a force multiplier rather than a source of chaos. The result is a development environment where speed and safety coexist, architecture remains intentional, and agents operate within a framework that preserves clarity, reliability, and the long-term health of the system.
Measurable Benefits
Quality Metrics:
- Reduced regression rate
- Faster incident resolution
- Lower technical debt accumulation
- Higher test coverage
Velocity Metrics:
- Sustainable pace over time
- Predictable delivery
- Reduced rework
- Fewer rollbacks
Architecture Metrics:
- Consistent patterns
- Stable interfaces
- Manageable dependencies
- Clear ownership
Team Metrics:
- Higher confidence in changes
- Reduced cognitive load
- Better knowledge retention
- Improved onboarding
Implementation: A Practical Framework
For Individual Contributors
Before engaging AI:
- Write clear specification
- Define acceptance criteria
- Identify constraints
- List edge cases
During AI generation:
- Monitor scope adherence
- Validate architectural consistency
- Check for anti-patterns
- Verify test coverage
After AI completes:
- Review diff comprehensively
- Run full test suite
- Check reversibility
- Document decisions
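For the “before engaging AI” step, the specification itself can be treated as a structured artifact rather than free text. A sketch of one possible shape, using the rate-limiting task from Pillar 1 as the example (the fields are illustrative, not a required schema):

```swift
/// One possible shape for an engineering contract handed to an AI agent.
/// The fields mirror the checklist above; the names are illustrative only.
struct TaskSpecification {
    let title: String
    let acceptanceCriteria: [String]   // observable, testable outcomes
    let constraints: [String]          // what the change must not touch
    let edgeCases: [String]            // inputs and states that must be handled
    let rollbackPlan: String           // how the change is undone if it misbehaves
}

let rateLimitTask = TaskSpecification(
    title: "Add rate limiting to login endpoint",
    acceptanceCriteria: [
        "Max 5 attempts per minute per IP",
        "6th attempt within the window returns 429 with Retry-After header"
    ],
    constraints: [
        "No changes outside the authentication module",
        "No new third-party dependencies"
    ],
    edgeCases: [
        "Attempts spread across the 60-second window boundary",
        "Multiple clients behind the same proxy IP"
    ],
    rollbackPlan: "Feature-flagged; disable the flag to restore previous behavior"
)
```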
For Team Leads
Establish standards:
- Define architectural principles
- Document patterns
- Create templates
- Set quality gates
Enable governance:
- Review processes
- Approval workflows
- Escalation paths
- Feedback mechanisms
Measure and improve:
- Track metrics
- Analyze failures
- Refine processes
- Share learnings
For Organizations
Cultural shift:
- Value reliability over speed
- Reward discipline
- Celebrate prevented failures
- Build trust through consistency
Investment areas:
- Tooling for verification
- Training on governance
- Time for review
- Infrastructure for safety
Long-term strategy:
- Architectural evolution
- Technical debt management
- Capability development
- Risk mitigation
Looking Forward
The principles outlined here will be explored in depth in a forthcoming series of articles that expands Human-in-Command Software Architecture into a practical, disciplined engineering methodology.
Upcoming Topics
- Atomic Task Definition: How to scope AI work for maximum safety
- The Verification Loop in Practice: Real examples and tooling
- Boundary Detection: Identifying when AI drift begins
- Architectural Authority: Maintaining coherence at scale
- Incident Case Studies: Learning from AI-induced failures
- Team Adoption: Rolling out governance practices
Conclusion
Without stewardship, AI becomes a liability—introducing regressions, degrading architecture, and obscuring intent.
With stewardship, AI becomes a disciplined force multiplier capable of accelerating delivery without compromising safety.
I ensure the AI builds the right thing, the safe thing, and the maintainable thing—every time.
The question is not whether AI will generate more code. It will.
The question is whether that code will be governed by discipline or chaos.
I choose discipline.
I am the agent in command.
Related Reading
- SwiftVector Whitepaper - Architectural patterns for deterministic AI
- Swift on the Edge - Why Swift for edge AI
- Agent In Command - Project website
Author: Stephen Sweeney
Role: Principal Architect, Autonomous Systems
Contact: stephen@agentincommand.ai
License: CC BY 4.0