The Agency Paradox: Who is the Agent in Command?

Version: 1.0
Date: December 2025
Author: Stephen Sweeney
Type: Manifesto
Audience: Engineering Leaders, AI Teams, Safety-Critical Developers


Introduction

As AI systems begin generating increasingly large portions of the engineering workload, teams risk losing architectural coherence, safety, and long-term maintainability. This manifesto defines the discipline required to keep humans firmly in command of AI-generated systems. It reframes the role of the engineer from coder to steward—one who governs autonomy with intention, authority, and structured oversight.

The accelerating shift toward multi-agent development workflows in 2024–2025 makes this question of command urgent, not theoretical.


The Central Question

In the era of autonomous software, the term “agent” has become ambiguous. We now call automated systems “AI agents,” yet rarely ask: Who is truly in command?

If the AI becomes the de facto “agent in command,” the human is reduced to a passenger, carried by the system rather than directing it.

My role is to reverse that dynamic.

Autonomy without oversight is not intelligence—it is risk.


Reframing the Engineering Role

From Coder to Steward

The traditional software engineer writes code, reviews pull requests, and maintains systems. As AI takes over more of the implementation work, a new role emerges:

The Engineering Steward

I provide engineering governance and safety oversight in AI-driven software development. My responsibility is not to “babysit” the AI, but to enforce professional engineering standards the AI cannot self-impose.

Pilot in Command

I operate as the Pilot in Command:

  • The AI may handle the labor
  • I retain full situational awareness
  • I maintain authority over all decisions
  • I bear responsibility for architecture, safety, and scope

My primary responsibility is risk management, not speed.

The AI executes. I am accountable.

This accountability mirrors the principles of safety-critical engineering, where automation may assist but never replace the human responsible for system integrity and operational safety.


Why These Pillars Exist

Before defining the disciplines of human command, it is important to recognize the primary failure modes of autonomous code generation.

Common Failure Modes

Without structured oversight, AI systems tend to introduce:

  1. Silent Regressions (illustrated below)

    • Breaking changes without test coverage
    • Subtle bugs that pass CI but fail in production
    • Edge cases ignored during refactoring
  2. Architectural Drift

    • Inconsistent patterns across the codebase
    • Ad-hoc solutions to recurring problems
    • Coupling that increases over time
  3. Dependency Sprawl

    • Unnecessary libraries added
    • Version conflicts introduced
    • Security vulnerabilities imported
  4. Non-Deterministic Behavior

    • Race conditions in concurrent code
    • Untested error paths
    • Inconsistent state management
  5. Scope Overreach

    • Changes beyond specified task
    • Refactoring unrelated code
    • “Improvements” that break assumptions

These failure modes compound rapidly at AI speed, making human stewardship not optional, but essential.
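
To make the first of these concrete, consider a minimal, hypothetical Python illustration of a silent regression: a refactor that preserves the happy path but quietly moves a boundary condition, so a shallow test suite passes while real traffic breaks.

```python
# Hypothetical illustration of a silent regression: a refactor that
# preserves the happy path but quietly moves a boundary condition.

def is_eligible_original(age: int) -> bool:
    # Original rule: customers aged 18 and over are eligible.
    return age >= 18

def is_eligible_refactored(age: int) -> bool:
    # "Equivalent" refactor: reads the same at a glance,
    # but 18-year-olds are now silently rejected.
    return age > 18

# A shallow test suite that skips the boundary passes for both versions:
for age in (30, 10):
    assert is_eligible_original(age) == is_eligible_refactored(age)

# Only an explicit boundary test exposes the regression:
print(is_eligible_original(18))    # True
print(is_eligible_refactored(18))  # False: the silent regression
```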


The Three Pillars of Human Command

To prevent the incoherent abstractions, silent regressions, and unbounded changes common in AI-generated code, I uphold three non-negotiable pillars of stewardship.

Pillar 1: Sovereignty Over Scope

(The Command Discipline)

Without explicit sovereignty, agentic workflows quickly collapse into improvisational changes, architectural drift, and regression-prone refactors. I maintain command by strictly defining the boundaries of execution.

Practices

Atomic Definitions

  • Break work into small, atomic tasks suitable for clean, reviewable PRs
  • Each task has clear input, output, and success criteria
  • No task should touch more than one architectural layer

Engineering Contracts

  • Treat prompts as binding engineering contracts—precise, testable, and non-optional
  • Not casual instructions, but specifications
  • Include acceptance criteria, constraints, and examples (see the contract sketch below)

Intentional Evolution

  • Ensure the system evolves according to a deliberate plan
  • Not AI-driven improvisation
  • Architecture decisions are human decisions

Anti-Patterns to Prevent

  • ❌ “Improve the authentication system” (too vague)
  • ❌ “Fix all the bugs in the payment flow” (unbounded)
  • ❌ “Refactor for better performance” (no criteria)

✅ “Add rate limiting to login endpoint: max 5 attempts per minute per IP, return 429 with Retry-After header”
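
Written down as a structured contract, that same task might look like the sketch below. This is a minimal illustration, not a standard schema: TaskContract, its field names, and the file paths are all hypothetical.

```python
# Minimal sketch of a prompt treated as an engineering contract.
# TaskContract, its fields, and the file paths are hypothetical,
# for illustration only; the point is the structure, not the schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskContract:
    task: str
    scope: list[str]                 # files the change may touch
    acceptance_criteria: list[str]   # testable, non-optional
    constraints: list[str]
    out_of_scope: list[str]          # explicit non-goals

rate_limit_task = TaskContract(
    task="Add rate limiting to the login endpoint",
    scope=["auth/login.py", "auth/tests/test_login.py"],
    acceptance_criteria=[
        "Max 5 attempts per minute per IP",
        "Excess attempts return HTTP 429 with a Retry-After header",
    ],
    constraints=[
        "No new third-party dependencies",
        "Public API of the auth module unchanged",
    ],
    out_of_scope=["Password policy", "Session handling"],
)
```

The out_of_scope list does as much work as the scope list: it is what lets a reviewer reject well-intentioned “improvements” mechanically rather than by taste.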


Pillar 2: The Verification Loop

(The Process Discipline)

Speed without verification is merely accelerated technical debt. I convert AI-generated speed into sustainable velocity by enforcing a rigid loop of:

Spec → Test → Implementation → Verification → Diff Review

The Loop in Detail

1. Specification

  • Written acceptance criteria
  • Edge cases identified
  • Success metrics defined
  • Failure modes considered

2. Test-Driven Authority

  • Require unit tests before implementation (see the sketch after this loop)
  • Edge-case coverage documented
  • CI checks must pass
  • Integration tests for cross-boundary changes

3. Implementation Review

  • Verify implementation matches spec
  • Check for scope creep
  • Validate error handling
  • Confirm no hidden dependencies

4. Traceability

  • Demand explainability for every architectural modification
  • Maintain readable audit trail for critical systems
  • Document decision rationale
  • Link to requirements

5. Safe Reversibility

  • Ensure all changes can be rolled back cleanly
  • Feature flags for risky changes
  • Database migrations are reversible
  • No breaking changes without deprecation path
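
As a concrete instance of step 2, here is a sketch of a test that encodes the Pillar 1 rate-limiting contract before any production code is generated. The RateLimiter below is a minimal, hypothetical in-memory version included only so the example runs; a production limiter would need shared state and eviction.

```python
# Sketch of step 2 applied to the rate-limiting contract from Pillar 1:
# the test encodes the acceptance criteria before implementation.
# RateLimiter here is a minimal, hypothetical in-memory version included
# only so the example runs; production needs shared state and eviction.
import time
from collections import defaultdict, deque

class RateLimiter:
    def __init__(self, max_attempts: int, window_seconds: float):
        self.max_attempts = max_attempts
        self.window = window_seconds
        self._attempts: dict[str, deque] = defaultdict(deque)

    def check(self, ip: str) -> tuple[bool, float]:
        """Record one attempt; return (allowed, retry_after_seconds)."""
        now = time.monotonic()
        attempts = self._attempts[ip]
        # Drop attempts that have aged out of the sliding window.
        while attempts and now - attempts[0] >= self.window:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False, self.window - (now - attempts[0])
        attempts.append(now)
        return True, 0.0

def test_sixth_attempt_within_a_minute_is_rejected():
    limiter = RateLimiter(max_attempts=5, window_seconds=60)
    ip = "203.0.113.7"
    # Acceptance criterion: the first five attempts are allowed.
    for _ in range(5):
        allowed, _ = limiter.check(ip)
        assert allowed
    # Acceptance criterion: the sixth is rejected with a Retry-After hint.
    allowed, retry_after = limiter.check(ip)
    assert not allowed
    assert 0 < retry_after <= 60

test_sixth_attempt_within_a_minute_is_rejected()
```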

Verification Checklist

Every AI-generated change must answer:

  • Does this match the specification exactly?
  • Are there tests for all paths (including errors)?
  • Can this be deployed independently?
  • Can this be rolled back safely?
  • Are breaking changes documented?
  • Are dependencies justified and minimal?
  • Is error handling comprehensive?
  • Are performance implications understood?

Pillar 3: Risk Management

(The Safety Discipline)

Modern agent systems accelerate not only output but also failure. Risk grows at the rate of automation unless constrained by human discipline.

Guarding the Boundaries

Interface Protection

  • Public APIs must remain stable
  • Breaking changes require explicit approval
  • Deprecation follows defined timeline
  • Versioning strategy enforced

Invariant Preservation

  • System invariants documented
  • Validation enforced at boundaries
  • State machines remain consistent
  • Constraints are type-enforced where possible (sketched below)

Module Borders

  • Clear ownership of modules
  • Cross-module changes require justification
  • Abstraction boundaries respected
  • Dependencies flow in one direction
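
One concrete form of invariant preservation is to centralize legal state transitions in a single table and route every transition through it, so no generated code path can bend the state machine. A minimal sketch, with hypothetical order states:

```python
# Minimal sketch of a type-enforced invariant: legal state transitions
# live in one table, and any code path (human- or AI-written) that
# attempts an illegal transition fails loudly at the boundary.
# The states and transitions are hypothetical examples.
from enum import Enum, auto

class OrderState(Enum):
    PENDING = auto()
    PAID = auto()
    SHIPPED = auto()
    CANCELLED = auto()

ALLOWED_TRANSITIONS: dict[OrderState, set[OrderState]] = {
    OrderState.PENDING: {OrderState.PAID, OrderState.CANCELLED},
    OrderState.PAID: {OrderState.SHIPPED, OrderState.CANCELLED},
    OrderState.SHIPPED: set(),     # terminal
    OrderState.CANCELLED: set(),   # terminal
}

def transition(current: OrderState, target: OrderState) -> OrderState:
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"Illegal transition: {current.name} -> {target.name}")
    return target

state = transition(OrderState.PENDING, OrderState.PAID)  # fine
# transition(OrderState.SHIPPED, OrderState.PAID)        # raises ValueError
```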

Stopping Deviation

Situational Awareness

  • Continuous monitoring of AI output quality
  • Pattern recognition for drift
  • Early intervention before compounding
  • Feedback loops to improve prompts

Intervention Criteria

  • Stop immediately if scope exceeded
  • Pause if architectural principles violated
  • Redirect if safety implications unclear
  • Abort if rollback path not clear
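
The first of these criteria can be enforced mechanically before any human review begins. The sketch below assumes scope patterns declared in a Pillar 1-style contract (the paths and patterns are hypothetical) and halts the moment a diff touches files outside them:

```python
# Sketch of an automatic "stop if scope exceeded" check. It compares the
# files an AI change touched against the scope declared in the task's
# contract. The changed files and scope patterns are hypothetical.
from fnmatch import fnmatch

def files_out_of_scope(changed_files: list[str], scope: list[str]) -> list[str]:
    """Return changed files not covered by any scope pattern."""
    return [
        path for path in changed_files
        if not any(fnmatch(path, pattern) for pattern in scope)
    ]

changed = ["auth/login.py", "billing/invoice.py"]  # from the diff
scope = ["auth/*.py", "auth/tests/*.py"]           # from the contract

violations = files_out_of_scope(changed, scope)
if violations:
    # Intervention criterion: stop immediately if scope is exceeded.
    raise SystemExit(f"Scope exceeded; halting for review: {violations}")
```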

Standards Elevation

Culture of Discipline

  • Explainability is non-negotiable
  • Bounded autonomy by design
  • Disciplined evolution over rapid iteration
  • Quality gates that cannot be bypassed

Continuous Improvement

  • Learn from near-misses
  • Document failure modes
  • Refine specifications based on outcomes
  • Share lessons across team

The Philosophy

This is not traditional engineering.
This is not “prompt engineering.”
This is Human-in-Command Software Architecture.

What This Means

Traditional Engineering:

  • Human writes code
  • Human reviews code
  • Human maintains code

Prompt Engineering:

  • Human describes desired outcome
  • AI generates code
  • Human accepts or rejects

Human-in-Command:

  • Human defines authority boundaries
  • AI proposes within constraints
  • Human validates and authorizes
  • System remains deterministic and auditable

The Distinction

In traditional engineering, the human is the worker.
In prompt engineering, the human is the requester.
In Human-in-Command, the human is the governor.


The Outcome

Under Human-in-Command discipline, AI becomes a force multiplier rather than a source of chaos. The result is a development environment where speed and safety coexist, architecture remains intentional, and agents operate within a framework that preserves clarity, reliability, and the long-term health of the system.

Measurable Benefits

Quality Metrics:

  • Reduced regression rate
  • Faster incident resolution
  • Lower technical debt accumulation
  • Higher test coverage

Velocity Metrics:

  • Sustainable pace over time
  • Predictable delivery
  • Reduced rework
  • Fewer rollbacks

Architecture Metrics:

  • Consistent patterns
  • Stable interfaces
  • Manageable dependencies
  • Clear ownership

Team Metrics:

  • Higher confidence in changes
  • Reduced cognitive load
  • Better knowledge retention
  • Improved onboarding

Implementation: A Practical Framework

For Individual Contributors

  1. Before engaging AI:

    • Write clear specification
    • Define acceptance criteria
    • Identify constraints
    • List edge cases
  2. During AI generation:

    • Monitor scope adherence
    • Validate architectural consistency
    • Check for anti-patterns
    • Verify test coverage
  3. After AI completes:

    • Review diff comprehensively
    • Run full test suite
    • Check reversibility
    • Document decisions

For Team Leads

  1. Establish standards:

    • Define architectural principles
    • Document patterns
    • Create templates
    • Set quality gates
  2. Enable governance:

    • Review processes
    • Approval workflows
    • Escalation paths
    • Feedback mechanisms
  3. Measure and improve:

    • Track metrics
    • Analyze failures
    • Refine processes
    • Share learnings

For Organizations

  1. Cultural shift:

    • Value reliability over speed
    • Reward discipline
    • Celebrate prevented failures
    • Build trust through consistency
  2. Investment areas:

    • Tooling for verification
    • Training on governance
    • Time for review
    • Infrastructure for safety
  3. Long-term strategy:

    • Architectural evolution
    • Technical debt management
    • Capability development
    • Risk mitigation

Looking Forward

The principles outlined here will be explored in depth in a forthcoming series of articles that expands Human-in-Command Software Architecture into a practical, disciplined engineering methodology.

Upcoming Topics

  1. Atomic Task Definition: How to scope AI work for maximum safety
  2. The Verification Loop in Practice: Real examples and tooling
  3. Boundary Detection: Identifying when AI drift begins
  4. Architectural Authority: Maintaining coherence at scale
  5. Incident Case Studies: Learning from AI-induced failures
  6. Team Adoption: Rolling out governance practices

Conclusion

Without stewardship, AI becomes a liability—introducing regressions, degrading architecture, and obscuring intent.

With stewardship, AI becomes a disciplined force multiplier capable of accelerating delivery without compromising safety.

I ensure the AI builds the right thing, the safe thing, and the maintainable thing—every time.

The question is not whether AI will generate more code. It will.

The question is whether that code will be governed by discipline or chaos.

I choose discipline.

I am the agent in command.



Author: Stephen Sweeney
Role: Principal Architect, Autonomous Systems
Contact: stephen@agentincommand.ai
License: CC BY 4.0