Why the FAA Can’t Certify a Language Model
Governed autonomy for deterministic drone operations
AI may propose. Operators decide. Laws enforce. And every decision can be replayed, audited, and explained.
Flightworks Control is a Swift/SwiftUI Ground Control Station built on the AgentVector Codex—a constitutional model where State is authority, Actions are proposals, and Reducers enforce Laws deterministically. The title says “language model,” but the problem is broader: neural networks, vision classifiers, recommendation engines—any system whose outputs are probabilistic and irreproducible cannot be certified under frameworks that require deterministic, auditable behavior. The claim is not “AI can fly safely.” The claim is narrower and stronger: you can’t certify probabilistic intelligence, but you can certify the deterministic boundary around it.
Authority Failures Are Governance Failures
When software and human judgment conflict, the outcome depends on who holds authority and whether the system makes its behavior legible.
The Boeing 737 MAX accidents—Lion Air 610 on October 29, 2018, and Ethiopian Airlines 302 on March 10, 2019—demonstrated authority conflict at its most extreme. The MCAS system repeatedly acted against pilot intent, and it did not give the crews sufficient clarity or control to resolve the conflict in time. In total, 346 people died before the fleet was grounded.
Flightworks is not a commercial airliner system. It is a drone GCS. But the lesson generalizes: in safety-critical operations, the most dangerous failure mode is unbounded authority—especially when it is opaque.
The design goal is not “more autonomy.” It is bounded authority, deterministic enforcement, operator control, and post-incident explainability.
The Assurance Gap
AI capabilities are accelerating—computer vision, anomaly detection, recommendation engines, sensor fusion. In drone operations, these capabilities reduce workload and improve outcomes. But safety-critical assurance frameworks demand properties that probabilistic systems struggle to provide: reproducible behavior, complete traceability, explicit rules of authority, and deterministic decision paths.
This creates an assurance gap: the AI capabilities most useful in the field are the hardest to justify in rigorous safety or compliance regimes.
Flightworks Control closes that gap by treating AI as advisory compute, never as operational authority.
The AgentVector Codex Model
Flightworks Control implements a Codex-driven jurisdiction system with three layers:
The Codex is the constitutional framework: AgentVector’s deterministic control loop (State → Action → Reducer → New State). Every state mutation in the system passes through this loop. The enforcement kernel is SwiftVector—the reference implementation of AgentVector in Swift.
Laws are domain constraints that must always hold. They are not guidelines or preferences—they are compiled into the binary and enforced by the Reducer. A Law violation is not logged and tolerated; it is rejected before it reaches state.
Jurisdictions are composed sets of Laws governing a mission domain. A jurisdiction inherits everything below it and adds domain-specific constraints on top.
The Core Loop
State is truth. All flight-relevant information is represented in typed, explicit, immutable state: connection status, aircraft telemetry, battery level, geofence boundaries, mission plan, approval queue. There are no hidden variables.
Actions are proposals. Operator commands, telemetry updates, and agent recommendations are all represented as typed Actions. An Action is a request to change state, not a command that changes state directly.
Reducers enforce Laws. A Reducer is the single authority that may transition state. It either applies the action (legal), rejects it (illegal), or routes it to operator approval (high-risk). The Reducer is a pure function: same inputs produce the same outputs, every time.
This is the determinism boundary. Intelligence can be non-deterministic. Authority is deterministic.
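A minimal sketch of the loop (the `GCSState` and `GCSAction` types here are illustrative, not the project's actual API):

```swift
// Minimal sketch of the State → Action → Reducer → New State loop.
// GCSState and GCSAction are illustrative names, not Flightworks' actual API.
struct GCSState {
    var isArmed: Bool
    var altitudeLimit: Double  // meters
}

enum GCSAction {
    case arm
    case setAltitudeLimit(Double)
}

// Pure function: the single authority that may transition state.
// Same (state, action) in, same state out, every time.
func reduce(_ state: GCSState, _ action: GCSAction) -> GCSState {
    var next = state
    switch action {
    case .arm:
        next.isArmed = true
    case .setAltitudeLimit(let meters):
        // An illegal proposal is rejected outright: state returns unchanged.
        guard meters > 0 else { return state }
        next.altitudeLimit = meters
    }
    return next
}
```

The Reducer is the only place state changes; everything else in the system can only propose.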
FlightLaw: The Universal Safety Kernel
Flightworks Control’s baseline jurisdiction is FlightLaw: a universal safety kernel required for any flight operation. It composes four Laws from the AgentVector Codex:
Law 3 (Observation): Readiness gates, telemetry logging, and audit trail. The system cannot arm without passing pre-flight validation. Every state transition is recorded with a SHA256 hash chain for tamper-evident replay.
Law 4 (Resource): Battery and thermal limits. When battery drops below the reserve threshold or compute hardware exceeds thermal limits, the Law forces state transitions to degraded or halted modes. The operator is notified; the agent cannot override.
Law 7 (Spatial): Geofences, altitude ceilings, no-fly zones, and boundary enforcement. A waypoint that violates a spatial constraint is not logged as a warning—it is rejected by the Reducer. The action is unrepresentable as a valid state transition.
Law 8 (Authority): Risk-tiered approvals and explicit operator override. Low-risk actions may proceed. Medium-risk actions require notification. High-risk actions are suspended until the operator provides explicit authorization.
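Law 8's routing can be sketched as a pure function over risk tiers (the `RiskTier` and `Disposition` names are illustrative assumptions, not the project's actual API):

```swift
// Sketch of Law 8's risk-tiered routing. RiskTier and Disposition are
// illustrative names, not Flightworks' actual API.
enum RiskTier { case low, medium, high }

enum Disposition {
    case proceed                // low risk: apply immediately
    case proceedWithNotice      // medium risk: apply, then notify the operator
    case awaitOperatorApproval  // high risk: suspend until explicitly authorized
}

// Pure, total function: every tier maps to exactly one disposition.
func route(_ tier: RiskTier) -> Disposition {
    switch tier {
    case .low:    return .proceed
    case .medium: return .proceedWithNotice
    case .high:   return .awaitOperatorApproval
    }
}
```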
The architectural move is: prove the safety kernel once, inherit the guarantees everywhere. Higher-level mission jurisdictions extend FlightLaw without duplicating foundational safety logic.
Operator Authority as a First-Class Law
Flightworks assumes the operator is the accountable authority in the loop. The system is engineered to prevent three common autonomy failure modes:
Silent action. No action with safety impact occurs without being representable as a typed Action and recorded in the audit log. If it isn’t an Action, it doesn’t happen.
Persistent coercion. Agents do not nag. They do not override. Recommendations are visible, attributable, and dismissible. The system presents a recommendation once, with its basis. The operator accepts or rejects. No repeated suggestions. No wearing down human judgment through persistence.
Opaque decisions. Every recommendation presents its basis as operator-legible evidence: inputs used, thresholds crossed, confidence levels, and the reason the recommendation was proposed. The operator sees not just what the system recommends, but why.
Operators don’t fight the system. They steer it.
Time-Bounded Determinism
In real-time systems, logical determinism is necessary but insufficient. A function that always produces the same output for the same input is logically deterministic. But if that function sometimes takes 10 milliseconds and sometimes takes 10 seconds, it is temporally non-deterministic. For a ground control station processing live telemetry, this matters enormously.
Consider a geofence violation check. The Reducer validates that a proposed waypoint doesn’t violate airspace boundaries. Logically deterministic—same waypoint, same boundaries, same result. But if the check sometimes completes in 5ms and sometimes in 500ms because of garbage collection or memory pressure, the system’s real-time behavior becomes unpredictable.
Flightworks enforces time-bounded determinism: telemetry processing budgets, Reducer execution budgets, inference budgets, and watchdog monitoring for violations. If any component exceeds its time budget, the system triggers degraded-mode operation. Temporal failures are visible, not silent.
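One way to implement such a budget check, sketched with Swift's `ContinuousClock` (the budget value and the degraded-mode hook are illustrative, not Flightworks' actual API):

```swift
// Sketch of a Reducer execution budget with a watchdog check.
// Requires Swift 5.7+ (Clock / Duration APIs). The budget and the
// onBudgetExceeded hook are illustrative assumptions.
func timedReduce<S, A>(
    _ state: S,
    _ action: A,
    budget: Duration,
    reduce: (S, A) -> S,
    onBudgetExceeded: (Duration) -> Void
) -> S {
    let clock = ContinuousClock()
    var result = state
    let elapsed = clock.measure {
        result = reduce(state, action)
    }
    if elapsed > budget {
        // Temporal failure is surfaced, never silent.
        onBudgetExceeded(elapsed)
    }
    return result
}
```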
This is a key reason the SwiftVector kernel matters for safety-critical systems. Swift’s Automatic Reference Counting provides deterministic memory management—no stop-the-world garbage collector that could freeze the system at the moment a collision warning needs processing. Garbage-collected runtimes generally cannot offer the same temporal guarantees: the collector can pause at any moment, and in aviation, “any moment” might be the wrong moment.
If you can’t bound latency, you can’t bound risk.
Two Jurisdictions, One Safety Kernel
The jurisdiction model’s value is clearest when you see how different mission domains inherit the same safety guarantees while adding domain-specific governance. Flightworks implements two mission jurisdictions on top of FlightLaw: ThermalLaw for anomaly detection and SurveyLaw for precision mapping. Both demonstrate the same pattern: stochastic input, deterministic boundary, operator authority.
ThermalLaw: Governing Probabilistic Inference
Thermal anomaly detection is an ideal test case for bounded AI because it is inherently probabilistic at the sensor and model layer. A neural network examining thermal frames does not produce certainties—it produces confidence scores. ThermalLaw treats ML inference as stochastic input and applies deterministic governance at the boundary.
Step 1: Stochastic — ML inference output
A CoreML model processes thermal frames and produces probabilistic outputs:
struct ThermalMLOutput {
    let anomalyProbability: Double // 0.0 to 1.0
    let peakTemperature: Double
    let temperatureDelta: Double
    let boundingBox: BoundingBox
}
This output is inherently probabilistic. The model might produce slightly different confidence values on different runs due to floating-point variations, GPU execution order, or quantization effects. ThermalLaw does not try to make the model deterministic—that is not realistic for neural networks.
Step 2: Deterministic — The classification boundary
Instead, ThermalLaw wraps ML output in a pure function with fixed thresholds:
/// Pure function: same ML output → same classification, every time.
/// No learned parameters. No runtime configuration.
static func classify(_ output: ThermalMLOutput) -> ThermalClassification? {
    // Fixed threshold — below this, no classification at all
    guard output.anomalyProbability >= 0.50 else { return nil }

    // Deterministic confidence banding
    let confidence: ConfidenceLevel = switch output.anomalyProbability {
    case 0.85...: .high
    case 0.70..<0.85: .medium
    default: .low
    }

    // Deterministic type assignment from temperature characteristics
    let anomalyType: AnomalyType = switch (output.peakTemperature,
                                           output.temperatureDelta) {
    case (80..., 30...): .electricalHotspot
    case (40..., 10...): .insulationDefect
    case (..<30, 5...): .moistureIntrusion
    default: .thermalAnomaly
    }

    return ThermalClassification(
        type: anomalyType,
        confidence: confidence,
        boundingBox: output.boundingBox,
        explanation: generateExplanation(output, anomalyType, confidence)
    )
}
This classifier is a pure function with fixed, auditable thresholds. Given the same ML output, it always produces the same classification. The thresholds can be reviewed and changed per inspection type, but at runtime they are fixed—not learned, not adaptive, not influenced by prior inference results. A probability of 0.72 always maps to .medium confidence. A peak temperature of 85°C with a 35°C delta always maps to .electricalHotspot. No exceptions. No learned parameters.
Step 3: Authority — Operator confirmation gate
The classification becomes a typed action that flows through the Reducer:
enum ThermalAction {
    case anomalyDetected(ThermalClassification)
    case anomalyConfirmed(id: UUID)
    case anomalyDismissed(id: UUID)
}
The Reducer validates the action against current state, applies business rules (such as workload bounds on maximum candidates per inspection zone), and routes high-severity findings to the operator approval queue (Law 8). Only operator-confirmed anomalies enter the final report.
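A sketch of that Reducer path, with the anomaly payload simplified to an identifier plus a severity flag, and a hypothetical `maxCandidatesPerZone` workload bound (none of these names are the project's actual API):

```swift
import Foundation

// Sketch of the thermal Reducer path. The payload is simplified to an
// id plus severity flag; maxCandidatesPerZone is a hypothetical bound.
struct ThermalState {
    var candidates: [UUID] = []     // pending, unconfirmed detections
    var approvalQueue: [UUID] = []  // Law 8: awaiting operator decision
    var confirmed: [UUID] = []      // operator-confirmed findings only
    let maxCandidatesPerZone = 10
}

enum ThermalReducerAction {
    case anomalyDetected(id: UUID, highSeverity: Bool)
    case anomalyConfirmed(id: UUID)
    case anomalyDismissed(id: UUID)
}

func reduce(_ state: ThermalState, _ action: ThermalReducerAction) -> ThermalState {
    var next = state
    switch action {
    case .anomalyDetected(let id, let highSeverity):
        // Workload bound: reject once the zone's candidate cap is reached.
        guard state.candidates.count < state.maxCandidatesPerZone else { return state }
        next.candidates.append(id)
        if highSeverity { next.approvalQueue.append(id) }  // Law 8 routing
    case .anomalyConfirmed(let id):
        next.confirmed.append(id)
        next.candidates.removeAll { $0 == id }
        next.approvalQueue.removeAll { $0 == id }
    case .anomalyDismissed(let id):
        next.candidates.removeAll { $0 == id }
        next.approvalQueue.removeAll { $0 == id }
    }
    return next
}
```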
The ML model can hallucinate. The deterministic classifier bounds its outputs. The Reducer enforces valid state transitions. The operator makes the final call. Each layer adds certainty. None has unchecked authority.
SurveyLaw: Governing Spatial Precision
Where ThermalLaw governs probabilistic inference, SurveyLaw governs probabilistic navigation. A drone flying a photogrammetric survey grid faces real-world stochastic forces—wind gusts, GPS noise, IMU drift, atmospheric turbulence—that push it off its planned flight path. SurveyLaw ensures that the operational result meets engineering-grade precision requirements regardless of environmental disturbance.
The stochastic input
In survey operations, the stochastic element is not a model—it is the physical environment. Wind pushes the aircraft. GPS signals fluctuate. The navigation controller proposes corrective maneuvers. These proposals are the Actions that enter the Reducer.
The deterministic boundary
SurveyLaw extends FlightLaw’s Spatial Law (Law 7) with survey-specific geometric constraints:
/// SurveyLaw Reducer: validates navigation proposals against
/// engineering-grade grid tolerances.
func reduce(state: SurveyState, action: NavigationAction) -> SurveyState {
    // Law 7 (Spatial) — survey precision enforcement
    let deviation = calculateDeviation(
        proposed: action.position,
        planned: state.missionGrid
    )
    guard deviation <= TOLERANCE_GRID_ADHERENCE else {
        return state.rejectAction(
            reason: "Grid deviation \(deviation)m exceeds \(TOLERANCE_GRID_ADHERENCE)m tolerance"
        )
    }

    // GSD enforcement — altitude must maintain required ground sample distance
    let currentGSD = calculateGSD(
        altitude: action.altitude,
        focalLength: state.camera.focalLength,
        sensorWidth: state.camera.sensorWidth,
        imageWidth: state.camera.imageWidth
    )
    guard currentGSD <= state.requiredGSD else {
        return state.rejectAction(
            reason: "GSD \(currentGSD)cm exceeds required \(state.requiredGSD)cm"
        )
    }

    // RTK fix quality gate
    guard action.fixType == .rtkFixed else {
        return state.rejectAction(
            reason: "RTK fix quality insufficient: \(action.fixType)"
        )
    }

    return state.acceptPosition(action.position, gsd: currentGSD)
}
The pattern is identical to ThermalLaw: stochastic input enters, the Reducer applies deterministic constraints, and only compliant state transitions are permitted. A capture position that violates grid tolerance is rejected—not flagged, not warned about, rejected. The survey deliverable either meets engineering specifications or it doesn’t happen.
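For concreteness, a `calculateGSD` helper like the one referenced above can be written from the standard photogrammetry relation (this is an illustrative sketch, not the project's actual code):

```swift
// One plausible implementation of a calculateGSD helper, based on the
// standard photogrammetry relation
//   GSD (cm/px) = sensorWidth(mm) × altitude(m) × 100
//                 / (focalLength(mm) × imageWidth(px))
// Illustrative sketch, not Flightworks' actual code.
func calculateGSD(altitude: Double,      // meters above ground
                  focalLength: Double,   // millimeters
                  sensorWidth: Double,   // millimeters
                  imageWidth: Double)    // pixels
                  -> Double {            // centimeters per pixel
    (sensorWidth * altitude * 100.0) / (focalLength * imageWidth)
}
```

With a typical 1-inch-sensor drone camera (13.2 mm sensor width, 8.8 mm focal length, 5472 px image width) at 100 m, this yields roughly 2.7 cm/px.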
The authority gate
Law 8 applies here as it does in ThermalLaw. If the system detects sustained inability to hold grid tolerance—repeated rejections due to wind beyond operational limits, persistent RTK fix degradation—the Reducer escalates to operator authority. The operator sees the pattern of rejected positions, the current deviation metrics, and a recommendation: abort the survey leg, hold position and wait for conditions to improve, or accept degraded accuracy for the affected zone. The system does not decide unilaterally to continue a compromised survey or to abort a mission. The operator decides.
Why this proves jurisdictional composition
ThermalLaw and SurveyLaw govern completely different domains—one manages ML inference confidence, the other manages physical navigation precision—yet both inherit FlightLaw’s safety kernel unchanged. Battery management, geofencing, operator authority, and audit logging work identically whether the mission is thermal inspection or photogrammetric mapping. The safety kernel is proven once and composed everywhere.
This also enables combined jurisdictions. An RTK-enabled thermal inspection activates all three jurisdictions simultaneously: FlightLaw provides battery and geofence guarantees, ThermalLaw governs anomaly detection, and SurveyLaw ensures that every detected anomaly is tagged with RTK-precise GPS coordinates. Each jurisdiction enforces its own Laws independently—they compose without conflict because they govern non-overlapping domains of the state.
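The non-overlap claim can be sketched directly in types: each jurisdiction's reducer touches only its own slice of mission state (all names here are illustrative):

```swift
// Sketch of jurisdictional composition: each Law governs its own
// non-overlapping slice of state, so the reducers compose without
// conflict. All names are illustrative, not the project's actual API.
struct FlightSlice  { var batteryPercent: Double }
struct ThermalSlice { var pendingAnomalies: Int }
struct SurveySlice  { var rtkFixed: Bool }

struct MissionState {
    var flight: FlightSlice
    var thermal: ThermalSlice
    var survey: SurveySlice
}

enum MissionAction {
    case batteryUpdated(Double)  // FlightLaw's domain
    case anomalyDetected         // ThermalLaw's domain
    case rtkFixChanged(Bool)     // SurveyLaw's domain
}

func reduce(_ state: MissionState, _ action: MissionAction) -> MissionState {
    var next = state
    switch action {
    case .batteryUpdated(let pct):  next.flight.batteryPercent = pct
    case .anomalyDetected:          next.thermal.pendingAnomalies += 1
    case .rtkFixChanged(let fixed): next.survey.rtkFixed = fixed
    }
    return next
}
```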
Certification and Assurance
Flightworks Control is a research platform and reference implementation. It is not certified software. It has not been audited by a regulatory body. It should not be used for operations where certification is required.
But the architecture is designed to support high-assurance reasoning, and it is worth being precise about what standards apply.
DO-178C is the widely used guidance for airborne software assurance—software that executes on the aircraft. DO-278A is the analogous guidance applied to certain ground-based CNS/ATM systems, specifically safety-relevant ground infrastructure like air traffic management.
A drone GCS is ground software, but it is not automatically within the CNS/ATM scope of DO-278A. For Part 107 operations, no formal software certification standard may apply at all.
So the correct position is: Flightworks is not claiming compliance with DO-178C or DO-278A. It is claiming an architecture that makes assurance possible—deterministic Reducers, explicit authority boundaries, traceable actions, and replayable logs. That distinction preserves credibility while keeping the aspirational direction intact. The path to certification exists, even if traveling it requires significant additional work.
What the architecture provides toward any future assurance effort:
Reproducibility. Given the same inputs—telemetry stream, operator commands, initial state—Flightworks produces identical outputs. This is verified through property-based testing that runs thousands of random scenarios and confirms deterministic behavior.
Traceability. Every state change is attributable to a specific action. Every action is logged with timestamp, source (operator, AI agent, telemetry stream), and the state before and after. Incident reconstruction requires replaying the log, not guessing.
Verifiability. The Reducer is a pure function. Safety invariants—“never arm while inside a geofence violation,” “never exceed maximum altitude”—can be expressed as properties and verified against the Reducer logic.
Bounded AI. The AI components are clearly separated from the authority components. A reviewer doesn’t need to understand neural networks. They need to verify that neural network outputs pass through deterministic gates before affecting system behavior.
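The reproducibility property reduces to a simple replay check: fold the same action log over the same initial state twice and compare the results. A minimal sketch with illustrative types:

```swift
// Replay sketch: folding the same action log over the same initial
// state must produce identical results every time. Types illustrative.
struct FlightState: Equatable { var altitude: Double }

enum FlightAction {
    case climb(Double)
    case descend(Double)
}

func reduce(_ state: FlightState, _ action: FlightAction) -> FlightState {
    switch action {
    case .climb(let d):   return FlightState(altitude: state.altitude + d)
    case .descend(let d): return FlightState(altitude: max(0, state.altitude - d))
    }
}

// Deterministic replay: the log plus the initial state fully
// determine the final state.
func replay(_ initial: FlightState, _ log: [FlightAction]) -> FlightState {
    log.reduce(initial) { reduce($0, $1) }
}
```

Property-based testing generalizes this: generate random logs and assert that every replay of a given log is identical.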
Why Open Source
Safety-critical tooling benefits from transparency. External review finds real failure modes. Determinism claims can be verified by others, not merely trusted. Behavior can be replayed and inspected. We claim the AgentVector architecture is provable; the code is available so anyone can test that claim.
For governed autonomy, open source is not a distribution choice. It is an integrity choice.
What Comes Next
Flightworks Control is being built in phases with the thesis as a build constraint: define Laws first, implement Reducers as the authority boundary, add audit and replay early, introduce agents only behind deterministic gates, and prove determinism continuously through tests, replay, and invariant verification.
The long-term goal is not a smarter drone. It is a drone operations stack where you can prove what happened, why it happened, and that it will happen the same way again.
That is the difference between automation and governed autonomy.
Flightworks Control is open source under MIT license. It is a research platform, not certified operational software. Use at your own risk and in compliance with all applicable regulations.
Reach me at stephen@flightworksaerial.com or explore the project at agentincommand.ai.
"In aviation, 'probably safe' is not a certification level."