AI-SDLC Protocol Engine

Governance for human-agent software teams

A specification language for human-agent boundaries. Define modes, validators, and disagreement policies — then enforce them structurally, not by convention.

$ pip install snodo
The problem

AI agents ship code.
Nobody specifies who approves what.

01
Security vulnerabilities

Analysis of AI-generated code finds consistent security weaknesses across Python and JavaScript — persistent across model generations, not a temporary capability gap.

29.5% of Copilot Python snippets contained vulnerabilities
02
Silent failures

Newer models produce code that fails to perform as intended but runs without syntax errors or obvious crashes. Hidden costs accumulate through delayed discovery.

A process problem, not a capability problem
03
No audit trail

Autonomous code generation introduces audit complications that go beyond code quality into regulatory territory. What decisions were made? What validation occurred?

Structured records that most systems don't provide
04
Self-review limitation

Single agents reviewing their own output validate against the same flawed model that produced the implementation. Independent review requires structural separation.

Separation of duties — enforced, not requested
The specification

Policy declared in language.
Mechanism enforces it structurally.

  • M
    Modes

    Operational stages with defined tool sets. The producer cannot merge; the reviewer cannot edit. WF1 requires the tool sets to be disjoint, and the MCP boundary enforces this at the infrastructure level rather than by convention.

  • V
    Validators

    Independent evaluators with orthogonal criteria: security, architecture, quality. Each assesses without seeing the others' verdicts. Pre- and post-execute phases are declared in the protocol.

  • D
    Disagreement policy

    Unanimous, majority, quorum, or any. The policy turns variable validator outputs into deterministic decisions: proceed, escalate, or halt. A blocking verdict is never overridable (INV3).

  • T
    Validation tokens

    JWT-backed, single-use credentials issued when validators reach quorum. MCP tools require a valid token for every mutating operation, so validation cannot be skipped, drifted from, or bypassed.

team-protocol.yml
```yaml
# 3-mode team protocol, majority policy
name: team
version: "1.0"
modes:
  producer:
    tools: [edit, dispatch, test]
    validators: [security, architecture, quality]
    transitions: [reviewer]
  reviewer:
    tools: [review, approve, merge]
    validators: [coverage, protocol_adherence]
    transitions: [producer, planner]
  # planner: the third mode, not shown in this excerpt
policy:
  disagreement: majority
  escalation: halt
tokens:
  ttl: 600
global_constraints:
  - negative(producer calls merge)
  - positive(tests exist => dispatch)
```
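A WF1-style check on a protocol like the one above can be sketched as a pairwise intersection of mode tool sets. This assumes WF1 means every pair of modes must have disjoint tools; the function name `wf1_violations` is hypothetical, and the protocol is written as a Python dict so the sketch needs no YAML parser.

```python
from itertools import combinations

# Hypothetical sketch of a compile-time WF1 check; not snodo's verifier.
# The producer/reviewer tool sets from team-protocol.yml, as a plain dict.
PROTOCOL = {
    "modes": {
        "producer": {"tools": ["edit", "dispatch", "test"]},
        "reviewer": {"tools": ["review", "approve", "merge"]},
    },
}


def wf1_violations(protocol: dict) -> list[tuple[str, str, set[str]]]:
    """Return every pair of modes whose tool sets overlap."""
    modes = protocol["modes"]
    violations = []
    for a, b in combinations(sorted(modes), 2):
        shared = set(modes[a]["tools"]) & set(modes[b]["tools"])
        if shared:
            violations.append((a, b, shared))
    return violations


print(wf1_violations(PROTOCOL))  # []

# Giving the producer "merge" breaks disjointness, so the protocol is rejected.
bad = {"modes": {**PROTOCOL["modes"], "producer": {"tools": ["edit", "merge"]}}}
print(wf1_violations(bad))  # [('producer', 'reviewer', {'merge'})]
```

Running the check before any session starts means a protocol that would let the producer approve its own work never reaches execution.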
Empirical characterisation

Measured. Not asserted.

~1%
Governance overhead

Overhead of the full governed task path, measured against representative LLM inference latency: structural enforcement is essentially free.

5.6×
Fewer failures · 5 stages

In a 5-stage pipeline, behavioural compliance fails 45% of the time as corrupted state compounds; structural enforcement fails 8% of the time.

>10×
Fewer failures · at the limit

As validators diversify, the gap widens past an order of magnitude. Even at 20 stages, where behavioural failure reaches 91%, structural enforcement keeps failures to 32%.
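A quick arithmetic check on the behavioural figures, assuming (purely for illustration) that each stage fails independently with the same probability p, so a pipeline of n stages fails with probability 1 - (1 - p)^n. The helper names are hypothetical, and this is not the paper's failure-rate model.

```python
# Hypothetical helpers: independent, identical per-stage failure is an assumption.


def pipeline_failure(per_stage: float, stages: int) -> float:
    """Probability that at least one of `stages` independent stages fails."""
    return 1 - (1 - per_stage) ** stages


def per_stage_rate(pipeline_rate: float, stages: int) -> float:
    """Invert pipeline_failure: the per-stage rate implied by an end-to-end rate."""
    return 1 - (1 - pipeline_rate) ** (1 / stages)


# Behavioural compliance: 45% failure at 5 stages (figure from the page).
p = per_stage_rate(0.45, 5)
print(f"implied per-stage failure: {p:.3f}")                     # 0.113
print(f"predicted at 20 stages: {pipeline_failure(p, 20):.3f}")  # 0.908

# Behavioural vs. structural at 5 stages (45% vs. 8%).
print(f"{0.45 / 0.08:.1f}x")  # 5.6x
```

Under this assumption the 5-stage figure predicts about 91% at 20 stages, matching the page. The same model does not reproduce the structural figures: 8% at 5 stages would imply about 28% at 20, not 32%, so the structural numbers depend on more than a fixed per-stage rate.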

Reference configuration

The 2+N team pattern

Producer mode
HI-CTRL (human) · orchestrator agent · coder ×N · validator ×N
tools: edit, dispatch, test, validate
↓ produces pull request ↓
Reviewer mode
HI-CTRL (human) · review agent · specialty reviewers ×N
tools: review, approve, reject, merge
Infrastructure boundary: WF1 enforced, disjoint tool sets

Separation of duties, enforced structurally

Two human-in-control roles plus N specialized agents. Producer mode cannot merge; reviewer mode cannot edit. The minimum of two humans derives from Separation of Duties: a single human cannot independently validate their own work, no matter how many agents are on the team.

The language places no upper bound on modes. Enterprise teams typically add planning, deployment, and operations modes. The 2+N pattern is the minimum viable structure.

  • Producer-reviewer tool separation prevents self-approval (WF1)
  • Validator quorum with orthogonal criteria — validators cannot see each other's verdicts before deciding
  • Non-overridable blockers halt cascade propagation before corrupted state compounds (INV3)
  • Sessions checkpoint across context resets — decisions preserved, completed tasks not rerun (INV5)
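The resumability property in the last bullet (INV5) can be sketched with a JSON checkpoint file that records completed tasks and their decisions. `run_session` and the checkpoint layout are hypothetical, not snodo's actual format.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sketch of INV5-style resumption; not snodo's checkpoint format.


def run_session(tasks: list[str], checkpoint: Path, execute) -> list[str]:
    """Run tasks in order, skipping any the checkpoint already marks complete."""
    state = {"completed": [], "decisions": {}}
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text())
    ran = []
    for task in tasks:
        if task in state["completed"]:
            continue  # finished before the reset: not rerun
        state["decisions"][task] = execute(task)  # decision is preserved
        state["completed"].append(task)
        ran.append(task)
        checkpoint.write_text(json.dumps(state))  # persist after every task
    return ran


def flaky(task: str) -> str:
    if task == "test":
        raise RuntimeError("simulated context reset")
    return f"{task}: ok"


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "session.json"
    try:
        run_session(["plan", "edit", "test"], ckpt, flaky)
    except RuntimeError:
        pass  # interrupted after "plan" and "edit" completed
    print(run_session(["plan", "edit", "test"], ckpt, lambda t: f"{t}: ok"))  # ['test']
```

Writing the checkpoint after each task bounds the loss from a reset to the task in flight. A production version would also write atomically, for example to a temporary file followed by `os.replace`.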
Research

Formal specification.
Empirical characterisation.

ACM TOSEM · Journal
Specifying AI-SDLC Processes: A Protocol Language for Human-Agent Boundaries

We propose a domain-specific language for specifying AI-SDLC processes, with formal abstract syntax, well-formedness conditions, operational semantics, and enforcement invariants. The language distinguishes policy (declared intent) from mechanism (structural enforcement), enabling implementations to bound process non-determinism through validation tokens and capability boundaries. Three results follow: structural enforcement bounds system failure rates at a weighted product of agent and validator rates; the 2+N team pattern formalizes Separation of Duties for AI-SDLC; and Kleene closure of orchestration loops and reflexive protocol-adherence validation arise as emergent properties of the design. Simulation studies characterise disagreement-policy trade-offs, governance overhead, the failure-rate model, and Byzantine robustness.

Under review · ACM Trans. Software Engineering & Methodology · Preprint: arXiv (forthcoming)
Open source · built to be trusted

Validated like infrastructure,
not a prototype.

1,477 tests
95% coverage

Three layers: unit, integration, and end-to-end CLI journeys driven through real subprocess invocation.

130,000 examples
0 invariant violations

Property-based testing of the core invariants — token unforgeability, audit-chain integrity, mode separation, policy halt — against randomized inputs.

3 templates
and growing

Ship-ready reference protocols — solo, team, and 2+N — all expressed in the same DSL, with the library growing toward ~10 configurations.
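To illustrate the kind of invariant the property-based tests target, here is a sketch of a hash-chained audit log (the INV4 mechanism) with a randomized tamper check. It uses the standard library's `random` module rather than a property-based testing framework, and every name in it is hypothetical.

```python
import hashlib
import json
import random

# Hypothetical sketch of a hash-chained audit log; not snodo's log format.
GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


def entry_hash(prev: str, record: dict) -> str:
    """Hash the previous link together with a canonical encoding of the record."""
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev + payload).encode()).hexdigest()


def append(log: list[dict], record: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify(log: list[dict]) -> bool:
    """Recompute every link; edits, reorders, and inner deletions all fail."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


def check_tamper_detected(trials: int = 1_000, seed: int = 0) -> None:
    """Randomized property: intact logs verify; editing any record breaks them."""
    rng = random.Random(seed)
    for _ in range(trials):
        log: list[dict] = []
        for seq in range(rng.randint(1, 20)):
            append(log, {"seq": seq,
                         "event": rng.choice(["dispatch", "test", "validate", "review"])})
        assert verify(log)
        log[rng.randrange(len(log))]["record"]["tampered"] = True
        assert not verify(log)


check_tamper_detected()
```

A plain hash chain detects edits, reordering, and deletions inside the log, but not truncation of the tail or a rewrite of every later entry; anchoring the latest hash somewhere the writer cannot change closes that gap.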

Structural enforcement, not good intentions

Every invariant in the specification maps to a concrete enforcement mechanism in the implementation. The protocol doesn't ask agents to behave — the infrastructure makes the non-compliant path impossible.

The system is built under its own rules: it extends its own codebase through the same 2+N protocol it defines — dogfooded end to end.

AGPLv3, developed in the open. Install from PyPI; the full source, protocol templates, and test suite are public.

AGPLv3 · PyPI: pip install snodo · CI: Python 3.12 / 3.13
Invariant → enforcement mechanism
INV1 · token integrity → JWT HMAC, single-use, TTL
INV2 · capability boundary → disjoint tool sets at the MCP boundary
INV4 · audit completeness → hash-chained audit log
INV5 · session resumability → file-based checkpoint
WF1 · tool disjointness → compile-time protocol verifier

Define your process.
Enforce it structurally.

As foundation models commoditize, the durable engineering asset becomes process design. The protocol is institutional memory; models are commodity infrastructure.
