snodo — AI-SDLC Protocol Engine

The problem

AI agents ship code.
Nobody specifies who approves what.

Security vulnerabilities

Analysis of AI-generated code finds consistent security weaknesses across Python and JavaScript — persistent across model generations, not a temporary capability gap.

29.5% of Copilot Python snippets contained vulnerabilities

Silent failures

Newer models produce code that fails to perform as intended but runs without syntax errors or obvious crashes. Hidden costs accumulate through delayed discovery.

A process problem, not a capability problem

No audit trail

Autonomous code generation introduces audit complications that go beyond code quality into regulatory territory. What decisions were made? What validation occurred?

Structured records that most systems don't provide

Self-review limitation

Single agents reviewing their own output validate against the same flawed model that produced the implementation. Independent review requires structural separation.

Separation of duties — enforced, not requested

The specification

Policy declared in language.
Mechanism enforces it structurally.


Modes

Operational stages with defined tool sets. Producer cannot merge; reviewer cannot edit. WF1 enforces disjoint capabilities at the MCP boundary — not by convention, at the infrastructure level.
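The disjointness condition can be illustrated with a minimal Python sketch. It assumes a protocol is available as a mapping from mode name to tool list; the function name `check_tool_disjointness` and the data shape are hypothetical and do not describe snodo's actual verifier or API.

```python
from itertools import combinations


def check_tool_disjointness(
    modes: dict[str, list[str]],
) -> list[tuple[str, str, set[str]]]:
    """Return every pair of modes that shares at least one tool."""
    violations = []
    for (name_a, tools_a), (name_b, tools_b) in combinations(modes.items(), 2):
        shared = set(tools_a) & set(tools_b)
        if shared:
            violations.append((name_a, name_b, shared))
    return violations


# Hypothetical data mirroring the producer/reviewer split described above.
modes = {
    "producer": ["edit", "dispatch", "test"],
    "reviewer": ["review", "approve", "merge"],
}
violations = check_tool_disjointness(modes)

# A deliberately broken protocol: both modes can merge.
overlap = check_tool_disjointness(
    {"producer": ["edit", "merge"], "reviewer": ["merge"]}
)
```

An empty result means no two modes share a tool. Any non-empty result names the offending modes and the shared tools, which is the kind of condition a protocol verifier could reject before anything runs.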

Validators

Independent evaluators with orthogonal criteria — security, architecture, quality. Each assesses without seeing others' verdicts. Pre- and post-execute phases declared in the protocol.

Disagreement policy

Unanimous, majority, quorum, or any. Turns variable validator outputs into deterministic decisions: proceed, escalate, or halt. Any blocker is non-overridable — INV3.
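A rough sketch of how such a policy could reduce validator verdicts to one decision follows. The verdict names, the `quorum` threshold parameter, and the choice to map an unmet policy to "escalate" are all assumptions made for illustration; only the four policy names, the three outcomes, and the non-overridable blocker come from the description above.

```python
from enum import Enum


class Verdict(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    BLOCK = "block"


def decide(verdicts: list[Verdict], policy: str, quorum: int = 2) -> str:
    """Map a list of validator verdicts to proceed, escalate, or halt."""
    # Assumed reading of INV3: one blocker halts, whatever the policy says.
    if Verdict.BLOCK in verdicts:
        return "halt"
    if not verdicts:
        return "escalate"  # assumption: no verdicts means no approval
    approvals = sum(v is Verdict.APPROVE for v in verdicts)
    total = len(verdicts)
    required = {
        "unanimous": total,
        "majority": total // 2 + 1,
        "quorum": quorum,  # hypothetical threshold parameter
        "any": 1,
    }[policy]
    return "proceed" if approvals >= required else "escalate"


two_of_three = [Verdict.APPROVE, Verdict.APPROVE, Verdict.REJECT]
with_blocker = [Verdict.APPROVE, Verdict.APPROVE, Verdict.BLOCK]
```

With these inputs, `decide(two_of_three, "majority")` proceeds, `decide(two_of_three, "unanimous")` escalates, and `decide(with_blocker, "any")` halts even under the most permissive policy.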

Validation tokens

JWT-backed single-use credentials issued on validator quorum. MCP tools require a valid token for any mutating operation. Validation cannot be skipped, drifted, or bypassed.

team-protocol.yml

# 3-mode team protocol - majority policy
name: team
version: "1.0"
modes:
  producer:
    tools: [edit, dispatch, test]
    validators: [security, architecture, quality]
    transitions: [reviewer]
  reviewer:
    tools: [review, approve, merge]
    validators: [coverage, protocol_adherence]
    transitions: [producer, planner]
policy:
  disagreement: majority
  escalation: halt
tokens:
  ttl: 600
global_constraints:
  - negative(producer calls merge)
  - positive(tests exist => dispatch)

Empirical characterisation

Measured. Not asserted.

~1%

Governance overhead

Latency of the full governed task path, relative to representative LLM inference latency. Structural enforcement is essentially free.

5.6×

Fewer failures · 5 stages

In a 5-stage pipeline, behavioural compliance fails 45% of the time as corrupted state compounds. With structural enforcement, the failure rate is 8%.

>10×

Fewer failures · at the limit

The gap widens past an order of magnitude as validators diversify. Even at 20 stages, where behavioural failure hits 91%, structural enforcement holds the failure rate to 32%.
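The compounding effect behind the behavioural figures can be checked with a short calculation. It assumes each stage fails independently at the same rate, which is a simplification introduced here and not the failure-rate model from the paper. The structural-enforcement figures come from that model and are not derived below. The helper names are hypothetical.

```python
def per_stage_rate(end_to_end_failure: float, stages: int) -> float:
    """Per-stage failure rate implied by an end-to-end rate (independent stages)."""
    return 1 - (1 - end_to_end_failure) ** (1 / stages)


def pipeline_failure(per_stage: float, stages: int) -> float:
    """End-to-end failure rate when every stage must succeed."""
    return 1 - (1 - per_stage) ** stages


# 45% failure over 5 stages implies roughly 11% failure per stage...
behavioural_stage = per_stage_rate(0.45, 5)
# ...which compounds to roughly 91% over 20 stages, matching the quoted figure.
behavioural_20 = pipeline_failure(behavioural_stage, 20)
# Ratio of the two quoted 5-stage failure rates (the 5.6x headline).
ratio_5 = 0.45 / 0.08
```

Under this assumption the quoted 45% and 91% behavioural figures are mutually consistent, and the 5-stage ratio of 45% to 8% is about 5.6.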

Reference configuration

The 2+N team pattern

Producer mode

HI-CTRL (human) · orchestrator agent · coder ×N · validator ×N

tools: edit, dispatch, test, validate

↓ produces pull request ↓

Reviewer mode

HI-CTRL (human) · review agent · specialty reviewers ×N

tools: review, approve, reject, merge

infrastructure boundary — WF1 enforced — disjoint tool sets

Separation of duties, enforced structurally

Two human-in-control roles plus N specialized agents. Producer mode cannot merge; reviewer mode cannot edit. The minimum of two humans derives from Separation of Duties: a single human cannot independently validate their own work, no matter how many agents are on the team.

The language places no upper bound on modes. Enterprise teams typically add planning, deployment, and operations modes. The 2+N pattern is the minimum viable structure.

Producer-reviewer tool separation prevents self-approval (WF1)
Validator quorum with orthogonal criteria — validators cannot see each other's verdicts before deciding
Non-overridable blockers halt cascade propagation before corrupted state compounds (INV3)
Sessions checkpoint across context resets — decisions preserved, completed tasks not rerun (INV5)

Research

Formal specification.
Empirical characterisation.

ACM TOSEM · Journal

Specifying AI-SDLC Processes: A Protocol Language for Human-Agent Boundaries

We propose a domain-specific language for specifying AI-SDLC processes, with formal abstract syntax, well-formedness conditions, operational semantics, and enforcement invariants. The language distinguishes policy (declared intent) from mechanism (structural enforcement), enabling implementations to bound process non-determinism through validation tokens and capability boundaries. Three results follow: structural enforcement bounds system failure rates at a weighted product of agent and validator rates; the 2+N team pattern formalizes Separation of Duties for AI-SDLC; and Kleene closure of orchestration loops and reflexive protocol-adherence validation arise as emergent properties of the design. Simulation studies characterise disagreement-policy trade-offs, governance overhead, the failure-rate model, and Byzantine robustness.

Under review · ACM Trans. Software Engineering & Methodology · Preprint: arXiv (forthcoming)

Open source · built to be trusted

Validated like infrastructure,
not a prototype.

1,477 tests

95% coverage

Three layers: unit, integration, and end-to-end CLI journeys driven through real subprocess invocation.

130,000 examples

0 invariant violations

Property-based testing of the core invariants — token unforgeability, audit-chain integrity, mode separation, policy halt — against randomized inputs.

3 templates

and growing

Ship-ready reference protocols — solo, team, and 2+N — all expressed in the same DSL, with the library growing toward ~10 configurations.

Structural enforcement, not good intentions

Every invariant in the specification maps to a concrete enforcement mechanism in the implementation. The protocol doesn't ask agents to behave — the infrastructure makes the non-compliant path impossible.

The system is built under its own rules: it extends its own codebase through the same 2+N protocol it defines — dogfooded end to end.

AGPLv3, developed in the open. Install from PyPI; the full source, protocol templates, and test suite are public.

AGPLv3 · PyPI: pip install snodo · CI: Python 3.12 / 3.13

Invariant → enforcement mechanism

INV1 · token integrity → JWT HMAC · single-use · TTL
INV2 · capability boundary → disjoint tool sets at MCP boundary
INV4 · audit completeness → hash-chained audit log
INV5 · session resumability → file-based checkpoint
WF1 · tool disjointness → compile-time protocol verifier
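To show why a hash-chained audit log (the mechanism listed for INV4) makes tampering detectable, here is a minimal sketch. The entry layout, the all-zero genesis value, and the function names are assumptions made for illustration, not snodo's actual log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical hash used before the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    body = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_entry(log: list[dict], event: dict) -> dict:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev_hash, "event": event, "hash": _entry_hash(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash in order; any edited entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: list[dict] = []
append_entry(audit_log, {"mode": "producer", "action": "edit"})
append_entry(audit_log, {"mode": "reviewer", "action": "approve"})
intact = verify_chain(audit_log)

# Rewriting history after the fact invalidates the stored hashes.
audit_log[0]["event"]["action"] = "merge"
after_tamper = verify_chain(audit_log)
```

Because each hash covers the previous one, changing any earlier event invalidates that entry and every entry after it, unless the whole tail is recomputed.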

Governance for
human-agent
software teams

Define your process.
Enforce it structurally.