The AI-Native SDLC
A fundamentally different way to build software. A spec-driven, agent-powered operating model where AI agents own structured execution across every phase - humans govern the framework, harness, direction, quality, security and risk.
A north star blueprint for product and R&D teams exploring how software delivery changes when specs, agents, and feedback systems become first-class constructs.
Spec-Centred
The spec is the system contract. Every agent, test, and deployment traces back to it.
Agent-Orchestrated
Specialised agents own bounded work in each phase, operating within shared context and constraints.
Human-Governed
Humans set intent, approve critical transitions, and intervene where confidence or alignment drops.
Continuously Evolving
Not a one-shot build. Signals from production, users, and the market continuously reshape future work.
Industry Overview
Six signals from the frontier of AI-native software delivery - drawn from industry research, platform evolution, and emerging engineering practice.
From Inline Assistance to Delegated Workflows
Modern AI-native development is shifting from chat-in-the-IDE toward issue-to-plan-to-branch-to-PR flows in managed environments. The leading cloud agent models are explicitly built around researching the repository, creating an implementation plan, making changes on a branch, running tests and linters in an ephemeral environment, and then returning a PR with measurable lifecycle metrics.
workflow delegation
Better-Bounded Workflows Beat More Agents
The winning pattern is not "more agents" - it is better-bounded workflows. Emerging guidance from model providers and software delivery research is converging: successful teams start with simpler, composable workflows and add fuller agent autonomy only where flexibility is genuinely needed. Context and discipline matter, and complacency with AI-generated code is a real risk. The operating model should sound less like an agent catalogue and more like bounded execution with clear governance.
bounded execution
Context Is a First-Class Engineering Discipline
Context engineering - the curation of what enters the model's context over long-running agent loops - is becoming a recognised discipline. Practices such as shared agent instructions, curated reference applications, and protocol-based access to live dependency and system context are now formalised in leading platforms. Repository instructions, agent configuration files, custom skills, and repository memory are practical tools improving agent performance at scale.
context engineering
Agent Systems Are Evaluated Like Products
Modern practice has moved beyond "did the generated code compile?" Leading model providers now emphasise traces, graders, datasets, multi-turn evaluations, guardrails, and human approvals - because multi-step agents can fail in ways ordinary code tests do not catch. The implication: an AI-native operating model needs an explicit layer for evaluating agent behaviour, not only product behaviour.
agent evaluation
Value Scales Through Platforms, Data, and Governance
Industry research is very clear that AI is an amplifier of the existing system, not a substitute for one. Healthy internal data, a clear organisational AI stance, user-centricity, and high-quality internal platforms are the capabilities that make AI useful at scale. With 90% of organisations already using internal platforms and 76% having dedicated platform teams, "AI-native SDLC" is becoming as much a platform problem as a coding problem.
platform + governance
Brownfield Understanding Is Now Forward Engineering (WIP)
AI's role now stretches from understanding legacy codebases to forward engineering. For organisations with mature, complex product estates, this matters enormously. Agents must build understanding of existing domain models, configuration patterns, and legacy seams before they start changing core product behaviour. The operating model must account for comprehension as a prerequisite to generation.
legacy + comprehension
Ideation & Problem Discovery
The lifecycle begins not with a feature request but with signal ingestion. AI agents continuously scan customer feedback, support tickets, competitor launches, market reports, and internal usage analytics to surface problems worth solving.
The Discovery Agent synthesises these signals into structured problem hypotheses - each with severity score, addressable market size, and alignment to product vision. It generates "problem briefs" that a human product lead reviews and greenlights.
In a living product, this phase never stops. It runs as a background daemon, constantly feeding the backlog with prioritised, evidence-backed problem statements. New feature ideas, customer pain points, and market shifts all flow through this funnel.
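As a concrete illustration, the prioritisation behind a problem brief can be sketched as a weighted blend of the signals above. The `ProblemBrief` fields, the weights, and the `priority_score` function are assumptions for illustration, not a prescribed model:

```python
from dataclasses import dataclass

@dataclass
class ProblemBrief:
    title: str
    severity: float          # 0..1, e.g. from support-ticket clustering
    market_size: float       # 0..1, normalised addressable-market estimate
    vision_alignment: float  # 0..1, scored against the product vision
    evidence_sources: int    # distinct data sources backing the brief

def priority_score(b: ProblemBrief) -> float:
    # Weighted blend; the weights are illustrative, not prescriptive.
    return 0.4 * b.severity + 0.35 * b.market_size + 0.25 * b.vision_alignment

def passes_gate(b: ProblemBrief) -> bool:
    # Mirrors the human gate for this phase: at least 3 independent sources.
    return b.evidence_sources >= 3

briefs = [
    ProblemBrief("Exports time out for large accounts", 0.9, 0.4, 0.7, 4),
    ProblemBrief("Add dark mode", 0.2, 0.6, 0.3, 2),
]
ranked = sorted((b for b in briefs if passes_gate(b)),
                key=priority_score, reverse=True)
```

The under-evidenced brief never reaches the human gate; the rest arrive pre-ranked with their supporting signals attached.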
Why This Phase Matters
Without continuous signal ingestion, product teams react to the loudest voice instead of the most urgent need. This phase ensures every feature request is validated against real market data, customer pain, and competitive landscape. It replaces gut-feel backlog management with data-driven prioritisation.
Inputs
- Customer feedback streams
- Support ticket clusters
- Usage analytics data
- Competitor intelligence
- Sales call transcripts
Outputs
- Prioritised problem briefs
- Opportunity scoring matrix
- Competitive landscape map
- Solution hypotheses (ranked)
Human Gate
- Human product lead approval
- Strategic alignment check
- Problem validated with ≥3 data sources
Example Agents & Skills
discovery-scout · market-radar · ideation-spark
Product Requirements (PRD)
The approved problem brief feeds into the PRD Architect, which generates a comprehensive product requirements document - not a vague wish list, but an engineering-ready blueprint.
The PRD covers: user personas, jobs-to-be-done, functional requirements, non-functional requirements (performance, security, accessibility), success metrics, and rollout strategy.
For a living product, the PRD Agent understands the existing product context. It reads the current codebase, existing specs, and feature graph to ensure new requirements are additive and compatible. It flags conflicts with existing features and suggests migration paths.
Inputs
- Approved problem brief
- Existing product spec graph
- Codebase architecture map
- UX research findings
Outputs
- Full PRD document
- User journey maps
- Dependency impact assessment
- High-fidelity prototypes with dummy data
Human Gate
- Product + Engineering sign-off
- No unresolved dependency conflicts
- Success metrics defined & measurable
Example Agents & Skills
prd-architect-pro · ux-insight · dep-graph
Technical Specification
This is the keystone phase of AI-native development. The technical spec is not documentation - it's the executable contract that all downstream agents code against, test against, and validate against.
The spec includes: precise API schemas (OpenAPI), database migration scripts, component interface definitions, state machine diagrams, error taxonomy, performance budgets, and acceptance criteria written as machine-parseable assertions.
Critically, specs now include executable examples alongside schemas: canonical business scenarios, exception paths, migration cases, and explicit "never-do-this" boundaries that act as guardrails for downstream coding agents. Constraints like "never mutate this table directly" or "never call this API without a feature flag" are first-class spec content, not tribal knowledge. This makes the spec an operational contract, not a document artefact.
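A minimal sketch of what machine-parseable spec content might look like, with a cheap structural gate run before any code is written. The field names (`acceptance`, `never_do`) and the checks in `validate_spec` are illustrative assumptions, not a standard schema:

```python
# Assumed shape for machine-parseable spec content; field names and checks
# are illustrative, not a standard.
spec = {
    "feature": "order-export",
    "acceptance": [
        {"id": "AC-1", "given": "an order with 0 items",
         "when": "export is requested", "then": "API returns 422 EMPTY_ORDER"},
    ],
    "never_do": [
        {"id": "G-1", "rule": "never mutate table `orders` directly; go through the order service"},
        {"id": "G-2", "rule": "never call /export without the `bulk_export` feature flag"},
    ],
}

def validate_spec(spec: dict) -> list[str]:
    # Cheap structural gate a spec pipeline might run before code exists.
    errors = []
    for ac in spec.get("acceptance", []):
        if not all(k in ac for k in ("id", "given", "when", "then")):
            errors.append(f"acceptance {ac.get('id', '?')} missing a clause")
    if not spec.get("never_do"):
        errors.append("spec declares no guardrails")
    return errors
```

Because guardrails are structured data rather than prose, downstream coding agents can be prompted with them verbatim and reviewers can diff them between spec versions.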
For brownfield products, spec writing depends on deep understanding of the existing system. The Domain Cartographer builds a living map of modules, business entities, configuration variants, and legacy touchpoints before any new spec is written. This is the bridge between "AI understands requirements" and "AI understands the shape of the existing product."
In a living product, specs are versioned and diff-aware. When a feature evolves, the Spec Agent generates a spec delta showing exactly what changed, what is backwards-compatible, and what requires migration. The spec becomes the living contract between product intent and code reality.
Spec-driven development means: no code gets written until the spec passes validation. No test gets generated without referencing the spec. No demo gets built that does not trace back to spec assertions. Drift between code and spec triggers automated alerts.
Inputs
- Approved PRD
- Existing spec versions
- Current codebase schema
- API catalogue
- Existing domain model map
Outputs
- Technical specification (versioned)
- OpenAPI / GraphQL schemas
- DB migration scripts
- Acceptance criteria (machine-readable)
- Executable examples + guardrails
- Domain model map (living)
- Spec delta (for iterations)
Human Gate
- 100% PRD → Spec traceability
- All schemas validate
- Breaking changes flagged + migration planned
- Tech lead approval
Example Agents & Skills
spec-forge · contract-check · spec-delta · system-cartographer · scenario-scribe
Architecture & System Design
With a locked spec, the Architecture Agent designs the system topology. It selects patterns (microservices vs monolith, event-driven vs request-response), defines infrastructure requirements, and produces deployment architecture diagrams.
Critically, it reads the existing architecture and designs for integration, not greenfield. For a living product adding a new feature module, it identifies where the new service fits, which existing services need interface changes, and what infrastructure needs provisioning.
The output is an Architecture Decision Record (ADR) with rationale, alternatives considered, and trade-offs - all generated by AI, reviewed by a human architect.
Inputs
- Validated technical spec
- Existing architecture docs
- Performance budgets
- Scale requirements
Outputs
- Architecture Decision Records
- System topology diagrams
- Infrastructure-as-Code templates
- Cost projections
Human Gate
- Architect review & sign-off
- Cost within budget envelope
- No single-point-of-failure risks
Example Agents & Skills
arch-blueprint · infra-scout
Delivery Planning & Work Decomposition
This is the gap between knowing what to build and starting to build it. Modern agentic development follows a research → plan → branch → PR pattern, and agents perform best when work is bounded and verifiable rather than handed a large spec and told to "implement."
The Delivery Planner converts the approved spec and architecture into a concrete task graph: dependency ordering, PR slicing strategy, parallelisable work streams, and rollback points for each unit of work. Every task traces back to a spec element and forward to a verifiable outcome.
For brownfield products, this phase is critical. The Impact Analyser maps every planned change to impacted components, existing test suites that must continue to pass, configuration dependencies, and legacy integration seams that constrain delivery order. Without this step, agents generate code that compiles in isolation but breaks the system at integration.
The output is not a project plan for humans - it's a machine-readable task graph that coding agents consume as bounded work instructions. Each node in the graph specifies its inputs, expected outputs, validation criteria, and rollback trigger.
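One way to sketch such a task graph, with Kahn's algorithm serving as the "no circular dependencies" check. The node shape (`deps`, `validate`, `rollback`) is a hypothetical example, not a fixed format:

```python
from collections import deque

# Hypothetical task-graph shape: each node is a bounded unit of work with its
# own validation criterion and rollback trigger, plus dependency edges.
tasks = {
    "migrate-db":   {"deps": [], "validate": "migration dry-run passes", "rollback": "run down-migration"},
    "api-endpoint": {"deps": ["migrate-db"], "validate": "contract tests pass", "rollback": "revert PR"},
    "ui-component": {"deps": ["api-endpoint"], "validate": "e2e scenario passes", "rollback": "disable feature flag"},
}

def execution_order(graph: dict) -> list[str]:
    # Kahn's algorithm: a dependency-ordered sequence, or an error on a cycle.
    indegree = {name: len(node["deps"]) for name, node in graph.items()}
    ready = deque(name for name, d in indegree.items() if d == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for name, node in graph.items():
            if current in node["deps"]:
                indegree[name] -= 1
                if indegree[name] == 0:
                    ready.append(name)
    if len(order) != len(graph):
        raise ValueError("circular dependency in task graph")
    return order
```

The same traversal doubles as the PR slicing order: each node becomes one bounded, independently revertible unit of work.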
Why This Sub-Phase Matters
The biggest failure mode in AI-assisted development isn't bad code generation - it's unbounded work scope. When an agent receives a full spec and attempts to implement everything in a single pass, the result is brittle, untestable, and impossible to roll back. Decomposing work into bounded, dependency-ordered units is what makes agent-driven implementation reliable at production scale.
Inputs
- Validated technical spec
- Architecture decision records
- Existing codebase + test suites
- Configuration dependency map
Outputs
- Task graph (dependency-ordered)
- PR slicing plan
- Impact matrix
- Rollback strategy per work unit
- Environment provisioning checklist
Human Gate
- All tasks traceable to spec elements
- No circular dependencies in graph
- Rollback point defined per PR
- Tech lead approval on sequencing
Example Agents & Skills
delivery-planner · impact-scan · env-provision
Implementation
This is where spec-driven development pays off. Coding agents receive the spec as their instruction set - not vague requirements, but precise schemas, interfaces, and acceptance criteria. The agents generate production code that conforms to the spec, follows existing codebase conventions, and includes inline documentation.
Specialised agents handle different domains: the Frontend Agent builds UI components with design system compliance, the API Agent implements endpoints matching OpenAPI schemas exactly, and the Data Agent writes migrations and query layers. Each agent operates within its bounded context.
For living products, agents understand the existing code graph. They don't generate isolated code - they integrate into the existing module structure, respect import conventions, extend existing test patterns, and follow established naming conventions.
Inputs
- Validated technical spec
- Architecture decision records
- Existing codebase context
- Design system tokens
Outputs
- Production code (PR-ready)
- Database migrations
- API implementations
- UI components
- Inline documentation
Human Gate
- All code compiles & lints clean
- Spec conformance check passes
- No existing tests broken
Example Agents & Skills
code-engine · ui-ux-pro-max · api-forge · data-smith
Code Review & Quality
Every PR triggers a multi-agent review pipeline. The Review Sentinel checks code quality, patterns, and conventions. The Spec Drift Detector compares implementation against the technical spec - any deviation triggers a review flag with the specific spec clause being violated.
This isn't just linting. Agents evaluate semantic correctness: does this API handler actually implement the spec's error taxonomy? Does this component respect the state machine defined in the spec? Are the performance budgets being met?
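A toy version of one such semantic check - comparing the error codes an implementation actually raises against the spec's declared error taxonomy. All of the error names here are invented for illustration:

```python
# Toy conformance check: compare the error codes an implementation raises with
# the spec's declared error taxonomy. The names are invented examples.
spec_errors = {"EMPTY_ORDER", "RATE_LIMITED", "EXPORT_TOO_LARGE"}
implemented_errors = {"EMPTY_ORDER", "RATE_LIMITED", "UNKNOWN_FAILURE"}

missing = spec_errors - implemented_errors      # spec cases the code never handles
undeclared = implemented_errors - spec_errors   # behaviour the spec never defined

drift_report = [
    *(f"MISSING: handler never raises {e} (required by spec)" for e in sorted(missing)),
    *(f"UNDECLARED: handler raises {e}, absent from spec" for e in sorted(undeclared)),
]
```

Each drift entry carries the specific spec clause being violated, which is what turns an opaque review flag into an actionable comment.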
Human engineers review the AI's review, resolve edge cases, and approve. The loop tightens over time as the agents learn codebase conventions and reduce false positives.
Inputs
- Pull request diff
- Technical specification
- Codebase style guide
- Performance budgets
Outputs
- Review comments (with severity)
- Spec conformance report
- Performance impact assessment
- Suggested fixes (auto-PR)
Human Gate
- Zero critical/high spec drift
- Human engineer approval
- All auto-fix suggestions resolved
Example Agents & Skills
review-sentinel · spec-drift
Testing & Validation
Tests are not written after implementation - they're generated from the spec before code exists. The acceptance criteria in the spec are machine-parseable assertions that the Test Forge converts into unit tests, integration tests, and end-to-end scenarios.
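A hedged sketch of turning one machine-parseable acceptance criterion into a runnable test. Both the criterion shape and `export_order` (a stand-in for the system under test) are assumptions:

```python
# Sketch: generate a runnable test from a machine-parseable acceptance
# criterion. `export_order` is a stand-in for the real system under test.
def export_order(order_items: list) -> tuple[int, str]:
    if not order_items:
        return 422, "EMPTY_ORDER"
    return 200, "OK"

criterion = {"id": "AC-1", "given": {"order_items": []},
             "then": {"status": 422, "code": "EMPTY_ORDER"}}

def make_test(criterion):
    def test():
        status, code = export_order(**criterion["given"])
        assert status == criterion["then"]["status"], criterion["id"]
        assert code == criterion["then"]["code"], criterion["id"]
    test.__name__ = f"test_{criterion['id'].lower().replace('-', '_')}"
    return test

generated = make_test(criterion)
```

Because each generated test carries the criterion's ID, a failing test points straight back to the spec clause it enforces.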
The Chaos Agent goes further: it generates adversarial test cases designed to break the system. Invalid inputs, race conditions, malformed payloads, extreme load patterns. It thinks like an attacker, not a user.
For living products, regression suites grow automatically. Every new feature adds tests; every bug fix adds a regression test. The test suite becomes a living safety net that AI agents maintain alongside the code.
Critically, this phase now tests both the software artefact and the agent workflow that produced it. The Evaluation and Replay Harness turns past bugs, support tickets, and production incidents into reusable evaluation datasets. It replays agent workflows against known-good outcomes to detect regressions in agent behaviour, not just code behaviour. This is one of the most important maturity signals in an AI-native operating model.
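In miniature, such a harness might look like this: past incidents become evaluation cases, and the agent workflow is re-run against known-good outcomes. `triage_agent` is a hypothetical stand-in for a real agent invocation:

```python
# Miniature replay harness: past incidents become evaluation cases, and the
# agent workflow is re-run against known-good outcomes. `triage_agent` is a
# hypothetical stand-in for a real agent invocation.
def triage_agent(ticket: str) -> str:
    text = ticket.lower()
    if "timeout" in text or "times out" in text:
        return "perf"
    if "invoice" in text:
        return "billing"
    return "unknown"

replay_cases = [
    {"input": "Export times out after 30s", "expected": "perf"},
    {"input": "Invoice total is wrong", "expected": "billing"},
    {"input": "Login loops forever", "expected": "auth"},  # a known past failure
]

regressions = [c["input"] for c in replay_cases
               if triage_agent(c["input"]) != c["expected"]]
pass_rate = 1 - len(regressions) / len(replay_cases)
```

A drop in the pass rate between agent versions signals a regression in agent behaviour even when every code-level test still passes.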
Inputs
- Technical spec (acceptance criteria)
- Implementation code
- Performance budgets
- Existing regression suite
- Historical bug/incident corpus
Outputs
- Test suites (unit/integration/E2E)
- Test coverage report
- Adversarial test results
- Performance benchmark report
- Agent workflow eval results
- Replay regression suite
Human Gate
- ≥90% spec coverage
- Zero critical failures
- Performance within budget
- All chaos tests documented
Example Agents & Skills
test-forge · chaos-monkey-ai · perf-bench · eval-replay
Security & Compliance
Security is not a phase you bolt on at the end - it runs continuously from spec onwards. But this dedicated phase is the final deep scan before code enters staging.
The Security Guardian performs STRIDE threat modelling against the architecture, SAST/DAST scanning against the codebase, dependency vulnerability analysis, and secrets detection. For AI-powered features, it also checks for prompt injection vulnerabilities and model output safety.
The Compliance Agent validates against relevant frameworks (SOC2, GDPR, HIPAA, ISO 27001) based on the product's compliance profile. It generates audit-ready evidence and control documentation.
Beyond the phase-level gate, security governance now operates at the tool and action level. Shell commands, environment changes, and destructive actions are subject to explicit approval policies. Guardrails validate actions in real time, and human review pauses the run before risky actions proceed. This is a fundamentally stronger control model than a single late-stage sign-off: every agent in every phase that touches sensitive resources operates within declared permission boundaries.
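A minimal sketch of action-level gating, assuming a simple pattern-based policy. The policy shape and the patterns are illustrative, not a real product's rule set:

```python
import re

# Illustrative action-gating policy: agent tool calls are checked against
# declared permission boundaries before they run. Patterns are assumptions.
POLICY = {
    "deny": [r"^rm -rf\b", r"\bDROP TABLE\b"],
    "require_approval": [r"^git push\b", r"^terraform apply\b"],
    "allow": [r"^git (status|diff|log)\b", r"^pytest\b"],
}

def gate(command: str) -> str:
    """Return 'deny', 'pause' (await human approval), or 'allow'."""
    if any(re.search(p, command) for p in POLICY["deny"]):
        return "deny"
    if any(re.search(p, command) for p in POLICY["require_approval"]):
        return "pause"
    if any(re.search(p, command) for p in POLICY["allow"]):
        return "allow"
    return "pause"  # default-deny posture: unknown actions wait for a human
```

The default-deny fallthrough is the design choice that matters: an action the policy has never seen pauses the run rather than proceeding, and every decision lands in the audit log.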
Inputs
- Codebase + dependencies
- Architecture diagrams
- Compliance requirements
- Threat model history
- Agent action policies
Outputs
- Threat model (STRIDE)
- Vulnerability report (prioritised)
- Compliance evidence pack
- Remediation playbook
- Action approval audit log
Human Gate
- Zero critical/high vulnerabilities
- All compliance controls met
- Action policies enforced across agents
- Security lead sign-off
Example Agents & Skills
sec-guardian · compliance-auto · action-gatekeeper
Demo & Staging
Before release, AI agents generate demo environments and stakeholder walkthroughs. The Demo Builder creates interactive previews with realistic sample data, guided tours highlighting new functionality, and before/after comparisons for iterative improvements.
The Staging Deployer provisions ephemeral environments per feature branch. Each PR gets its own staging URL with synthetic data, accessible to stakeholders for UAT without touching production.
For living products, demos include migration previews - showing existing users what will change, what's new, and what's been improved. This feeds directly into changelog and release communications.
Inputs
- Merged code (staging branch)
- Feature spec & PRD
- Sample data profiles
Outputs
- Ephemeral staging URL
- Interactive demo walkthrough
- Screenshot/video assets
- UAT feedback collection
Human Gate
- Stakeholder UAT sign-off
- No blocking UX issues
- Demo traces to spec requirements
Example Agents & Skills
demo-craft · staging-spin
Documentation
Documentation is not written separately - it's derived from the spec, code, and tests. The Doc Weaver generates user-facing docs, developer guides, API references, and architecture overviews by reading the actual implementation.
For living products, docs are automatically updated when code changes. Every merged PR triggers a doc refresh - new endpoints appear in API docs, changed behaviour updates user guides, and deprecated features get sunset notices.
The system produces multiple doc types: end-user help (in-product and external), developer API docs, internal architecture docs, and onboarding guides for new team members.
Inputs
- Codebase (latest merged)
- Technical spec
- Test suite (as behaviour docs)
- Existing documentation
Outputs
- User documentation
- API reference (interactive)
- Architecture guides
- Onboarding materials
- Doc diff (what changed)
Human Gate
- All public APIs documented
- No stale doc references
- Readability score ≥ target
Example Agents & Skills
doc-weaver · api-doc-gen
Release & Rollout
The Release Captain orchestrates the entire release process: generating semantic version numbers, compiling changelogs, creating release notes for different audiences (technical, end-user, executive), and executing the deployment pipeline.
Release notes are not vague - they're generated from the spec delta + commit history + demo assets. Each change links back to the original problem brief, creating full traceability from customer need to shipped feature.
For living products, the Release Captain manages feature flags, progressive rollouts, and canary deployments. It monitors health metrics post-deploy and can trigger automatic rollback if anomalies are detected.
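The post-deploy health decision can be sketched as a simple comparison of canary metrics against the baseline. The thresholds and metric names here are illustrative, not recommended values:

```python
# Illustrative canary gate: promote, hold, or roll back based on canary
# metrics relative to the baseline. Thresholds are assumptions.
def canary_decision(baseline: dict, canary: dict,
                    err_budget: float = 1.5, lat_budget: float = 1.2) -> str:
    if canary["error_rate"] > baseline["error_rate"] * err_budget:
        return "rollback"  # trips automatic rollback
    if canary["p95_latency_ms"] > baseline["p95_latency_ms"] * lat_budget:
        return "hold"      # pin traffic and alert the release manager
    return "promote"

baseline = {"error_rate": 0.010, "p95_latency_ms": 240}
```

In practice the budgets would be declared per service in the spec's performance section, so the rollout gate and the performance gate enforce the same numbers.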
Inputs
- All approved PRs since last release
- Spec deltas
- Demo assets
- Deployment config
Outputs
- Semantic version tag
- Multi-audience release notes
- Deployed to production
- Feature flag config
- Rollout monitoring dashboard
Human Gate
- All tests pass in CI
- Canary health check green
- Release manager approval
Example Agents & Skills
release-captain · changelog-pro
Monitoring, Feedback & Evolution
The lifecycle is a loop, not a line. Post-release, the Production Watcher monitors error rates, performance metrics, usage patterns, and user feedback. It detects anomalies, classifies issues, and routes them back to the appropriate phase.
A performance regression routes to Phase 7 (Testing). A spec deviation in production routes to Phase 3 (Spec). A new customer need routes to Phase 1 (Ideation). The system is self-healing and self-evolving.
The Feedback Loop Agent synthesises qualitative and quantitative signals into improvement proposals - closing the circle by generating new problem briefs that feed Phase 01. The product never stops evolving.
Inputs
- Production metrics & logs
- User feedback & NPS
- Usage analytics
- Original success metrics
Outputs
- Health dashboards
- Anomaly alerts
- Improvement proposals
- New problem briefs → Phase 01
- Feature impact reports
Human Gate
- SLA targets met
- Feature adoption ≥ threshold
- Zero unresolved P0/P1 issues
Example Agents & Skills
prod-watcher · feedback-loop
The Living Product Model
How spec-driven development works across continuous iterations, feature additions, and product evolution.
Spec Versioning
Every feature iteration generates a new spec version. The Spec Delta Engine produces semantic diffs - additive changes, breaking changes, and migration-required changes - so agents always know what's new and what's affected.
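A toy semantic diff over two spec versions, classifying field-level changes into the three categories above. The spec shape (field name mapped to type) is deliberately simplified for illustration:

```python
# Toy semantic diff between two spec versions; the spec shape (field -> type)
# is deliberately simplified for illustration.
def spec_delta(old: dict, new: dict) -> dict:
    delta = {"additive": [], "breaking": [], "migration": []}
    for field, ftype in new.items():
        if field not in old:
            delta["additive"].append(field)           # safe to ship as-is
        elif old[field] != ftype:
            delta["migration"].append(f"{field}: {old[field]} -> {ftype}")
    for field in old:
        if field not in new:
            delta["breaking"].append(field)           # removal breaks consumers
    return delta

v1 = {"id": "uuid", "amount": "int", "note": "str"}
v2 = {"id": "uuid", "amount": "decimal", "currency": "str"}
d = spec_delta(v1, v2)
```

The classification is what downstream agents consume: additive changes flow straight to implementation, migration changes trigger migration-script generation, and breaking changes block until a deprecation path exists.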
Feature Branches as Contexts
Each feature gets an isolated spec + code + test context. Agents work in bounded feature branches. The merge process validates that the feature's spec is compatible with mainline before integration.
Regression-Aware Generation
Code agents don't generate in a vacuum. They read the full dependency graph, understand which modules their changes affect, and proactively run impacted tests. Breaking changes are flagged before they're committed.
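Impacted-test selection can be sketched as a walk over the reverse dependency graph from a changed module. The module graph here is invented for illustration; a real system would derive it from the build graph or import analysis:

```python
# Invented module graph: module -> modules it imports. A real system would
# derive this from the build graph or import analysis.
DEPENDS_ON = {
    "api.orders": ["core.billing"],
    "ui.checkout": ["api.orders"],
    "tests.test_billing": ["core.billing"],
    "tests.test_checkout": ["ui.checkout"],
}

def impacted(changed: str) -> set[str]:
    # Walk the reverse dependency graph: everything that transitively
    # depends on the changed module is impacted.
    hit, frontier = set(), [changed]
    while frontier:
        mod = frontier.pop()
        for dependant, deps in DEPENDS_ON.items():
            if mod in deps and dependant not in hit:
                hit.add(dependant)
                frontier.append(dependant)
    return hit

tests_to_run = {m for m in impacted("core.billing") if m.startswith("tests.")}
```

Note the transitive reach: a change in `core.billing` pulls in the checkout tests too, because the UI depends on the API that depends on billing.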
Continuous Spec-Code Reconciliation
A background agent continuously compares the living spec against the actual codebase. Drift - where code diverges from spec over time - triggers automated reconciliation reports and remediation PRs.
Feature Impact Tracking
Post-release, every feature is measured against its original success metrics. The Feedback Loop Agent generates impact reports - did this feature actually solve the problem it was built for? Under-performing features trigger improvement cycles.
Deprecation & Sunset Automation
When features evolve or get replaced, agents manage the full deprecation lifecycle: spec updates, migration guides, user notifications, sunset timelines, and finally clean removal with zero dangling references.
What This Blueprint Is and Is Not
What It Is
A north star model for AI-native software delivery. A conceptual blueprint for product and engineering alignment. A way to think about specs, agents, humans, and feedback as one coordinated system. Provocative by design - intended to challenge assumptions about how software gets built.
What It Is Not
Not a locked implementation architecture - adapt it to your stack and maturity. Not a mandate for zero-human development - humans govern, agents execute. Not a return to heavy up-front specification - specs are living, sliceable, and ship in thin vertical slices. Not a claim that every phase must be fully automated from day one. Not a replacement for product judgment or engineering leadership.
Foundational Considerations
Cross-cutting layers that span the entire operating model. These are not phases - they are the connective fabric that makes every phase reliable, repeatable, and governable.
Knowledge & Context Fabric
The operating model frequently describes agents that "read the codebase" or "understand context," but a principal architect will ask: what curated system makes that reliable and repeatable? The answer is an explicit Knowledge and Context Fabric that spans every phase.
This fabric holds the structured context that every agent consumes: repository instructions, approved data sources, architecture decision records, domain glossaries, reference implementations, curated shared instructions, and reusable skills. Without it, each agent reinvents understanding from scratch. With it, context becomes an engineering discipline - versioned, maintained, and tested like code.
Agent Governance & Model Ops
Governance cannot live beside the engineering system - it must live inside it. Every agent operating across the 12 phases does so within a declared governance envelope that covers approved models, prompt and instruction versioning, trace storage, evaluation baselines, approval policies, data-handling rules, cost and latency budgets, and audit trails.
This is not a policy document. It is a runtime layer: every agent invocation is logged, every sensitive action is gated, every model choice is versioned, and every evaluation baseline is tracked. The governance layer ensures that scaling agent-driven delivery does not mean scaling unaudited risk. Industry guidance from model providers, responsible AI frameworks, and delivery research all converge on this point: a clear organisational AI stance, enforced at the system level, is a prerequisite for production-grade agent operations.
Source-of-Truth Hierarchy
A common failure mode of AI-native delivery is blurred boundaries between artefacts. When the PRD drifts into APIs and Architecture drifts into business rules, both humans and agents lose the thread on which document to trust. This layer assigns each artefact a single, owned concern - and names code as the ground truth against which every other layer's drift is measured.
The hierarchy is a navigation aid as much as a contract. When intent changes, it changes in the PRD and the Spec follows. When technical constraints change, they change in Architecture. When code deviates from spec, reconciliation agents catch the drift. Clean ownership is what keeps the pipeline composable.
| Artefact | Owns | Lives in |
|---|---|---|
| PRD | Business intent & outcomes | Phase 2 |
| Spec | Executable contract | Phase 3 |
| Architecture | Technical constraints & ADRs | Phase 4 |
| Delivery Plan | Task graph & rollback boundaries | Phase 4a |
| Project Context | Local conventions & repo knowledge | Knowledge & Context Fabric |
| Code | Actual runtime behaviour; drift measured against it | Implementation |
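The ownership table can be read as a routing function: each kind of change resolves to exactly one owning artefact. The change-kind labels below are illustrative, not a taxonomy the blueprint defines:

```python
# Routing sketch mirroring the ownership table: each kind of change has
# exactly one owning artefact. The change-kind labels are illustrative.
OWNER = {
    "business_intent": "PRD",
    "contract": "Spec",
    "technical_constraint": "Architecture",
    "task_sequencing": "Delivery Plan",
    "repo_convention": "Project Context",
    "runtime_behaviour": "Code",
}

def owning_artefact(change_kind: str) -> str:
    if change_kind not in OWNER:
        raise ValueError(f"no single owner declared for {change_kind!r}")
    return OWNER[change_kind]
```

Raising on an unknown change kind is the point: a change with no single owner is itself a governance bug to surface, not to absorb silently.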