The AI-Native SDLC

A fundamentally different way to build software: a spec-driven, agent-powered operating model where AI agents own structured execution across every phase, while humans govern the framework, harness, direction, quality, security, and risk.

A north star blueprint for product and R&D teams exploring how software delivery changes when specs, agents, and feedback systems become first-class constructs.

Spec-Centred

The spec is the system contract. Every agent, test, and deployment traces back to it.

Agent-Orchestrated

Specialised agents own bounded work in each phase, operating within shared context and constraints.

Human-Governed

Humans set intent, approve critical transitions, and intervene where confidence or alignment drops.

Continuously Evolving

Not a one-shot build. Signals from production, users, and the market continuously reshape future work.

Industry Overview

Six signals from the frontier of AI-native software delivery - drawn from industry research, platform evolution, and emerging engineering practice.

SIGNAL 01

From Inline Assistance to Delegated Workflows

Modern AI-native development is shifting from chat-in-the-IDE toward issue-to-plan-to-branch-to-PR flows in managed environments. The leading cloud agent models are explicitly built around researching the repository, creating an implementation plan, making changes on a branch, running tests and linters in an ephemeral environment, and then returning a PR with measurable lifecycle metrics.
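The issue-to-PR lifecycle above can be pictured as a minimal state machine. The stage names and `DelegatedTask` shape below are illustrative assumptions for this blueprint, not any vendor's API:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Stage(Enum):
    """Stages of a delegated agent workflow (illustrative labels)."""
    RESEARCH = "research"   # agent explores the repository
    PLAN = "plan"           # implementation plan surfaced for review
    BRANCH = "branch"       # changes made on an isolated branch
    VERIFY = "verify"       # tests and linters in an ephemeral environment
    PR = "pr"               # pull request returned with lifecycle metrics


@dataclass
class DelegatedTask:
    issue_id: str
    stage: Stage = Stage.RESEARCH
    history: list = field(default_factory=list)

    def advance(self) -> Stage:
        """Move to the next lifecycle stage, timestamping for metrics."""
        stages = list(Stage)
        self.history.append((self.stage, datetime.now(timezone.utc)))
        self.stage = stages[stages.index(self.stage) + 1]
        return self.stage


task = DelegatedTask(issue_id="ISSUE-123")
task.advance()  # RESEARCH -> PLAN
task.advance()  # PLAN -> BRANCH
print(task.stage)  # Stage.BRANCH
```

The timestamped history is what makes the lifecycle measurable - time-in-stage becomes a first-class delivery metric rather than something reconstructed after the fact.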

workflow delegation
SIGNAL 02

Better-Bounded Workflows Beat More Agents

The winning pattern is not "more agents" - it is better-bounded workflows. Emerging guidance from model providers and software delivery research is converging: successful teams start with simpler, composable workflows and only add fuller agent autonomy where flexibility is genuinely needed. Context and discipline matter, and complacency with AI-generated code is a real risk. The operating model should sound less like an agent catalogue and more like bounded execution with clear governance.

bounded execution
SIGNAL 03

Context Is a First-Class Engineering Discipline

Context engineering - the curation of what enters the model's context over long-running agent loops - is becoming a recognised discipline. Practices such as shared agent instructions, curated reference applications, and protocol-based access to live dependency and system context are now formalised in leading platforms. Repository instructions, agent configuration files, custom skills, and repository memory are practical tools improving agent performance at scale.
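One way to picture context engineering is as a budgeted selection problem. The item names and greedy strategy below are assumptions for illustration; real curation also summarises, deduplicates, and refreshes stale context:

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    """One candidate item for the agent's context window (hypothetical shape)."""
    name: str
    tokens: int
    priority: int  # lower number = more important


def assemble_context(items: list[ContextItem], budget: int) -> list[str]:
    """Greedy curation: fit the highest-priority items within a token budget."""
    chosen, used = [], 0
    for item in sorted(items, key=lambda i: i.priority):
        if used + item.tokens <= budget:
            chosen.append(item.name)
            used += item.tokens
    return chosen


candidates = [
    ContextItem("repository-instructions", 800, priority=0),
    ContextItem("current-spec-section", 1500, priority=1),
    ContextItem("reference-application", 4000, priority=2),
    ContextItem("full-commit-history", 9000, priority=3),
]
print(assemble_context(candidates, budget=7000))
# the full commit history doesn't fit and is deliberately left out
```

The discipline is in what gets left out: the lowest-value item here is excluded by design, not by accident.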

context engineering
SIGNAL 04

Agent Systems Are Evaluated Like Products

Modern practice has moved beyond "did the generated code compile?" Leading model providers now emphasise traces, graders, datasets, multi-turn evaluations, guardrails, and human approvals - because multi-step agents can fail in ways ordinary code tests do not catch. The implication: an AI-native operating model needs an explicit layer for evaluating agent behaviour, not only product behaviour.

agent evaluation
SIGNAL 05

Value Scales Through Platforms, Data, and Governance

Industry research is very clear that AI is an amplifier of the existing system, not a substitute for one. Healthy internal data, a clear organisational AI stance, user-centricity, and high-quality internal platforms are the capabilities that make AI useful at scale. With 90% of organisations already using internal platforms and 76% having dedicated platform teams, "AI-native SDLC" is becoming as much a platform problem as a coding problem.

platform + governance
SIGNAL 06

Brownfield Understanding Is Now Forward Engineering

AI's role now stretches from understanding legacy codebases to forward engineering. For organisations with mature, complex product estates, this matters enormously. Agents must build understanding of existing domain models, configuration patterns, and legacy seams before they start changing core product behaviour. The operating model must account for comprehension as a prerequisite to generation.

legacy + comprehension
The 12-Phase Operating Model
Each phase details its agents, tools, and illustrative outputs.
PHASE 01

💡 Ideation & Problem Discovery

From raw signal to validated problem space
discovery-agent market-radar

The lifecycle begins not with a feature request but with signal ingestion. AI agents continuously scan customer feedback, support tickets, competitor launches, market reports, and internal usage analytics to surface problems worth solving.

The Discovery Agent synthesises these signals into structured problem hypotheses - each with severity score, addressable market size, and alignment to product vision. It generates "problem briefs" that a human product lead reviews and greenlights.

In a living product, this phase never stops. It runs as a background daemon, constantly feeding the backlog with prioritised, evidence-backed problem statements. New feature ideas, customer pain points, and market shifts all flow through this funnel.

Why This Phase Matters

Without continuous signal ingestion, product teams react to the loudest voice instead of the most urgent need. This phase ensures every feature request is validated against real market data, customer pain, and competitive landscape. It replaces gut-feel backlog management with data-driven prioritisation.

Inputs
  • Customer feedback streams
  • Support ticket clusters
  • Usage analytics data
  • Competitor intelligence
  • Sales call transcripts
Outputs
  • Prioritised problem briefs
  • Opportunity scoring matrix
  • Competitive landscape map
  • Solution hypotheses (ranked)
Human Gate
  • Human product lead approval
  • Strategic alignment check
  • Problem validated with ≥3 data sources

Example Agents & Skills

Discovery Agent discovery-scout
Ingests signals from Intercom, Zendesk, G2, Reddit, HackerNews, sales call transcripts. Clusters themes, scores urgency, maps to existing roadmap items or creates new problem briefs.
Claude API Web Search Notion MCP Slack MCP
Market Radar market-radar
Tracks competitor product updates, patent filings, hiring signals, and funding rounds. Generates weekly competitive intelligence briefs with opportunity/threat scoring.
Web Fetch RSS Ingest Crunchbase API
Brainstorm Facilitator ideation-spark
Runs structured ideation sessions: "How Might We" prompts, SCAMPER frameworks, analogy mapping from adjacent industries. Produces ranked solution hypotheses with feasibility estimates.
Claude Sonnet Mermaid Charts FigJam API
PHASE 02

📋 Product Requirements (PRD)

Structured, buildable product definition
prd-architect ux-researcher

The approved problem brief feeds into the PRD Architect, which generates a comprehensive product requirements document - not a vague wish list, but an engineering-ready blueprint.

The PRD covers: user personas, jobs-to-be-done, functional requirements, non-functional requirements (performance, security, accessibility), success metrics, and rollout strategy.

For a living product, the PRD Agent understands the existing product context. It reads the current codebase, existing specs, and feature graph to ensure new requirements are additive and compatible. It flags conflicts with existing features and suggests migration paths.

A vague PRD creates a game of telephone between product and engineering. The PRD closes the "what" and "why" ambiguity - who the user is, what they need, why it matters. The spec closes the "how" ambiguity - data models, API contracts, error taxonomy. Clean separation keeps each phase focused and prevents requirements from leaking into implementation.
Inputs
  • Approved problem brief
  • Existing product spec graph
  • Codebase architecture map
  • UX research findings
Outputs
  • Full PRD document
  • User journey maps
  • Dependency impact assessment
  • High-fidelity prototypes with dummy data
Human Gate
  • Product + Engineering sign-off
  • No unresolved dependency conflicts
  • Success metrics defined & measurable

Example Agents & Skills

PRD Architect prd-architect-pro
Generates full PRDs from problem briefs. Covers user stories, acceptance criteria, success metrics, scope boundaries, and phased rollout plan. References existing product specs for consistency.
Claude Opus GitHub MCP Notion MCP Linear MCP
UX Research Agent ux-insight
Generates user journey maps, identifies UX friction points, benchmarks competitor flows, and produces high-fidelity prototypes with dummy data and accessibility annotations (WCAG AA).
Figma API Hotjar Data Web Fetch
Dependency Mapper dep-graph
Analyses existing product architecture to identify feature dependencies, potential conflicts, and integration touchpoints. Produces dependency graphs and impact assessments.
GitHub MCP Mermaid Charts AST Parser
PHASE 03

Technical Specification

The binding contract that agents code against
spec-writer contract-validator system-cartographer scenario-scribe

This is the keystone phase of AI-native development. The technical spec is not documentation - it's the executable contract that all downstream agents code against, test against, and validate against.

The spec includes: precise API schemas (OpenAPI), database migration scripts, component interface definitions, state machine diagrams, error taxonomy, performance budgets, and acceptance criteria written as machine-parseable assertions.

Critically, specs now include executable examples alongside schemas: canonical business scenarios, exception paths, migration cases, and explicit "never-do-this" boundaries that act as guardrails for downstream coding agents. Constraints like "never mutate this table directly" or "never call this API without a feature flag" are first-class spec content, not tribal knowledge. This makes the spec an operational contract, not a document artefact.
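A minimal sketch of guardrails as spec content, assuming a hypothetical rule shape where each "never-do-this" boundary carries a pattern that flags violating generated code (real systems would match structured operations, not raw strings):

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrail:
    """A "never-do-this" boundary expressed as first-class spec content."""
    rule_id: str
    description: str
    pattern: str  # regex that flags a violating line of generated code


GUARDRAILS = [
    Guardrail("GR-001", "never mutate the ledger table directly",
              r"UPDATE\s+ledger\b"),
    Guardrail("GR-002", "never call the billing API without a feature flag",
              r"billing_api\.charge\((?!.*feature_flag)"),
]


def check_guardrails(code: str) -> list[str]:
    """Return the IDs of every guardrail the generated code violates."""
    return [g.rule_id for g in GUARDRAILS if re.search(g.pattern, code)]


snippet = 'cursor.execute("UPDATE ledger SET balance = 0")'
print(check_guardrails(snippet))  # ['GR-001']
```

Because the guardrails live in the spec, the same rules are enforceable by coding agents during generation and by review agents after it - tribal knowledge becomes a checkable artefact.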

For brownfield products, spec writing depends on deep understanding of the existing system. The Domain Cartographer builds a living map of modules, business entities, configuration variants, and legacy touchpoints before any new spec is written. This is the bridge between "AI understands requirements" and "AI understands the shape of the existing product."

In a living product, specs are versioned and diff-aware. When a feature evolves, the Spec Agent generates a spec delta showing exactly what changed, what is backwards-compatible, and what requires migration. The spec becomes the living contract between product intent and code reality.
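A sketch of how a spec delta might be classified, under assumed rules (a new optional field is additive, a removed or retyped field is breaking, a new required field needs a migration):

```python
from enum import Enum


class Change(Enum):
    ADDITIVE = "additive"
    BREAKING = "breaking"
    MIGRATION_REQUIRED = "migration-required"


def classify_field_change(old: dict, new: dict) -> dict[str, Change]:
    """Classify per-field differences between two spec versions (sketch)."""
    changes: dict[str, Change] = {}
    for name in new.keys() - old.keys():      # fields added in the new spec
        changes[name] = (Change.MIGRATION_REQUIRED
                         if new[name].get("required") else Change.ADDITIVE)
    for name in old.keys() - new.keys():      # fields removed
        changes[name] = Change.BREAKING
    for name in old.keys() & new.keys():      # fields present in both
        if old[name]["type"] != new[name]["type"]:
            changes[name] = Change.BREAKING
    return changes


v1 = {"email": {"type": "string"}, "age": {"type": "integer"}}
v2 = {"email": {"type": "string"},
      "age": {"type": "string"},                     # retyped -> breaking
      "tier": {"type": "string", "required": True}}  # new required -> migration
print(classify_field_change(v1, v2))
```

Unchanged fields produce no entry at all, so the delta report contains only what a reviewer actually needs to look at.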

Spec-driven development means: no code gets written until the spec passes validation. No test gets generated without referencing the spec. No demo gets built that does not trace back to spec assertions. Drift between code and spec triggers automated alerts.

Every downstream agent - coder, tester, reviewer, documenter - traces back to the spec. If the spec is precise, the entire pipeline accelerates. If it's vague, every phase pays the tax.
Inputs
  • Approved PRD
  • Existing spec versions
  • Current codebase schema
  • API catalogue
  • Existing domain model map
Outputs
  • Technical specification (versioned)
  • OpenAPI / GraphQL schemas
  • DB migration scripts
  • Acceptance criteria (machine-readable)
  • Executable examples + guardrails
  • Domain model map (living)
  • Spec delta (for iterations)
Human Gate
  • 100% PRD → Spec traceability
  • All schemas validate
  • Breaking changes flagged + migration planned
  • Tech lead approval

Example Agents & Skills

Spec Writer spec-forge
Transforms PRDs into precise technical specs. Generates OpenAPI schemas, DB migrations, component interfaces, state machines, and machine-parseable acceptance criteria. Understands existing codebase conventions.
Claude Opus OpenAPI Generator JSON Schema GitHub MCP
Contract Validator contract-check
Validates spec completeness: every PRD requirement has a corresponding spec element, every API has error handling, every state transition is defined. Flags gaps and ambiguities before code begins.
Schema Validator Traceability Matrix Claude Sonnet
Spec Diff Engine spec-delta
For living products: generates semantic diffs between spec versions. Classifies changes as additive, breaking, or migration-required. Auto-generates migration guides and backwards-compatibility reports.
Git Diff Semantic Versioning Migration Generator
Domain Cartographer system-cartographer
Builds a living map of the existing product: modules, business entities, configuration variants, and legacy integration seams. Feeds every downstream phase with domain context so agents understand the shape of the system before they change it.
AST Analysis GitHub MCP Schema Introspection Mermaid Charts
Scenario Author scenario-scribe
Generates executable examples alongside schemas: canonical business scenarios, exception paths, migration cases, and explicit "never-do-this" guardrails. Produces constraints that coding agents consume as hard boundaries during implementation.
Claude Opus Test Framework Constraint Engine
PHASE 04

🏗 Architecture & System Design

Infrastructure decisions that scale
arch-planner infra-scout

With a locked spec, the Architecture Agent designs the system topology. It selects patterns (microservices vs monolith, event-driven vs request-response), defines infrastructure requirements, and produces deployment architecture diagrams.

Critically, it reads the existing architecture and designs for integration, not greenfield. For a living product adding a new feature module, it identifies where the new service fits, which existing services need interface changes, and what infrastructure needs provisioning.

The output is an Architecture Decision Record (ADR) with rationale, alternatives considered, and trade-offs - all generated by AI, reviewed by a human architect.

Architecture mistakes are the most expensive bugs. An AI agent that reads existing infrastructure, evaluates trade-offs against real benchmarks, and generates ADRs with rationale creates architecture decisions that are defensible, not just instinctive.
Inputs
  • Validated technical spec
  • Existing architecture docs
  • Performance budgets
  • Scale requirements
Outputs
  • Architecture Decision Records
  • System topology diagrams
  • Infrastructure-as-Code templates
  • Cost projections
Human Gate
  • Architect review & sign-off
  • Cost within budget envelope
  • No single-point-of-failure risks

Example Agents & Skills

Architecture Planner arch-blueprint
Designs system architecture from spec. Produces service topology, data flow diagrams, infrastructure requirements, and ADRs. Considers existing system constraints and proposes integration strategy.
Claude Opus Mermaid Charts Terraform Templates AWS/GCP APIs
Infrastructure Scout infra-scout
Evaluates hosting, database, caching, and messaging options. Benchmarks cost, latency, and scalability. Produces infrastructure cost projections and scaling playbooks.
Cloud Pricing APIs Benchmark Suite Supabase MCP
SUB-PHASE 04a

🗂 Delivery Planning & Work Decomposition

From approved spec to bounded, verifiable work units
delivery-planner impact-scan env-provision

This is the gap between knowing what to build and starting to build it. Modern agentic development follows a research → plan → branch → PR pattern, and agents perform best when work is bounded and verifiable rather than handed a large spec and told to "implement."

The Delivery Planner converts the approved spec and architecture into a concrete task graph: dependency ordering, PR slicing strategy, parallelisable work streams, and rollback points for each unit of work. Every task traces back to a spec element and forward to a verifiable outcome.

For brownfield products, this phase is critical. The Impact Analyser maps every planned change to impacted components, existing test suites that must continue to pass, configuration dependencies, and legacy integration seams that constrain delivery order. Without this step, agents generate code that compiles in isolation but breaks the system at integration.

The output is not a project plan for humans - it's a machine-readable task graph that coding agents consume as bounded work instructions. Each node in the graph specifies its inputs, expected outputs, validation criteria, and rollback trigger.
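A sketch of one task-graph node and the dependency-ordering check, using Python's standard `graphlib`; the field names (`spec_ref`, `validation`, `rollback`) are illustrative assumptions:

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class WorkUnit:
    """One bounded node in the machine-readable task graph (illustrative)."""
    task_id: str
    spec_ref: str                      # traceability back to a spec element
    depends_on: list[str] = field(default_factory=list)
    validation: str = "tests pass"     # verifiable outcome for this unit
    rollback: str = "revert PR"        # rollback trigger for this unit


def execution_order(units: list[WorkUnit]) -> list[str]:
    """Dependency-ordered task sequence; raises CycleError on circular
    dependencies, which is exactly the human-gate condition above."""
    graph = {u.task_id: set(u.depends_on) for u in units}
    return list(TopologicalSorter(graph).static_order())


plan = [
    WorkUnit("migration", spec_ref="SPEC-4.2"),
    WorkUnit("api-endpoint", spec_ref="SPEC-4.3", depends_on=["migration"]),
    WorkUnit("ui-form", spec_ref="SPEC-4.4", depends_on=["api-endpoint"]),
]
print(execution_order(plan))  # ['migration', 'api-endpoint', 'ui-form']
```

Using a standard topological sort means the "no circular dependencies" gate falls out of the data structure for free rather than needing a separate checker.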

Why This Sub-Phase Matters

The biggest failure mode in AI-assisted development isn't bad code generation - it's unbounded work scope. When an agent receives a full spec and attempts to implement everything in a single pass, the result is brittle, untestable, and impossible to roll back. Decomposing work into bounded, dependency-ordered units is what makes agent-driven implementation reliable at production scale.

Inputs
  • Validated technical spec
  • Architecture decision records
  • Existing codebase + test suites
  • Configuration dependency map
Outputs
  • Task graph (dependency-ordered)
  • PR slicing plan
  • Impact matrix
  • Rollback strategy per work unit
  • Environment provisioning checklist
Human Gate
  • All tasks traceable to spec elements
  • No circular dependencies in graph
  • Rollback point defined per PR
  • Tech lead approval on sequencing

Example Agents & Skills

Delivery Planner delivery-planner
Converts approved spec + architecture into a directed task graph. Identifies PR boundaries, dependency order, parallelisable streams, and rollback points. Slices large specs into bounded, verifiable units that coding agents execute independently.
Claude Opus Linear MCP GitHub MCP Mermaid Charts
Impact Analyser impact-scan
Maps every spec element to impacted components, existing test suites, configuration files, and environment dependencies. Flags legacy seams and integration points that constrain delivery order. Produces an impact matrix the planner uses to sequence work safely.
AST Analysis GitHub MCP Test Runner Dependency Graph
Environment Planner env-provision
Determines what infrastructure, feature flags, data migrations, and environment setup must happen before implementation begins. Produces provisioning checklists and pre-flight validation scripts so agents don't start coding against an unprepared environment.
Terraform Templates Docker Supabase MCP GitHub Actions
PHASE 05

Implementation

AI agents write production code from spec
code-engine ui-ux-pro-max api-builder

This is where spec-driven development pays off. Coding agents receive the spec as their instruction set - not vague requirements, but precise schemas, interfaces, and acceptance criteria. The agents generate production code that conforms to the spec, follows existing codebase conventions, and includes inline documentation.

Specialised agents handle different domains: the Frontend Agent builds UI components with design system compliance, the API Agent implements endpoints matching OpenAPI schemas exactly, and the Data Agent writes migrations and query layers. Each agent operates within its bounded context.

For living products, agents understand the existing code graph. They don't generate isolated code - they integrate into the existing module structure, respect import conventions, extend existing test patterns, and follow established naming conventions.

Code generation isn't the hard part - generating code that respects existing patterns, passes existing tests, and integrates with existing services is. This phase treats implementation as spec-execution, not creative writing.
Inputs
  • Validated technical spec
  • Architecture decision records
  • Existing codebase context
  • Design system tokens
Outputs
  • Production code (PR-ready)
  • Database migrations
  • API implementations
  • UI components
  • Inline documentation
Human Gate
  • All code compiles & lints clean
  • Spec conformance check passes
  • No existing tests broken

Example Agents & Skills

Code Engine code-engine
Core implementation agent. Reads spec, understands existing codebase patterns, generates production code with proper error handling, logging, and documentation. Operates in bounded contexts per service/module.
Claude Code GitHub MCP AST Analysis ESLint/Prettier
Frontend Specialist ui-ux-pro-max
Builds UI components with pixel-perfect design system compliance. Generates accessible, responsive components with proper state management, animations, and design tokens. Handles React, Vue, or vanilla.
Claude Code Figma API Storybook Axe Accessibility
API Builder api-forge
Implements API endpoints matching OpenAPI spec exactly. Generates middleware, validation layers, rate limiting, and API documentation. Ensures backwards compatibility with existing endpoints.
OpenAPI Codegen Postman MCP Claude Code
Data Layer Agent data-smith
Writes database migrations, ORM models, query optimisation, and data access layers. Validates migration safety, checks for data loss risks, and generates rollback scripts.
Supabase MCP Prisma/Drizzle Migration Validator
PHASE 06

🔍 Code Review & Quality

Multi-agent review against spec and standards
review-sentinel spec-drift-detector

Every PR triggers a multi-agent review pipeline. The Review Sentinel checks code quality, patterns, and conventions. The Spec Drift Detector compares implementation against the technical spec - any deviation triggers a review flag with the specific spec clause being violated.

This isn't just linting. Agents evaluate semantic correctness: does this API handler actually implement the spec's error taxonomy? Does this component respect the state machine defined in the spec? Are the performance budgets being met?
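A minimal sketch of such a drift check, assuming spec and implementation are both reduced to endpoint-to-error-code maps; real detectors work over schemas, state machines, and ASTs:

```python
def spec_drift(spec_endpoints: dict, implemented: dict) -> list[str]:
    """Flag semantic drift between a spec's error taxonomy and the code."""
    flags = []
    for path, spec_errors in spec_endpoints.items():
        if path not in implemented:
            flags.append(f"{path}: declared in spec, not implemented")
        elif missing := spec_errors - implemented[path]:
            flags.append(f"{path}: error codes not handled: {sorted(missing)}")
    for path in implemented.keys() - spec_endpoints.keys():
        flags.append(f"{path}: implemented but absent from spec")
    return flags


# Hypothetical error taxonomies per endpoint.
spec = {"/orders": {400, 404, 409}, "/refunds": {400}}
code = {"/orders": {400, 404}, "/invoices": {400}}
print(spec_drift(spec, code))
```

Each flag names the violated clause (the endpoint and the missing error codes), which is what lets the review comment point back to a specific part of the spec.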

Human engineers review the AI's review, resolve edge cases, and approve. The loop tightens over time as the agents learn codebase conventions and reduce false positives.

Human code review is a bottleneck that doesn't scale. AI review agents can catch most routine issues instantly - pattern violations, security antipatterns, performance regressions - leaving humans to focus on design judgment calls.
Inputs
  • Pull request diff
  • Technical specification
  • Codebase style guide
  • Performance budgets
Outputs
  • Review comments (with severity)
  • Spec conformance report
  • Performance impact assessment
  • Suggested fixes (auto-PR)
Human Gate
  • Zero critical/high spec drift
  • Human engineer approval
  • All auto-fix suggestions resolved

Example Agents & Skills

Review Sentinel review-sentinel
Performs deep code review: architecture patterns, naming conventions, error handling completeness, performance anti-patterns, and accessibility violations. Generates line-by-line review comments with severity ratings.
Claude Opus GitHub MCP SonarQube ESLint
Spec Drift Detector spec-drift
Compares every code change against the technical spec. Detects deviations in API contracts, data models, state transitions, and error handling. Generates traceability reports linking code to spec clauses.
Spec Parser AST Diff Contract Testing
PHASE 07

🧪 Testing & Validation

Spec-derived test suites, generated and executed
test-forge chaos-monkey perf-bench eval-replay

Tests are not written after implementation - they're generated from the spec before code exists. The acceptance criteria in the spec are machine-parseable assertions that the Test Forge converts into unit tests, integration tests, and end-to-end scenarios.
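A sketch of machine-parseable acceptance criteria being evaluated directly, assuming a hypothetical `{metric, op, value}` criterion shape in the spec:

```python
OPERATORS = {"==": lambda a, b: a == b,
             "<=": lambda a, b: a <= b,
             ">=": lambda a, b: a >= b}


def run_criterion(criterion: dict, observed: dict) -> bool:
    """Evaluate one acceptance criterion against observed behaviour."""
    op = OPERATORS[criterion["op"]]
    return op(observed[criterion["metric"]], criterion["value"])


# Criteria as they might appear in the spec, not hand-written test code.
criteria = [
    {"id": "AC-1", "metric": "status_code", "op": "==", "value": 201},
    {"id": "AC-2", "metric": "latency_ms", "op": "<=", "value": 300},
]
observed = {"status_code": 201, "latency_ms": 120}

results = {c["id"]: run_criterion(c, observed) for c in criteria}
print(results)  # {'AC-1': True, 'AC-2': True}
```

Because each result is keyed by criterion ID, the test-to-spec traceability matrix is a byproduct of running the suite, not a separate document.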

The Chaos Agent goes further: it generates adversarial test cases designed to break the system. Invalid inputs, race conditions, malformed payloads, extreme load patterns. It thinks like an attacker, not a user.

For living products, regression suites grow automatically. Every new feature adds tests; every bug fix adds a regression test. The test suite becomes a living safety net that AI agents maintain alongside the code.

Critically, this phase now tests both the software artefact and the agent workflow that produced it. The Evaluation and Replay Harness turns past bugs, support tickets, and production incidents into reusable evaluation datasets. It replays agent workflows against known-good outcomes to detect regressions in agent behaviour, not just code behaviour. This is one of the most important maturity signals in an AI-native operating model.
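A minimal replay-harness sketch; the triage workflow and case shapes are hypothetical stand-ins for a real agent workflow:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReplayCase:
    """A past incident turned into a reusable evaluation case (sketch)."""
    case_id: str
    inputs: dict
    known_good: dict  # outcome the agent workflow is expected to reproduce


def replay(cases: list[ReplayCase],
           workflow: Callable[[dict], dict]) -> list[str]:
    """Re-run the agent workflow over historical cases; report any case
    where agent behaviour has regressed from the known-good outcome."""
    return [c.case_id for c in cases if workflow(c.inputs) != c.known_good]


# Hypothetical workflow-under-test: routes a ticket to a severity bucket.
def triage_workflow(inputs: dict) -> dict:
    severity = "high" if "outage" in inputs["ticket"] else "low"
    return {"severity": severity}


corpus = [
    ReplayCase("INC-7", {"ticket": "login outage for EU users"},
               {"severity": "high"}),
    ReplayCase("INC-9", {"ticket": "typo on pricing page"},
               {"severity": "low"}),
]
print(replay(corpus, triage_workflow))  # [] -> no behavioural regressions
```

The corpus only ever grows: every new incident becomes a permanent replay case, so the harness tests agent behaviour the same way a regression suite tests code behaviour.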

Testing is the phase where spec-driven development pays its biggest dividend. When tests are generated directly from machine-readable acceptance criteria, coverage isn't a metric to chase - it's a natural byproduct.
Inputs
  • Technical spec (acceptance criteria)
  • Implementation code
  • Performance budgets
  • Existing regression suite
  • Historical bug/incident corpus
Outputs
  • Test suites (unit/integration/E2E)
  • Test coverage report
  • Adversarial test results
  • Performance benchmark report
  • Agent workflow eval results
  • Replay regression suite
Human Gate
  • ≥90% spec coverage
  • Zero critical failures
  • Performance within budget
  • All chaos tests documented

Example Agents & Skills

Test Forge test-forge
Generates comprehensive test suites from spec acceptance criteria. Unit tests, integration tests, E2E scenarios, and API contract tests. Maintains test-to-spec traceability matrix.
Claude Code Jest/Vitest Playwright Postman MCP
Chaos Agent chaos-monkey-ai
Generates adversarial test cases: fuzzing, boundary testing, race conditions, injection attacks, malformed payloads. Thinks like an attacker to find what happy-path testing misses.
Fuzzing Engine Load Generator Claude Sonnet
Performance Bench perf-bench
Runs performance benchmarks against spec budgets. Load testing, memory profiling, bundle size analysis, Core Web Vitals. Produces performance regression reports with flame graphs.
k6/Artillery Lighthouse Chrome DevTools Protocol
Evaluation & Replay Harness eval-replay
Turns past bugs, support tickets, and production incidents into reusable evaluation datasets and replay suites. Tests agent workflows against known-good outcomes to detect regressions in agent behaviour. Maintains a growing corpus of multi-turn evaluation scenarios.
Eval Framework Trace Logger Dataset Manager Claude Sonnet
PHASE 08

🛡 Security & Compliance

Threat modelling and vulnerability scanning, automated
sec-guardian compliance-check action-gatekeeper

Security is not a phase you bolt on at the end - it runs continuously from spec onwards. But this dedicated phase is the final deep scan before code enters staging.

The Security Guardian performs STRIDE threat modelling against the architecture, SAST/DAST scanning against the codebase, dependency vulnerability analysis, and secrets detection. For AI-powered features, it also checks for prompt injection vulnerabilities and model output safety.

The Compliance Agent validates against relevant frameworks (SOC2, GDPR, HIPAA, ISO 27001) based on the product's compliance profile. It generates audit-ready evidence and control documentation.

Beyond the phase-level gate, security governance now operates at the tool and action level. Shell commands, environment changes, and destructive actions are subject to explicit approval policies. Guardrails validate in real-time, and human review pauses the run before risky actions proceed. This is a fundamentally stronger control model than a single late-stage sign-off: every agent in every phase that touches sensitive resources operates within declared permission boundaries.
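A sketch of an action-level approval policy, with an illustrative pattern table; real gatekeepers match structured tool calls rather than raw command strings:

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require-approval"
    DENY = "deny"


# Illustrative policy table, ordered most-restrictive first.
POLICY = [
    ("rm -rf",     Decision.DENY),
    ("DROP TABLE", Decision.DENY),
    ("terraform",  Decision.REQUIRE_APPROVAL),
    ("git push",   Decision.REQUIRE_APPROVAL),
]


def gate(command: str, audit_log: list) -> Decision:
    """Check a proposed agent action against the policy, pausing the run
    for human approval on sensitive actions; log every decision."""
    decision = next((d for pattern, d in POLICY if pattern in command),
                    Decision.ALLOW)
    audit_log.append((command, decision))
    return decision


log: list = []
print(gate("pytest -q", log))                      # Decision.ALLOW
print(gate("terraform apply -auto-approve", log))  # Decision.REQUIRE_APPROVAL
print(gate("psql -c 'DROP TABLE users'", log))     # Decision.DENY
```

Because every decision is appended to the audit log, including the allowed ones, the log doubles as the audit evidence this phase's outputs require.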

Security can't be a phase-gate bolt-on. When a security agent scans every commit, validates every dependency, and checks every API against OWASP in real-time, security shifts from audit to continuous assurance.
Inputs
  • Codebase + dependencies
  • Architecture diagrams
  • Compliance requirements
  • Threat model history
  • Agent action policies
Outputs
  • Threat model (STRIDE)
  • Vulnerability report (prioritised)
  • Compliance evidence pack
  • Remediation playbook
  • Action approval audit log
Human Gate
  • Zero critical/high vulnerabilities
  • All compliance controls met
  • Action policies enforced across agents
  • Security lead sign-off

Example Agents & Skills

Security Guardian sec-guardian
Full security assessment: STRIDE threat model, SAST/DAST scanning, dependency audit (CVEs), secrets detection, prompt injection testing for AI features. Generates remediation playbooks with priority.
Semgrep Snyk API OWASP ZAP Claude Opus
Compliance Engine compliance-auto
Maps code and architecture against compliance frameworks. Generates control evidence, audit trails, and compliance gap reports. Auto-produces SOC2/GDPR documentation.
Policy Engine Audit Logger Data Flow Mapper
Action Gatekeeper action-gatekeeper
Enforces tool-level approval policies across all agent phases. Shell commands, environment mutations, destructive database operations, and sensitive reads require explicit human approval before execution. Logs every action decision for audit.
Policy Engine Approval Queue Audit Logger Agent Hooks
PHASE 09

🎬 Demo & Staging

Auto-generated demos and stakeholder previews
demo-builder staging-deploy

Before release, AI agents generate demo environments and stakeholder walkthroughs. The Demo Builder creates interactive previews with realistic sample data, guided tours highlighting new functionality, and before/after comparisons for iterative improvements.

The Staging Deployer provisions ephemeral environments per feature branch. Each PR gets its own staging URL with synthetic data, accessible to stakeholders for UAT without touching production.

For living products, demos include migration previews - showing existing users what will change, what's new, and what's been improved. This feeds directly into changelog and release communications.

The gap between 'it works on my machine' and 'stakeholders can see it' is where momentum dies. Automated staging with synthetic data and shareable preview links turns every merge into a demoable milestone.
Inputs
  • Merged code (staging branch)
  • Feature spec & PRD
  • Sample data profiles
Outputs
  • Ephemeral staging URL
  • Interactive demo walkthrough
  • Screenshot/video assets
  • UAT feedback collection
Human Gate
  • Stakeholder UAT sign-off
  • No blocking UX issues
  • Demo traces to spec requirements

Example Agents & Skills

Demo Builder demo-craft
Generates interactive demos: sample data seeding, guided feature walkthroughs, screenshot/video capture, before/after comparisons. Produces stakeholder-ready demo packages.
Playwright Seed Data Gen Screen Recorder Vercel MCP
Staging Deployer staging-spin
Provisions ephemeral staging environments per feature branch. Configures synthetic data, sets up monitoring, generates shareable URLs with access controls for UAT.
Vercel MCP Docker Supabase MCP GitHub Actions
PHASE 10

📖 Documentation

Living docs generated from code, spec, and usage
doc-weaver api-doc-gen

Documentation is not written separately - it's derived from the spec, code, and tests. The Doc Weaver generates user-facing docs, developer guides, API references, and architecture overviews by reading the actual implementation.

For living products, docs are automatically updated when code changes. Every merged PR triggers a doc refresh - new endpoints appear in API docs, changed behaviour updates user guides, and deprecated features get sunset notices.

The system produces multiple doc types: end-user help (in-product and external), developer API docs, internal architecture docs, and onboarding guides for new team members.

Documentation written after the fact is always wrong. When a documentation agent generates docs from the spec and code simultaneously, docs become a living artefact - not a post-launch chore that nobody wants to do.
Inputs
  • Codebase (latest merged)
  • Technical spec
  • Test suite (as behaviour docs)
  • Existing documentation
Outputs
  • User documentation
  • API reference (interactive)
  • Architecture guides
  • Onboarding materials
  • Doc diff (what changed)
Human Gate
  • All public APIs documented
  • No stale doc references
  • Readability score ≥ target

Example Agents & Skills

Doc Weaver doc-weaver
Generates comprehensive documentation from code, specs, and tests. User guides, developer docs, architecture overviews, onboarding materials. Auto-updates on code changes. Writes for the right audience.
Claude Sonnet GitHub MCP Notion MCP MDX Generator
API Documentation api-doc-gen
Generates interactive API documentation from OpenAPI specs and actual implementation. Includes code examples in multiple languages, error handling guides, and rate limit documentation.
OpenAPI Parser Postman MCP Code Example Gen
PHASE 11

🚀Release & Rollout

Automated release notes, changelogs, and deployment
release-captain changelog-gen

The Release Captain orchestrates the entire release process: generating semantic version numbers, compiling changelogs, creating release notes for different audiences (technical, end-user, executive), and executing the deployment pipeline.

Release notes are not vague - they're generated from the spec delta + commit history + demo assets. Each change links back to the original problem brief, creating full traceability from customer need to shipped feature.

For living products, the Release Captain manages feature flags, progressive rollouts, and canary deployments. It monitors health metrics post-deploy and can trigger automatic rollback if anomalies are detected.

Release engineering is toil. When an AI agent handles changelog generation, version bumping, deployment orchestration, and rollback triggers, releases become a non-event - which is exactly what they should be.
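One slice of that toil - version bumping - can be sketched as a pure function over the commits since the last release, assuming the team uses conventional-commit prefixes (`feat:`, `fix:`, `feat!`/`BREAKING CHANGE`). This is an illustrative sketch, not the Release Captain's actual logic:

```python
# Hypothetical sketch: derive the next semantic version from commit messages.
# Breaking changes bump major, features bump minor, everything else bumps patch.

def next_version(current: str, commits: list[str]) -> str:
    major, minor, patch = (int(x) for x in current.split("."))
    if any("BREAKING CHANGE" in c or c.startswith("feat!") for c in commits):
        return f"{major + 1}.0.0"
    if any(c.startswith("feat") for c in commits):
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

commits = ["feat: add canary rollout config", "fix: flaky health probe"]
version = next_version("2.3.1", commits)
```

The same commit classification can drive the multi-audience changelog: `feat` entries feed the user-facing "What's New", `fix` entries feed the technical changelog.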
Inputs
  • All approved PRs since last release
  • Spec deltas
  • Demo assets
  • Deployment config
Outputs
  • Semantic version tag
  • Multi-audience release notes
  • Deployed to production
  • Feature flag config
  • Rollout monitoring dashboard
Human Gate
  • All tests pass in CI
  • Canary health check green
  • Release manager approval

Example Agents & Skills

Release Captain release-captain
Orchestrates release: semantic versioning, feature flag configuration, progressive rollout strategy, canary deployment, health monitoring, and auto-rollback triggers. Full deployment automation.
GitHub Actions Vercel MCP LaunchDarkly Datadog
Changelog Generator changelog-pro
Generates multi-audience release notes from spec deltas, commit history, and demo assets. Technical changelog, user-facing "What's New", executive summary, and customer comms draft. Full traceability.
Claude Sonnet Git Log Parser Notion MCP Slack MCP
PHASE 12

📡Monitoring, Feedback & Evolution

Closes the loop - production signals feed Phase 01
prod-watcher feedback-loop

The lifecycle is a loop, not a line. Post-release, the Production Watcher monitors error rates, performance metrics, usage patterns, and user feedback. It detects anomalies, classifies issues, and routes them back to the appropriate phase.

A performance regression routes to Phase 07 (Testing). A spec deviation in production routes to Phase 03 (Spec). A new customer need routes to Phase 01 (Ideation). The system is self-healing and self-evolving.

The Feedback Loop Agent synthesises qualitative and quantitative signals into improvement proposals - closing the circle by generating new problem briefs that feed Phase 01. The product never stops evolving.

The SDLC isn't a line - it's a loop. This phase closes the circle: production signals feed back into Phase 01, anomalies trigger spec reviews, and the system continuously improves based on real-world behaviour.
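The routing described above is, at its simplest, a classification-to-phase lookup with a human fallback. The signal labels and phase targets below are assumptions for illustration, not a fixed taxonomy:

```python
# Hypothetical sketch: route classified production signals back to the
# lifecycle phase that owns them. Unknown signal types fall back to human
# triage rather than guessing a destination.

ROUTING = {
    "performance_regression": "Phase 07: Testing",
    "spec_deviation": "Phase 03: Spec",
    "new_customer_need": "Phase 01: Ideation",
}

def route_signal(signal_type: str) -> str:
    return ROUTING.get(signal_type, "human triage")

destination = route_signal("spec_deviation")
```

The fallback matters: a routing table is only as good as its classifier, and unclassifiable signals are exactly the ones a human should see.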
Inputs
  • Production metrics & logs
  • User feedback & NPS
  • Usage analytics
  • Original success metrics
Outputs
  • Health dashboards
  • Anomaly alerts
  • Improvement proposals
  • New problem briefs → Phase 01
  • Feature impact reports
Human Gate
  • SLA targets met
  • Feature adoption ≥ threshold
  • Zero unresolved P0/P1 issues

Example Agents & Skills

Production Watcher prod-watcher
Monitors production health and the agent operating model itself: error rates, latency, resource usage, user behaviour anomalies, plus agent PR merge rate, median time to merge, review churn, rollback rate, spec-drift rate, and evaluation pass rate. Classifies issues and routes to appropriate lifecycle phase. Triggers auto-remediation for known patterns.
Datadog/Grafana PagerDuty Sentry Claude Sonnet
Feedback Loop Agent feedback-loop
Synthesises production metrics, user feedback, and usage analytics into improvement proposals. Generates new problem briefs that feed Phase 01, closing the lifecycle loop. Measures feature impact against original success metrics.
Analytics Pipeline NPS/CSAT Tools Linear MCP Notion MCP

The Living Product Model

How spec-driven development works across continuous iterations, feature additions, and product evolution.

// 01

Spec Versioning

Every feature iteration generates a new spec version. The Spec Delta Engine produces semantic diffs - additive changes, breaking changes, and migration-required changes - so agents always know what's new and what's affected.
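A minimal sketch of what the Spec Delta Engine computes, treating a spec version as a flat field map (real specs are nested, and the names here are illustrative):

```python
# Hypothetical sketch: classify field-level changes between two spec versions.
# Removed or changed fields are treated conservatively as breaking, since
# downstream consumers may depend on the old shape.

def spec_delta(old: dict, new: dict) -> dict:
    added = {k for k in new if k not in old}
    removed = {k for k in old if k not in new}
    changed = {k for k in old.keys() & new.keys() if old[k] != new[k]}
    return {
        "additive": sorted(added),
        "breaking": sorted(removed | changed),
    }

v1 = {"name": "string", "price": "int"}
v2 = {"name": "string", "price": "decimal", "sku": "string"}
delta = spec_delta(v1, v2)
```

Agents consuming the delta can then scope their work: additive changes extend code, breaking changes trigger migration planning.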

// 02

Feature Branches as Contexts

Each feature gets an isolated spec + code + test context. Agents work in bounded feature branches. The merge process validates that the feature's spec is compatible with mainline before integration.

// 03

Regression-Aware Generation

Code agents don't generate in a vacuum. They read the full dependency graph, understand which modules their changes affect, and proactively run impacted tests. Breaking changes are flagged before they're committed.
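"Proactively run impacted tests" reduces to reachability over the dependency graph: walk from the changed modules to everything that (transitively) imports them. A sketch under that assumption, with made-up module names:

```python
# Hypothetical sketch: given a dependency graph where edges point from a
# module to the modules that import it, find everything transitively
# affected by a change - then run only the tests covering those modules.

from collections import deque

def impacted(dependents: dict, changed: set) -> set:
    seen = set(changed)
    queue = deque(changed)
    while queue:
        mod = queue.popleft()
        for dep in dependents.get(mod, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

graph = {"billing": ["checkout"], "checkout": ["api"], "auth": ["api"]}
affected = impacted(graph, {"billing"})
```

A change to `billing` pulls in `checkout` and `api`, but leaves `auth` alone - so the agent runs three modules' tests, not the whole suite.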

// 04

Continuous Spec-Code Reconciliation

A background agent continuously compares the living spec against the actual codebase. Drift - where code diverges from spec over time - triggers automated reconciliation reports and remediation PRs.
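One concrete form of that comparison: the endpoints the spec declares versus the routes the code actually registers. The sketch below assumes both sides can be reduced to comparable sets; names are illustrative:

```python
# Hypothetical sketch: a reconciliation pass over declared vs implemented
# endpoints. Drift in either direction gets its own bucket - spec ahead of
# code, or code drifted past spec.

def drift_report(spec_endpoints: set, code_routes: set) -> dict:
    return {
        "unimplemented": sorted(spec_endpoints - code_routes),
        "undocumented": sorted(code_routes - spec_endpoints),
    }

spec = {"GET /users", "POST /users", "GET /orders"}
code = {"GET /users", "POST /users", "GET /orders", "GET /debug"}
report = drift_report(spec, code)
```

A non-empty bucket is what triggers the reconciliation report - and, for safe cases, an automated remediation PR.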

// 05

Feature Impact Tracking

Post-release, every feature is measured against its original success metrics. The Feedback Loop Agent generates impact reports - did this feature actually solve the problem it was built for? Under-performing features trigger improvement cycles.

// 06

Deprecation & Sunset Automation

When features evolve or get replaced, agents manage the full deprecation lifecycle: spec updates, migration guides, user notifications, sunset timelines, and finally clean removal with zero dangling references.

What This Blueprint Is and Is Not

What It Is

A north star model for AI-native software delivery. A conceptual blueprint for product and engineering alignment. A way to think about specs, agents, humans, and feedback as one coordinated system. Provocative by design - intended to challenge assumptions about how software gets built.

What It Is Not

Not a locked implementation architecture - adapt it to your stack and maturity. Not a mandate for zero-human development - humans govern, agents execute. Not a return to heavy up-front specification - specs are living, sliceable, and ship in thin vertical slices. Not a claim that every phase must be fully automated from day one. Not a replacement for product judgment or engineering leadership.

Foundational Considerations

Cross-cutting layers that span the entire operating model. These are not phases - they are the connective fabric that makes every phase reliable, repeatable, and governable.

Cross-cutting layer

Knowledge & Context Fabric

The operating model frequently describes agents that "read the codebase" or "understand context," but a principal architect will ask: what curated system makes that reliable and repeatable? The answer is an explicit Knowledge and Context Fabric that spans every phase.

This fabric holds the structured context that every agent consumes: repository instructions, approved data sources, architecture decision records, domain glossaries, reference implementations, curated shared instructions, and reusable skills. Without it, each agent reinvents understanding from scratch. With it, context becomes an engineering discipline - versioned, maintained, and tested like code.

Repository instructions Domain glossaries Architecture maps ADR catalogue Reference implementations Shared agent instructions Reusable skills Approved data sources
Cross-cutting layer

Agent Governance & Model Ops

Governance cannot live beside the engineering system - it must live inside it. Every agent operating across the 12 phases does so within a declared governance envelope that covers approved models, prompt and instruction versioning, trace storage, evaluation baselines, approval policies, data-handling rules, cost and latency budgets, and audit trails.

This is not a policy document. It is a runtime layer: every agent invocation is logged, every sensitive action is gated, every model choice is versioned, and every evaluation baseline is tracked. The governance layer ensures that scaling agent-driven delivery does not mean scaling unaudited risk. Industry guidance from model providers, responsible AI frameworks, and delivery research all converge on this point: a clear organisational AI stance, enforced at the system level, is a prerequisite for production-grade agent operations.

Approved models Prompt versioning Trace storage Eval baselines Approval policies Data-handling rules Cost/latency budgets Audit trails
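"Governance as a runtime layer" can be made concrete as a gate that every agent invocation passes through: validate against the envelope, log to the audit trail, and only then proceed. The envelope fields and function names below are assumptions for illustration, not a real API:

```python
# Hypothetical sketch: a governance envelope enforced at invocation time.
# Every call is checked against approved models and a cost budget, and
# every decision - allowed or not - lands in the audit trail.

AUDIT_LOG: list[dict] = []

ENVELOPE = {
    "approved_models": {"model-a", "model-b"},
    "max_cost_usd": 2.0,
}

def invoke_agent(agent: str, model: str, est_cost_usd: float) -> bool:
    allowed = (model in ENVELOPE["approved_models"]
               and est_cost_usd <= ENVELOPE["max_cost_usd"])
    AUDIT_LOG.append({"agent": agent, "model": model,
                      "cost": est_cost_usd, "allowed": allowed})
    return allowed
```

The point of the sketch is the shape, not the fields: denial is logged just like approval, so the audit trail captures attempted as well as executed actions.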
Cross-cutting layer

Source-of-Truth Hierarchy

A common failure mode of AI-native delivery is blurred boundaries between artifacts. When the PRD drifts into APIs and Architecture drifts into business rules, both humans and agents lose track of which document to trust. This layer assigns each artifact a single, owned concern - and names code as the ground truth against which every other layer's drift is measured.

The hierarchy is a navigation aid as much as a contract. When intent changes, it changes in the PRD and the Spec follows. When technical constraints change, they change in Architecture. When code deviates from spec, reconciliation agents catch the drift. Clean ownership is what keeps the pipeline composable.

Artifact        | Owns                                                 | Lives in
PRD             | Business intent & outcomes                           | Phase 2
Spec            | Executable contract                                  | Phase 3
Architecture    | Technical constraints & ADRs                         | Phase 4
Delivery Plan   | Task graph & rollback boundaries                     | Phase 4a
Project Context | Local conventions & repo knowledge                   | Knowledge & Context Fabric
Code            | Actual runtime behaviour; drift measured against it  | Implementation