The AI-Native SDLC
A fundamentally different way to build software. A spec-driven, agent-powered operating model where AI agents own structured execution across every phase - humans govern the framework, harness, direction, quality, security and risk.
A north star blueprint for product and R&D teams exploring how software delivery changes when specs, agents, and feedback systems become first-class constructs.
Spec-Centred
The spec is the system contract. Every agent, test, and deployment traces back to it.
Agent-Orchestrated
Specialised agents own bounded work in each phase, operating within shared context and constraints.
Human-Governed
Humans set intent, approve critical transitions, and intervene where confidence or alignment drops.
Continuously Evolving
Not a one-shot build. Signals from production, users, and the market continuously reshape future work.
Industry Overview
Six signals from the frontier of AI-native software delivery - drawn from industry research, platform evolution, and emerging engineering practice.
From Inline Assistance to Delegated Workflows
Modern AI-native development is shifting from chat-in-the-IDE toward issue-to-plan-to-branch-to-PR flows in managed environments. The leading cloud agent models are explicitly built around researching the repository, creating an implementation plan, making changes on a branch, running tests and linters in an ephemeral environment, and then returning a PR with measurable lifecycle metrics.
workflow delegation
Better-Bounded Workflows Beat More Agents
The winning pattern is not "more agents" - it is better-bounded workflows. Emerging guidance from model providers and software delivery research is converging: successful teams start with simpler, composable workflows and add fuller agent autonomy only where flexibility is genuinely needed. Context and discipline matter, and complacency with AI-generated code is a real risk. The operating model should sound less like an agent catalogue and more like bounded execution with clear governance.
bounded execution
Context Is a First-Class Engineering Discipline
Context engineering - the curation of what enters the model's context over long-running agent loops - is becoming a recognised discipline. Practices such as shared agent instructions, curated reference applications, and protocol-based access to live dependency and system context are now formalised in leading platforms. Repository instructions, agent configuration files, custom skills, and repository memory are practical tools improving agent performance at scale.
context engineering
Agent Systems Are Evaluated Like Products
Modern practice has moved beyond "did the generated code compile?" Leading model providers now emphasise traces, graders, datasets, multi-turn evaluations, guardrails, and human approvals - because multi-step agents can fail in ways ordinary code tests do not catch. The implication: an AI-native operating model needs an explicit layer for evaluating agent behaviour, not only product behaviour.
agent evaluation
Value Scales Through Platforms, Data, and Governance
Industry research is very clear that AI is an amplifier of the existing system, not a substitute for one. Healthy internal data, a clear organisational AI stance, user-centricity, and high-quality internal platforms are the capabilities that make AI useful at scale. With 90% of organisations already using internal platforms and 76% having dedicated platform teams, "AI-native SDLC" is becoming as much a platform problem as a coding problem.
platform + governance
Brownfield Understanding Is Now Forward Engineering (WIP)
AI's role now stretches from understanding legacy codebases to forward engineering. For organisations with mature, complex product estates, this matters enormously. Agents must build understanding of existing domain models, configuration patterns, and legacy seams before they start changing core product behaviour. The operating model must account for comprehension as a prerequisite to generation.
legacy + comprehension
Ideation & Problem Discovery
The lifecycle begins not with a feature request but with signal ingestion. AI agents continuously scan customer feedback, support tickets, competitor launches, market reports, and internal usage analytics to surface problems worth solving.
The Discovery Agent synthesises these signals into structured problem hypotheses - each with severity score, addressable market size, and alignment to product vision. It generates "problem briefs" that a human product lead reviews and greenlights.
In a living product, this phase never stops. It runs as a background daemon, constantly feeding the backlog with prioritised, evidence-backed problem statements. New feature ideas, customer pain points, and market shifts all flow through this funnel.
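As a concrete illustration, the prioritisation behind a problem brief can be sketched as a weighted blend of the signals above. The `ProblemBrief` fields, the weights, and the `priority_score` function are assumptions for illustration, not a prescribed model:

```python
from dataclasses import dataclass

@dataclass
class ProblemBrief:
    title: str
    severity: float          # 0..1, e.g. from support-ticket clustering
    market_size: float       # 0..1, normalised addressable-market estimate
    vision_alignment: float  # 0..1, scored against the product vision
    evidence_sources: int    # distinct data sources backing the brief

def priority_score(b: ProblemBrief) -> float:
    # Weighted blend; the weights are illustrative, not prescriptive.
    return 0.4 * b.severity + 0.35 * b.market_size + 0.25 * b.vision_alignment

def passes_gate(b: ProblemBrief) -> bool:
    # Mirrors the human gate for this phase: at least 3 independent sources.
    return b.evidence_sources >= 3

briefs = [
    ProblemBrief("Exports time out for large accounts", 0.9, 0.4, 0.7, 4),
    ProblemBrief("Add dark mode", 0.2, 0.6, 0.3, 2),
]
ranked = sorted((b for b in briefs if passes_gate(b)),
                key=priority_score, reverse=True)
```

The under-evidenced brief never reaches the human gate; the rest arrive pre-ranked with their supporting signals attached.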
Why This Phase Matters
Without continuous signal ingestion, product teams react to the loudest voice instead of the most urgent need. This phase ensures every feature request is validated against real market data, customer pain, and competitive landscape. It replaces gut-feel backlog management with data-driven prioritisation.
Inputs
- Customer feedback streams
- Support ticket clusters
- Usage analytics data
- Competitor intelligence
- Sales call transcripts
Outputs
- Prioritised problem briefs
- Opportunity scoring matrix
- Competitive landscape map
- Solution hypotheses (ranked)
Human Gate
- Human product lead approval
- Strategic alignment check
- Problem validated with ≥3 data sources
Example Agents & Skills
discovery-scout · market-radar · ideation-spark
Product Requirements (PRD)
The approved problem brief feeds into the PRD Architect, which generates a comprehensive product requirements document - not a vague wish list, but an engineering-ready blueprint.
The PRD covers: user personas, jobs-to-be-done, functional requirements, non-functional requirements (performance, security, accessibility), success metrics, and rollout strategy.
For a living product, the PRD Agent understands the existing product context. It reads the current codebase, existing specs, and feature graph to ensure new requirements are additive and compatible. It flags conflicts with existing features and suggests migration paths.
Inputs
- Approved problem brief
- Existing product spec graph
- Codebase architecture map
- UX research findings
Outputs
- Full PRD document
- User journey maps
- Dependency impact assessment
- High-fidelity prototypes with dummy data
Human Gate
- Product + Engineering sign-off
- No unresolved dependency conflicts
- Success metrics defined & measurable
Example Agents & Skills
prd-architect-pro · ux-insight · dep-graph
Technical Specification
This is the keystone phase of AI-native development. The technical spec is not documentation - it's the executable contract that all downstream agents code against, test against, and validate against.
The spec includes: precise API schemas (OpenAPI), database migration scripts, component interface definitions, state machine diagrams, error taxonomy, performance budgets, and acceptance criteria written as machine-parseable assertions.
Critically, specs now include executable examples alongside schemas: canonical business scenarios, exception paths, migration cases, and explicit "never-do-this" boundaries that act as guardrails for downstream coding agents. Constraints like "never mutate this table directly" or "never call this API without a feature flag" are first-class spec content, not tribal knowledge. This makes the spec an operational contract, not a document artefact.
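A minimal sketch of what machine-parseable spec content might look like, with a cheap structural gate run before any code is written. The field names (`acceptance`, `never_do`) and the checks in `validate_spec` are illustrative assumptions, not a standard schema:

```python
# Assumed shape for machine-parseable spec content; field names and checks
# are illustrative, not a standard.
spec = {
    "feature": "order-export",
    "acceptance": [
        {"id": "AC-1", "given": "an order with 0 items",
         "when": "export is requested", "then": "API returns 422 EMPTY_ORDER"},
    ],
    "never_do": [
        {"id": "G-1", "rule": "never mutate table `orders` directly; go through the order service"},
        {"id": "G-2", "rule": "never call /export without the `bulk_export` feature flag"},
    ],
}

def validate_spec(spec: dict) -> list[str]:
    # Cheap structural gate a spec pipeline might run before code exists.
    errors = []
    for ac in spec.get("acceptance", []):
        if not all(k in ac for k in ("id", "given", "when", "then")):
            errors.append(f"acceptance {ac.get('id', '?')} missing a clause")
    if not spec.get("never_do"):
        errors.append("spec declares no guardrails")
    return errors
```

Because guardrails are structured data rather than prose, downstream coding agents can be prompted with them verbatim and reviewers can diff them between spec versions.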
For brownfield products, spec writing depends on deep understanding of the existing system. The Domain Cartographer builds a living map of modules, business entities, configuration variants, and legacy touchpoints before any new spec is written. This is the bridge between "AI understands requirements" and "AI understands the shape of the existing product."
In a living product, specs are versioned and diff-aware. When a feature evolves, the Spec Agent generates a spec delta showing exactly what changed, what is backwards-compatible, and what requires migration. The spec becomes the living contract between product intent and code reality.
Spec-driven development means: no code gets written until the spec passes validation. No test gets generated without referencing the spec. No demo gets built that does not trace back to spec assertions. Drift between code and spec triggers automated alerts.
Inputs
- Approved PRD
- Existing spec versions
- Current codebase schema
- API catalogue
- Existing domain model map
Outputs
- Technical specification (versioned)
- OpenAPI / GraphQL schemas
- DB migration scripts
- Acceptance criteria (machine-readable)
- Executable examples + guardrails
- Domain model map (living)
- Spec delta (for iterations)
Human Gate
- 100% PRD → Spec traceability
- All schemas validate
- Breaking changes flagged + migration planned
- Tech lead approval
Example Agents & Skills
spec-forge · contract-check · spec-delta · system-cartographer · scenario-scribe
Architecture & System Design
With a locked spec, the Architecture Agent designs the system topology. It selects patterns (microservices vs monolith, event-driven vs request-response), defines infrastructure requirements, and produces deployment architecture diagrams.
Critically, it reads the existing architecture and designs for integration, not greenfield. For a living product adding a new feature module, it identifies where the new service fits, which existing services need interface changes, and what infrastructure needs provisioning.
The output is an Architecture Decision Record (ADR) with rationale, alternatives considered, and trade-offs - all generated by AI, reviewed by a human architect.
Inputs
- Validated technical spec
- Existing architecture docs
- Performance budgets
- Scale requirements
Outputs
- Architecture Decision Records
- System topology diagrams
- Infrastructure-as-Code templates
- Cost projections
Human Gate
- Architect review & sign-off
- Cost within budget envelope
- No single-point-of-failure risks
Example Agents & Skills
arch-blueprint · infra-scout
Delivery Planning & Work Decomposition
This is the gap between knowing what to build and starting to build it. Modern agentic development follows a research → plan → branch → PR pattern, and agents perform best when work is bounded and verifiable rather than handed a large spec and told to "implement."
The Delivery Planner converts the approved spec and architecture into a concrete task graph: dependency ordering, PR slicing strategy, parallelisable work streams, and rollback points for each unit of work. Every task traces back to a spec element and forward to a verifiable outcome.
For brownfield products, this phase is critical. The Impact Analyser maps every planned change to impacted components, existing test suites that must continue to pass, configuration dependencies, and legacy integration seams that constrain delivery order. Without this step, agents generate code that compiles in isolation but breaks the system at integration.
The output is not a project plan for humans - it's a machine-readable task graph that coding agents consume as bounded work instructions. Each node in the graph specifies its inputs, expected outputs, validation criteria, and rollback trigger.
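One way to sketch such a task graph, with Kahn's algorithm serving as the "no circular dependencies" check. The node shape (`deps`, `validate`, `rollback`) is a hypothetical example, not a fixed format:

```python
from collections import deque

# Hypothetical task-graph shape: each node is a bounded unit of work with its
# own validation criterion and rollback trigger, plus dependency edges.
tasks = {
    "migrate-db":   {"deps": [], "validate": "migration dry-run passes", "rollback": "run down-migration"},
    "api-endpoint": {"deps": ["migrate-db"], "validate": "contract tests pass", "rollback": "revert PR"},
    "ui-component": {"deps": ["api-endpoint"], "validate": "e2e scenario passes", "rollback": "disable feature flag"},
}

def execution_order(graph: dict) -> list[str]:
    # Kahn's algorithm: a dependency-ordered sequence, or an error on a cycle.
    indegree = {name: len(node["deps"]) for name, node in graph.items()}
    ready = deque(name for name, d in indegree.items() if d == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for name, node in graph.items():
            if current in node["deps"]:
                indegree[name] -= 1
                if indegree[name] == 0:
                    ready.append(name)
    if len(order) != len(graph):
        raise ValueError("circular dependency in task graph")
    return order
```

The same traversal doubles as the PR slicing order: each node becomes one bounded, independently revertible unit of work.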
Why This Sub-Phase Matters
The biggest failure mode in AI-assisted development isn't bad code generation - it's unbounded work scope. When an agent receives a full spec and attempts to implement everything in a single pass, the result is brittle, untestable, and impossible to roll back. Decomposing work into bounded, dependency-ordered units is what makes agent-driven implementation reliable at production scale.
Inputs
- Validated technical spec
- Architecture decision records
- Existing codebase + test suites
- Configuration dependency map
Outputs
- Task graph (dependency-ordered)
- PR slicing plan
- Impact matrix
- Rollback strategy per work unit
- Environment provisioning checklist
Human Gate
- All tasks traceable to spec elements
- No circular dependencies in graph
- Rollback point defined per PR
- Tech lead approval on sequencing
Example Agents & Skills
delivery-planner · impact-scan · env-provision
Implementation
This is where spec-driven development pays off. Coding agents receive the spec as their instruction set - not vague requirements, but precise schemas, interfaces, and acceptance criteria. The agents generate production code that conforms to the spec, follows existing codebase conventions, and includes inline documentation.
Specialised agents handle different domains: the Frontend Agent builds UI components with design system compliance, the API Agent implements endpoints matching OpenAPI schemas exactly, and the Data Agent writes migrations and query layers. Each agent operates within its bounded context.
For living products, agents understand the existing code graph. They don't generate isolated code - they integrate into the existing module structure, respect import conventions, extend existing test patterns, and follow established naming conventions.
Inputs
- Validated technical spec
- Architecture decision records
- Existing codebase context
- Design system tokens
Outputs
- Production code (PR-ready)
- Database migrations
- API implementations
- UI components
- Inline documentation
Human Gate
- All code compiles & lints clean
- Spec conformance check passes
- No existing tests broken
Example Agents & Skills
code-engine · ui-ux-pro-max · api-forge · data-smith
Code Review & Quality
Every PR triggers a multi-agent review pipeline. The Review Sentinel checks code quality, patterns, and conventions. The Spec Drift Detector compares implementation against the technical spec - any deviation triggers a review flag with the specific spec clause being violated.
This isn't just linting. Agents evaluate semantic correctness: does this API handler actually implement the spec's error taxonomy? Does this component respect the state machine defined in the spec? Are the performance budgets being met?
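A toy version of one such semantic check - comparing the error codes an implementation actually raises against the spec's declared error taxonomy. All of the error names here are invented for illustration:

```python
# Toy conformance check: compare the error codes an implementation raises with
# the spec's declared error taxonomy. The names are invented examples.
spec_errors = {"EMPTY_ORDER", "RATE_LIMITED", "EXPORT_TOO_LARGE"}
implemented_errors = {"EMPTY_ORDER", "RATE_LIMITED", "UNKNOWN_FAILURE"}

missing = spec_errors - implemented_errors      # spec cases the code never handles
undeclared = implemented_errors - spec_errors   # behaviour the spec never defined

drift_report = [
    *(f"MISSING: handler never raises {e} (required by spec)" for e in sorted(missing)),
    *(f"UNDECLARED: handler raises {e}, absent from spec" for e in sorted(undeclared)),
]
```

Each drift entry carries the specific spec clause being violated, which is what turns an opaque review flag into an actionable comment.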
Human engineers review the AI's review, resolve edge cases, and approve. The loop tightens over time as the agents learn codebase conventions and reduce false positives.
Inputs
- Pull request diff
- Technical specification
- Codebase style guide
- Performance budgets
Outputs
- Review comments (with severity)
- Spec conformance report
- Performance impact assessment
- Suggested fixes (auto-PR)
Human Gate
- Zero critical/high spec drift
- Human engineer approval
- All auto-fix suggestions resolved
Example Agents & Skills
review-sentinel · spec-drift
Testing & Validation
Tests are not written after implementation - they're generated from the spec before code exists. The acceptance criteria in the spec are machine-parseable assertions that the Test Forge converts into unit tests, integration tests, and end-to-end scenarios.
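A hedged sketch of turning one machine-parseable acceptance criterion into a runnable test. Both the criterion shape and `export_order` (a stand-in for the system under test) are assumptions:

```python
# Sketch: generate a runnable test from a machine-parseable acceptance
# criterion. `export_order` is a stand-in for the real system under test.
def export_order(order_items: list) -> tuple[int, str]:
    if not order_items:
        return 422, "EMPTY_ORDER"
    return 200, "OK"

criterion = {"id": "AC-1", "given": {"order_items": []},
             "then": {"status": 422, "code": "EMPTY_ORDER"}}

def make_test(criterion):
    def test():
        status, code = export_order(**criterion["given"])
        assert status == criterion["then"]["status"], criterion["id"]
        assert code == criterion["then"]["code"], criterion["id"]
    test.__name__ = f"test_{criterion['id'].lower().replace('-', '_')}"
    return test

generated = make_test(criterion)
```

Because each generated test carries the criterion's ID, a failing test points straight back to the spec clause it enforces.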
The Chaos Agent goes further: it generates adversarial test cases designed to break the system. Invalid inputs, race conditions, malformed payloads, extreme load patterns. It thinks like an attacker, not a user.
For living products, regression suites grow automatically. Every new feature adds tests; every bug fix adds a regression test. The test suite becomes a living safety net that AI agents maintain alongside the code.
Critically, this phase now tests both the software artefact and the agent workflow that produced it. The Evaluation and Replay Harness turns past bugs, support tickets, and production incidents into reusable evaluation datasets. It replays agent workflows against known-good outcomes to detect regressions in agent behaviour, not just code behaviour. This is one of the most important maturity signals in an AI-native operating model.
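In miniature, such a harness might look like this: past incidents become evaluation cases, and the agent workflow is re-run against known-good outcomes. `triage_agent` is a hypothetical stand-in for a real agent invocation:

```python
# Miniature replay harness: past incidents become evaluation cases, and the
# agent workflow is re-run against known-good outcomes. `triage_agent` is a
# hypothetical stand-in for a real agent invocation.
def triage_agent(ticket: str) -> str:
    text = ticket.lower()
    if "timeout" in text or "times out" in text:
        return "perf"
    if "invoice" in text:
        return "billing"
    return "unknown"

replay_cases = [
    {"input": "Export times out after 30s", "expected": "perf"},
    {"input": "Invoice total is wrong", "expected": "billing"},
    {"input": "Login loops forever", "expected": "auth"},  # a known past failure
]

regressions = [c["input"] for c in replay_cases
               if triage_agent(c["input"]) != c["expected"]]
pass_rate = 1 - len(regressions) / len(replay_cases)
```

A drop in the pass rate between agent versions signals a regression in agent behaviour even when every code-level test still passes.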
Inputs
- Technical spec (acceptance criteria)
- Implementation code
- Performance budgets
- Existing regression suite
- Historical bug/incident corpus
Outputs
- Test suites (unit/integration/E2E)
- Test coverage report
- Adversarial test results
- Performance benchmark report
- Agent workflow eval results
- Replay regression suite
Human Gate
- ≥90% spec coverage
- Zero critical failures
- Performance within budget
- All chaos tests documented
Example Agents & Skills
test-forge · chaos-monkey-ai · perf-bench · eval-replay
Security & Compliance
Security is not a phase you bolt on at the end - it runs continuously from spec onwards. But this dedicated phase is the final deep scan before code enters staging.
The Security Guardian performs STRIDE threat modelling against the architecture, SAST/DAST scanning against the codebase, dependency vulnerability analysis, and secrets detection. For AI-powered features, it also checks for prompt injection vulnerabilities and model output safety.
The Compliance Agent validates against relevant frameworks (SOC2, GDPR, HIPAA, ISO 27001) based on the product's compliance profile. It generates audit-ready evidence and control documentation.
Beyond the phase-level gate, security governance now operates at the tool and action level. Shell commands, environment changes, and destructive actions are subject to explicit approval policies. Guardrails validate actions in real time, and human review pauses the run before risky actions proceed. This is a fundamentally stronger control model than a single late-stage sign-off: every agent in every phase that touches sensitive resources operates within declared permission boundaries.
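A minimal sketch of action-level gating, assuming a simple pattern-based policy. The policy shape and the patterns are illustrative, not a real product's rule set:

```python
import re

# Illustrative action-gating policy: agent tool calls are checked against
# declared permission boundaries before they run. Patterns are assumptions.
POLICY = {
    "deny": [r"^rm -rf\b", r"\bDROP TABLE\b"],
    "require_approval": [r"^git push\b", r"^terraform apply\b"],
    "allow": [r"^git (status|diff|log)\b", r"^pytest\b"],
}

def gate(command: str) -> str:
    """Return 'deny', 'pause' (await human approval), or 'allow'."""
    if any(re.search(p, command) for p in POLICY["deny"]):
        return "deny"
    if any(re.search(p, command) for p in POLICY["require_approval"]):
        return "pause"
    if any(re.search(p, command) for p in POLICY["allow"]):
        return "allow"
    return "pause"  # default-deny posture: unknown actions wait for a human
```

The default-deny fallthrough is the design choice that matters: an action the policy has never seen pauses the run rather than proceeding, and every decision lands in the audit log.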
Inputs
- Codebase + dependencies
- Architecture diagrams
- Compliance requirements
- Threat model history
- Agent action policies
Outputs
- Threat model (STRIDE)
- Vulnerability report (prioritised)
- Compliance evidence pack
- Remediation playbook
- Action approval audit log
Human Gate
- Zero critical/high vulnerabilities
- All compliance controls met
- Action policies enforced across agents
- Security lead sign-off
Example Agents & Skills
sec-guardian · compliance-auto · action-gatekeeper
Demo & Staging
Before release, AI agents generate demo environments and stakeholder walkthroughs. The Demo Builder creates interactive previews with realistic sample data, guided tours highlighting new functionality, and before/after comparisons for iterative improvements.
The Staging Deployer provisions ephemeral environments per feature branch. Each PR gets its own staging URL with synthetic data, accessible to stakeholders for UAT without touching production.
For living products, demos include migration previews - showing existing users what will change, what's new, and what's been improved. This feeds directly into changelog and release communications.
Inputs
- Merged code (staging branch)
- Feature spec & PRD
- Sample data profiles
Outputs
- Ephemeral staging URL
- Interactive demo walkthrough
- Screenshot/video assets
- UAT feedback collection
Human Gate
- Stakeholder UAT sign-off
- No blocking UX issues
- Demo traces to spec requirements
Example Agents & Skills
demo-craft · staging-spin
Documentation
Documentation is not written separately - it's derived from the spec, code, and tests. The Doc Weaver generates user-facing docs, developer guides, API references, and architecture overviews by reading the actual implementation.
For living products, docs are automatically updated when code changes. Every merged PR triggers a doc refresh - new endpoints appear in API docs, changed behaviour updates user guides, and deprecated features get sunset notices.
The system produces multiple doc types: end-user help (in-product and external), developer API docs, internal architecture docs, and onboarding guides for new team members.
Inputs
- Codebase (latest merged)
- Technical spec
- Test suite (as behaviour docs)
- Existing documentation
Outputs
- User documentation
- API reference (interactive)
- Architecture guides
- Onboarding materials
- Doc diff (what changed)
Human Gate
- All public APIs documented
- No stale doc references
- Readability score ≥ target
Example Agents & Skills
doc-weaver · api-doc-gen
Release & Rollout
The Release Captain orchestrates the entire release process: generating semantic version numbers, compiling changelogs, creating release notes for different audiences (technical, end-user, executive), and executing the deployment pipeline.
Release notes are not vague - they're generated from the spec delta + commit history + demo assets. Each change links back to the original problem brief, creating full traceability from customer need to shipped feature.
For living products, the Release Captain manages feature flags, progressive rollouts, and canary deployments. It monitors health metrics post-deploy and can trigger automatic rollback if anomalies are detected.
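The post-deploy health decision can be sketched as a simple comparison of canary metrics against the baseline. The thresholds and metric names here are illustrative, not recommended values:

```python
# Illustrative canary gate: promote, hold, or roll back based on canary
# metrics relative to the baseline. Thresholds are assumptions.
def canary_decision(baseline: dict, canary: dict,
                    err_budget: float = 1.5, lat_budget: float = 1.2) -> str:
    if canary["error_rate"] > baseline["error_rate"] * err_budget:
        return "rollback"  # trips automatic rollback
    if canary["p95_latency_ms"] > baseline["p95_latency_ms"] * lat_budget:
        return "hold"      # pin traffic and alert the release manager
    return "promote"

baseline = {"error_rate": 0.010, "p95_latency_ms": 240}
```

In practice the budgets would be declared per service in the spec's performance section, so the rollout gate and the performance gate enforce the same numbers.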
Inputs
- All approved PRs since last release
- Spec deltas
- Demo assets
- Deployment config
Outputs
- Semantic version tag
- Multi-audience release notes
- Deployed to production
- Feature flag config
- Rollout monitoring dashboard
Human Gate
- All tests pass in CI
- Canary health check green
- Release manager approval
Example Agents & Skills
release-captain · changelog-pro
Monitoring, Feedback & Evolution
The lifecycle is a loop, not a line. Post-release, the Production Watcher monitors error rates, performance metrics, usage patterns, and user feedback. It detects anomalies, classifies issues, and routes them back to the appropriate phase.
A performance regression routes to Phase 7 (Testing). A spec deviation in production routes to Phase 3 (Spec). A new customer need routes to Phase 1 (Ideation). The system is self-healing and self-evolving.
The Feedback Loop Agent synthesises qualitative and quantitative signals into improvement proposals - closing the circle by generating new problem briefs that feed Phase 01. The product never stops evolving.
Inputs
- Production metrics & logs
- User feedback & NPS
- Usage analytics
- Original success metrics
Outputs
- Health dashboards
- Anomaly alerts
- Improvement proposals
- New problem briefs → Phase 01
- Feature impact reports
Human Gate
- SLA targets met
- Feature adoption ≥ threshold
- Zero unresolved P0/P1 issues
Example Agents & Skills
prod-watcher · feedback-loop
The Living Product Model
How spec-driven development works across continuous iterations, feature additions, and product evolution.
Spec Versioning
Every feature iteration generates a new spec version. The Spec Delta Engine produces semantic diffs - additive changes, breaking changes, and migration-required changes - so agents always know what's new and what's affected.
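A toy semantic diff over two spec versions, classifying field-level changes into the three categories above. The spec shape (field name mapped to type) is deliberately simplified for illustration:

```python
# Toy semantic diff between two spec versions; the spec shape (field -> type)
# is deliberately simplified for illustration.
def spec_delta(old: dict, new: dict) -> dict:
    delta = {"additive": [], "breaking": [], "migration": []}
    for field, ftype in new.items():
        if field not in old:
            delta["additive"].append(field)           # safe to ship as-is
        elif old[field] != ftype:
            delta["migration"].append(f"{field}: {old[field]} -> {ftype}")
    for field in old:
        if field not in new:
            delta["breaking"].append(field)           # removal breaks consumers
    return delta

v1 = {"id": "uuid", "amount": "int", "note": "str"}
v2 = {"id": "uuid", "amount": "decimal", "currency": "str"}
d = spec_delta(v1, v2)
```

The classification is what downstream agents consume: additive changes flow straight to implementation, migration changes trigger migration-script generation, and breaking changes block until a deprecation path exists.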
Feature Branches as Contexts
Each feature gets an isolated spec + code + test context. Agents work in bounded feature branches. The merge process validates that the feature's spec is compatible with mainline before integration.
Regression-Aware Generation
Code agents don't generate in a vacuum. They read the full dependency graph, understand which modules their changes affect, and proactively run impacted tests. Breaking changes are flagged before they're committed.
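Impacted-test selection can be sketched as a walk over the reverse dependency graph from a changed module. The module graph here is invented for illustration; a real system would derive it from the build graph or import analysis:

```python
# Invented module graph: module -> modules it imports. A real system would
# derive this from the build graph or import analysis.
DEPENDS_ON = {
    "api.orders": ["core.billing"],
    "ui.checkout": ["api.orders"],
    "tests.test_billing": ["core.billing"],
    "tests.test_checkout": ["ui.checkout"],
}

def impacted(changed: str) -> set[str]:
    # Walk the reverse dependency graph: everything that transitively
    # depends on the changed module is impacted.
    hit, frontier = set(), [changed]
    while frontier:
        mod = frontier.pop()
        for dependant, deps in DEPENDS_ON.items():
            if mod in deps and dependant not in hit:
                hit.add(dependant)
                frontier.append(dependant)
    return hit

tests_to_run = {m for m in impacted("core.billing") if m.startswith("tests.")}
```

Note the transitive reach: a change in `core.billing` pulls in the checkout tests too, because the UI depends on the API that depends on billing.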
Continuous Spec-Code Reconciliation
A background agent continuously compares the living spec against the actual codebase. Drift - where code diverges from spec over time - triggers automated reconciliation reports and remediation PRs.
Feature Impact Tracking
Post-release, every feature is measured against its original success metrics. The Feedback Loop Agent generates impact reports - did this feature actually solve the problem it was built for? Under-performing features trigger improvement cycles.
Deprecation & Sunset Automation
When features evolve or get replaced, agents manage the full deprecation lifecycle: spec updates, migration guides, user notifications, sunset timelines, and finally clean removal with zero dangling references.
What This Blueprint Is and Is Not
What It Is
A north star model for AI-native software delivery. A conceptual blueprint for product and engineering alignment. A way to think about specs, agents, humans, and feedback as one coordinated system. Provocative by design - intended to challenge assumptions about how software gets built.
What It Is Not
Not a locked implementation architecture - adapt it to your stack and maturity. Not a mandate for zero-human development - humans govern, agents execute. Not a return to heavy up-front specification - specs are living, sliceable, and ship in thin vertical slices. Not a claim that every phase must be fully automated from day one. Not a replacement for product judgment or engineering leadership.
Foundational Considerations
Cross-cutting layers that span the entire operating model. These are not phases - they are the connective fabric that makes every phase reliable, repeatable, and governable.
Knowledge & Context Fabric
The operating model frequently describes agents that "read the codebase" or "understand context," but a principal architect will ask: what curated system makes that reliable and repeatable? The answer is an explicit Knowledge and Context Fabric that spans every phase.
This fabric holds the structured context that every agent consumes: repository instructions, approved data sources, architecture decision records, domain glossaries, reference implementations, curated shared instructions, and reusable skills. Without it, each agent reinvents understanding from scratch. With it, context becomes an engineering discipline - versioned, maintained, and tested like code.
Agent Governance & Model Ops
Governance cannot live beside the engineering system - it must live inside it. Every agent operating across the 12 phases does so within a declared governance envelope that covers approved models, prompt and instruction versioning, trace storage, evaluation baselines, approval policies, data-handling rules, cost and latency budgets, and audit trails.
This is not a policy document. It is a runtime layer: every agent invocation is logged, every sensitive action is gated, every model choice is versioned, and every evaluation baseline is tracked. The governance layer ensures that scaling agent-driven delivery does not mean scaling unaudited risk. Industry guidance from model providers, responsible AI frameworks, and delivery research all converge on this point: a clear organisational AI stance, enforced at the system level, is a prerequisite for production-grade agent operations.
Source-of-Truth Hierarchy
A common failure mode of AI-native delivery is blurred boundaries between artefacts. When the PRD drifts into APIs and Architecture drifts into business rules, both humans and agents lose the thread on which document to trust. This layer assigns each artefact a single, owned concern - and names code as the ground truth against which every other layer's drift is measured.
The hierarchy is a navigation aid as much as a contract. When intent changes, it changes in the PRD and the Spec follows. When technical constraints change, they change in Architecture. When code deviates from spec, reconciliation agents catch the drift. Clean ownership is what keeps the pipeline composable.
| Artefact | Owns | Lives in |
|---|---|---|
| PRD | Business intent & outcomes | Phase 2 |
| Spec | Executable contract | Phase 3 |
| Architecture | Technical constraints & ADRs | Phase 4 |
| Delivery Plan | Task graph & rollback boundaries | Phase 4a |
| Project Context | Local conventions & repo knowledge | Knowledge & Context Fabric |
| Code | Actual runtime behaviour; drift measured against it | Implementation |
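The ownership table can be read as a routing function: each kind of change resolves to exactly one owning artefact. The change-kind labels below are illustrative, not a taxonomy the blueprint defines:

```python
# Routing sketch mirroring the ownership table: each kind of change has
# exactly one owning artefact. The change-kind labels are illustrative.
OWNER = {
    "business_intent": "PRD",
    "contract": "Spec",
    "technical_constraint": "Architecture",
    "task_sequencing": "Delivery Plan",
    "repo_convention": "Project Context",
    "runtime_behaviour": "Code",
}

def owning_artefact(change_kind: str) -> str:
    if change_kind not in OWNER:
        raise ValueError(f"no single owner declared for {change_kind!r}")
    return OWNER[change_kind]
```

Raising on an unknown change kind is the point: a change with no single owner is itself a governance bug to surface, not to absorb silently.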