From PRD to MR: AI Coding Workflow Design for Enterprise Android Projects
Introduction
This article introduces a workflow I built in Cursor: starting from PRD and design mockup inputs, through technical design generation, structured task breakdown, task-by-task coding, real-device verification, and finally automated MR submission. The entire pipeline is split into 7 stages, each with gating and human confirmation points. Through layered loading, on-demand Skill injection, and cross-session knowledge persistence, it addresses two core pain points of AI in long-chain decision-making: quality degradation over steps and “starting from scratch” with every requirement.
Note: The content is tailored to a specific enterprise Android project (MVP architecture, RxJava conventions, proprietary UI component library), but the underlying design philosophy — staged gating + knowledge persistence + token layering + on-demand loading — is generalizable to any tech stack.
Architecture Overview
The complete information flow from input to delivery, stage by stage:
- Stage 0: … submodule update
- Stage 1: Requirements Analysis. Load cross-req knowledge → Fetch PRD/API docs (DingTalk MCP) → Extract features & list unknowns → Output analysis.md. Gate: human confirm.
- Stage 2: Design Analysis. Read Figma / screenshots → px→dp/sp conversion, layout structure inference → Append to analysis doc. Gate: human confirm.
- Stage 3: Technical Design. Generate tech-design.md/.html → Structured task table → Server integration.md. Gate: design confirmed.
- Stage 4: Task-by-Task Coding. Read learning notes → Search similar implementations → Load corresponding Skill → Code + module compile → (human confirm) Update task table → loop to the next task. Gate: all done + self-review.
- Stage 5: Full-Build Verification. Compile & install → Launch App → Android MCP real-device verification → Output verification-report.html. Gate: verification passed.
- Stage 6: Submit MR. Create branch → Commit code & submodules → Create MR → Persist project learnings.
- Output: MR Link + Verification Report
The entire process is driven by a Cursor Agent, relying on these underlying mechanisms:
| Layer | Mechanism | Purpose |
|---|---|---|
| Rules | .cursor/rules/*.mdc | Coding constraints, auto-injected by context |
| Skills | .cursor/skills/*/SKILL.md | Specialized capabilities (API, DB, UI generation, etc.), loaded on demand |
| Commands | .cursor/commands/*.md | User-triggered shortcut tasks |
| MCP Servers | .cursor/mcp.json | External tool bridges (docs, designs, real devices, Git platforms) |
Core Design Principles
1. Gating Mechanism: Preventing Decision Quality Degradation
AI decision quality in long chains degrades as the number of steps grows, because each step builds on the output of the previous one and early errors compound. Letting AI go straight from PRD to MR without intermediate calibration tends to produce severely degraded output. This parallels human development: coding without clarifying requirements almost always leads to rework.
Therefore, we enforce gating at the end of each stage: AI must wait for human confirmation via Ask Question before proceeding. Key gating points include:
- Stage 3 → 4: Technical design must be confirmed. If the design deviates from requirements, all subsequent coding is wasted.
- Stage 4 → 5: All tasks complete + final compilation passes + coding standards self-review passes.
- Stage 5 → 6: Real-device verification complete, all PRD feature points checked off.
Stage 4 has even finer granularity: confirmation after each task. The earlier implementation deviations are caught, the lower the fix cost.
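The gating rule can be sketched as a small driver loop. This is a minimal illustration under stated assumptions, not how Cursor implements it: `Stage`, `run_pipeline`, and the `confirm` callback are hypothetical names, and `confirm` stands in for the Ask Question prompt.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    run: Callable[[], str]           # hypothetical: produces the stage's artifact
    needs_confirmation: bool = True  # Stage 0 (code sync) would set this to False


def run_pipeline(stages: List[Stage], confirm: Callable[[str, str], bool]) -> List[str]:
    """Run stages in order and stop at the first gate the human rejects.

    Returns the names of the stages that ran and passed their gate.
    """
    completed = []
    for stage in stages:
        artifact = stage.run()
        # Gate: a human must approve the artifact before the next stage starts.
        if stage.needs_confirmation and not confirm(stage.name, artifact):
            break
        completed.append(stage.name)
    return completed
```

The point of the sketch is that a rejected gate halts everything downstream, so a flawed technical design never reaches the coding stage.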
2. Token Layer Control: Reducing Context Overhead
Cursor injects alwaysApply: true Rules with every conversation turn. The more content injected, the smaller the available context window, and the less room for code analysis. We split constraints into three layers:
| Layer | Trigger | Content Examples | Approx. Tokens |
|---|---|---|---|
| L0 Core | Every turn | 16 prohibitions + 11 requirements | ~500 |
| L1 Coding | Editing .java/.kt/.xml | Lifecycle, UI component specs, common errors | ~800 |
| L2 Domain | Editing API/DB files | Condensed core constraints + pointer to corresponding Skill | ~200 each |
The core layer is small (~40 lines) and nearly immutable, which makes it well suited to Anthropic’s prompt caching: caching relies on prefix matching, so the more stable the prefix, the higher the cache hit rate and the lower the API cost. During non-coding stages (git operations, PRD analysis), only L0 is loaded, keeping rule overhead at roughly 500 tokens per turn.
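To make the layering concrete, here is a rough sketch of how the per-turn rule budget could be estimated from the file being edited. The token figures come from the table above; `layers_for` and the two L2 glob patterns are hypothetical, since the project's real API/DB file patterns are not given.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import List, Optional, Tuple

L0_TOKENS = 500   # core layer, injected every turn
L1_TOKENS = 800   # coding layer, for .java/.kt/.xml edits
L2_TOKENS = 200   # per matching domain layer

CODE_SUFFIXES = {".java", ".kt", ".xml"}
# Hypothetical globs: placeholders for the project's real API/DB file patterns.
L2_DOMAIN_GLOBS = {"api": "*Api*.java", "db": "*Dao*.java"}


def layers_for(edited_file: Optional[str]) -> Tuple[List[str], int]:
    """Return the rule layers loaded for a turn and their approximate token cost."""
    layers, tokens = ["L0"], L0_TOKENS
    if edited_file is None:  # non-coding turn: git operations, PRD analysis
        return layers, tokens
    path = PurePosixPath(edited_file)
    if path.suffix in CODE_SUFFIXES:
        layers.append("L1")
        tokens += L1_TOKENS
    for domain, pattern in L2_DOMAIN_GLOBS.items():
        if fnmatchcase(path.name, pattern):
            layers.append(f"L2:{domain}")
            tokens += L2_TOKENS
    return layers, tokens
```

Under these assumptions a non-coding turn costs about 500 tokens of rules, a layout edit about 1,300, and an API file edit about 1,500.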
3. On-Demand Skill Loading: Avoiding Full Knowledge Injection
Cursor Skills use the description in YAML frontmatter to let AI decide whether to load the full content. At the start of each conversation turn, AI reads only the descriptions of all Skills (~50 tokens each), loading the full body only when matching the current task.
We defined 19 specialized Skills covering these capability domains:
- Core Development: API networking, database operations, data models, analytics
- UI-Related: Layout XML generation, design-to-layout conversion (with Figma smart learning), MVP page generation
- Quality Assurance: Module compilation verification, coding standards self-review, scripted code review
- Utilities: Android SO/AAR dependency analysis, A/B experiment decommissioning, Wiki generation/conversion/sync checks
- Process Control: Full requirements development flow (Stage 0→6)
All Skill descriptions total ~1,000 tokens; full content exceeds 25,000 tokens — on-demand loading saves over 95% of irrelevant context.
For example, the “API Networking Development” Skill’s description is “API networking development specification.” When AI needs to write network requests in Stage 4, it automatically matches and loads the Skill’s complete templates (API class selection guide, request templates, URL construction patterns, error handling). If the task is unrelated to APIs, this knowledge never enters the context.
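The two-pass idea (cheap description index first, full body only on a match) can be sketched as follows. This is an illustration, not Cursor's loader: the function names are hypothetical, the frontmatter parsing is deliberately simplistic, and the keyword overlap is only a stand-in for the model's own relevance decision.

```python
from pathlib import Path
from typing import Dict


def read_description(skill_md: Path) -> str:
    """Return the `description` value from a SKILL.md YAML frontmatter block."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for line in lines[1:]:
        if line.strip() == "---":  # end of frontmatter
            break
        key, _, value = line.partition(":")
        if key.strip() == "description":
            return value.strip().strip("\"'")
    return ""


def index_skills(skills_dir: Path) -> Dict[str, str]:
    """Cheap pass: map skill name to description; only descriptions are kept."""
    return {p.parent.name: read_description(p)
            for p in sorted(skills_dir.glob("*/SKILL.md"))}


def load_matching(skills_dir: Path, task: str) -> Dict[str, str]:
    """Expensive pass: return full SKILL.md text only for matching skills."""
    task_words = set(task.lower().split())
    loaded = {}
    for name, description in index_skills(skills_dir).items():
        # Naive keyword overlap stands in for the model's relevance judgment.
        if task_words & set(description.lower().split()):
            loaded[name] = (skills_dir / name / "SKILL.md").read_text(encoding="utf-8")
    return loaded
```

With the figures above, the index costs about 1,000 tokens against more than 25,000 for all bodies, which is where the roughly 95% saving comes from.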
4. Structured Task Table: Breaking Context Length Limits
Cursor sessions have context length limits. If a requirement includes 10+ coding tasks, by the 7th or 8th, AI may have forgotten which tasks were already completed. Relying solely on context memory is unreliable.
We embed a structured task table (Markdown table) in the Stage 3 technical design:
| # | Task Name | Type | Skill | Files Involved | Status |
|---|---|---|---|---|---|
| 1 | Add xxx field | Data | data-model | parsers.xml | ⬜ Pending |
| 2 | Add xxx API | Logic | api-development | CoreXxx.java | ⬜ Pending |
Status values: ⬜ Pending → 🔄 In Progress → ✅ Complete, or ⏭️ Skipped for tasks that are dropped.
The table is written to a file on disk; AI can re-read the latest status at any time without depending on the context window’s “memory.” I evaluated JSON-based tracking but chose MD tables — humans can directly read and write them, AI parses MD tables effortlessly, and maintaining a separate JSON file only adds synchronization burden.
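As an illustration of why a Markdown table is easy to track programmatically, the sketch below parses the task table, rewrites one Status cell, and finds the next pending task. The helper names are hypothetical, and it assumes the document contains a single table whose cells contain no `|` characters.

```python
from typing import Dict, List, Optional

STATUSES = {"pending": "⬜ Pending", "in_progress": "🔄 In Progress",
            "complete": "✅ Complete", "skipped": "⏭️ Skipped"}


def _cells(row: str) -> List[str]:
    """Split one table row into trimmed cell values."""
    return [c.strip() for c in row.strip().strip("|").split("|")]


def parse_tasks(markdown: str) -> List[Dict[str, str]]:
    """Parse the table rows into dicts keyed by the header cells."""
    rows = [l for l in markdown.splitlines() if l.strip().startswith("|")]
    header = _cells(rows[0])
    return [dict(zip(header, _cells(r))) for r in rows[2:]]  # rows[1] is the |---| line


def set_status(markdown: str, task_id: str, status_key: str) -> str:
    """Return the document with one task's Status cell rewritten."""
    out, header = [], None
    for line in markdown.splitlines():
        if line.strip().startswith("|"):
            cells = _cells(line)
            if header is None:
                header = cells
            elif cells[0] == task_id:
                cells[header.index("Status")] = STATUSES[status_key]
                line = "| " + " | ".join(cells) + " |"
        out.append(line)
    return "\n".join(out)


def next_pending(markdown: str) -> Optional[Dict[str, str]]:
    """Return the first task still marked Pending, or None when all are handled."""
    for task in parse_tasks(markdown):
        if task["Status"] == STATUSES["pending"]:
            return task
    return None
```

Because the state lives in the file, re-reading it after a context reset is enough to resume at the right task.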
5. Self-Learning Loop: Cross-Requirement Knowledge Accumulation
During each requirement’s development, AI needs to search for similar implementations in the project to understand coding conventions. The first time a requirement type is encountered, 3-5 files need searching; repeating the same search for a second, similar requirement is wasted effort.
The solution is maintaining a supplementary learning notes file convention-learnings.md. Before coding in Stage 4, AI reads this file and skips already-recorded patterns; during post-coding self-review, newly discovered patterns are appended. The next requirement’s Stage 4 directly reads the notes, skipping redundant searches.
Key constraint: AI never modifies Skill files themselves, only maintains supplementary notes. The reasoning: if AI writes incorrect patterns into a Skill (core instruction), all subsequent requirements are contaminated; if the notes file has errors, deletion suffices — controllable impact. This design borrows from the Hermes Agent’s Closed Learning Loop, with added safety boundaries.
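The append-only discipline for the notes file can be sketched like this. It is a minimal sketch under an assumed format (one `- ` bullet per pattern in convention-learnings.md); the function names are hypothetical, and the real file layout may differ.

```python
from pathlib import Path
from typing import Iterable, List


def read_learned(notes: Path) -> List[str]:
    """Return recorded patterns, one per '- ' bullet; empty if the file is missing."""
    if not notes.exists():
        return []
    return [line[2:].strip()
            for line in notes.read_text(encoding="utf-8").splitlines()
            if line.startswith("- ")]


def append_learnings(notes: Path, discovered: Iterable[str]) -> List[str]:
    """Append only patterns not already recorded; existing lines are never rewritten."""
    known = set(read_learned(notes))
    new = []
    for pattern in discovered:
        pattern = pattern.strip()
        if pattern and pattern not in known:
            known.add(pattern)
            new.append(pattern)
    if new:
        with notes.open("a", encoding="utf-8") as f:  # append mode, never truncate
            f.writelines(f"- {p}\n" for p in new)
    return new
```

Opening the file in append mode is what keeps the blast radius small: a bad entry can be deleted by hand without touching anything the AI wrote earlier or any Skill file.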
6. Cross-Requirement Knowledge Persistence
Another persistent file, project-memory.md, addresses cross-requirement “background knowledge” transfer: current database version, which APIs are deprecated, special technical decisions, lessons learned. After requirement A completes (Stage 6), AI appends this information; when requirement B starts (Stage 1), it’s auto-loaded — no need to grep or review commit history.
A deprecation mechanism is in place: when the file exceeds 80 lines, outdated entries are cleaned to prevent unlimited growth and loading overhead.
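One possible shape for the cleanup step is sketched below. The article does not say how "outdated" is decided, so this sketch assumes entries are bullets of the form `- [YYYY-MM-DD] text` and drops the oldest first; both the entry format and the oldest-first policy are assumptions, and `prune_memory` is a hypothetical name.

```python
import re
from typing import List

MAX_LINES = 80  # threshold stated in the text
DATED_ENTRY = re.compile(r"^- \[(\d{4}-\d{2}-\d{2})\] ")  # assumed entry format


def prune_memory(lines: List[str], max_lines: int = MAX_LINES) -> List[str]:
    """Drop the oldest dated entries until the file fits within max_lines.

    Lines that are not dated entries (headings, blank lines) are always kept,
    so the result can still exceed max_lines if too few entries are dated.
    """
    excess = len(lines) - max_lines
    if excess <= 0:
        return list(lines)
    dated = []
    for index, line in enumerate(lines):
        match = DATED_ENTRY.match(line)
        if match:
            dated.append((match.group(1), index))
    # ISO dates sort chronologically as plain strings.
    drop = {index for _, index in sorted(dated)[:excess]}
    return [line for index, line in enumerate(lines) if index not in drop]
```

A human-reviewed cleanup would be a reasonable alternative; the sketch only shows that the 80-line cap is cheap to enforce mechanically.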
Stage Details
Stage 0: Code Sync
git fetch origin && git checkout origin/main && git submodule foreach 'git fetch origin && git checkout origin/master'
This ensures work starts from the latest code baseline. The workflow proceeds directly to Stage 1 without human confirmation.
Stage 1: Requirements Analysis
- Load project-memory.md for cross-requirement context.
- Read PRD + API docs. Three input modes supported: DingTalk document links (fetched as Markdown via DingTalk MCP), local files, or user-pasted text.
- Extract client-side features, list unknowns and edge case gaps; cross-check API docs against PRD for contradictions.
- Output docs/[requirement-name]/analysis.md and request confirmation.
Stage 2: Design Analysis
When a Figma link is available, Framelink MCP reads layers/styles/layouts; otherwise, PRD screenshots are used. The design conversion Skill is invoked for px→dp/sp conversion, color format mapping, and layout structure inference. Output is appended to the analysis document.
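The two mechanical conversions mentioned here can be sketched as follows. The dp formula (dp = px × 160 / dpi) and Android's #AARRGGBB color order are standard; the default of a 3x design (480 dpi) and the assumption that colors arrive as CSS-style hex are mine, not the article's. Many Figma frames are drawn at 1x, where px and dp are numerically equal. At the default font scale, sp uses the same ratio as dp.

```python
def px_to_dp(px: float, design_dpi: int = 480) -> float:
    """Convert design pixels to dp: dp = px * 160 / dpi (160 dpi is the baseline)."""
    return px * 160 / design_dpi


def css_hex_to_android(color: str) -> str:
    """Convert CSS-style #RRGGBB or #RRGGBBAA to Android's #RRGGBB or #AARRGGBB."""
    digits = color.lstrip("#").upper()
    if len(digits) == 6:
        return "#" + digits
    if len(digits) == 8:
        return "#" + digits[6:] + digits[:6]  # move alpha from the end to the front
    raise ValueError(f"unsupported color format: {color!r}")
```

The alpha reordering is the step most worth automating, because a color copied unchanged from a CSS-style value still parses on Android but renders with the wrong channels.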
Stage 3: Technical Design
Three files are produced:
- tech-design.md (with task table) and its .html visual version
- server-integration.md
The task table in the technical design is the sole basis for subsequent coding. Each task is annotated with type (data/logic/UI), the Skill to invoke, and the list of files involved. The table also serves as a progress tracking board.
Stage 4: Task-by-Task Coding
Each task follows a fixed pipeline:
… (new scenarios only) → Load corresponding Skill → Code → Module compile → Ask Question confirm. On pass, the task table status is updated and the next task begins; on reject, the flow returns to Code.
After all tasks are complete, a coding standards self-review (5-item checklist) is executed:
- Prohibited patterns check (e.g., context.getColor(), direct SharedPreferences calls)
- Constant consistency
- Repository-specific constraints
- Import completeness (no fully-qualified class names in code body)
- Android security self-review (android:exported declarations, hardcoded tokens, new permissions, debug code isolation)
After self-review passes, newly discovered coding patterns are appended to convention-learnings.md.
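The prohibited-patterns item of the checklist lends itself to a scripted pass. The sketch below is illustrative only: the two rules come from the examples in the checklist, but the regexes and the `scan_source` helper are my own guesses, and a line-based scan will miss multi-line constructs and block comments.

```python
import re
from typing import Dict, List, Tuple

# Rule names follow the checklist examples; the regexes are illustrative guesses.
PROHIBITED: Dict[str, "re.Pattern[str]"] = {
    "context.getColor()": re.compile(r"\bcontext\.getColor\("),
    "direct SharedPreferences call": re.compile(r"\bgetSharedPreferences\("),
}


def scan_source(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, rule_name) for each prohibited pattern found."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        if line.strip().startswith(("import ", "//")):
            continue  # skip import statements and single-line comments
        for rule, pattern in PROHIBITED.items():
            if pattern.search(line):
                findings.append((number, rule))
    return findings
```

A scripted check like this gives the self-review a deterministic floor, leaving the model to judge only the items that need context.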
Stage 5: Full-Build Verification
This stage performs real-device UI verification, not unit testing. The flow:
… screen-by-screen verification → All features checked? If not, locate the issue and loop back to an earlier step; if so, output verification-report.html.
Android MCP can execute dump_ui_hierarchy, tap operations, and screenshots. AI obtains the UI tree and checks each feature against the PRD. The verification report is a single-file HTML (inline styles, not committed to git) containing feature cards, status filtering, and environment info.
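Checking features against the UI tree could look roughly like the sketch below. It assumes the dump resembles the XML produced by `uiautomator dump` (nested `node` elements with a `text` attribute); the actual output of the Android MCP's dump_ui_hierarchy may differ, and the helper names and the feature-to-text mapping are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def visible_texts(ui_dump_xml: str) -> List[str]:
    """Collect non-empty `text` attributes from a uiautomator-style hierarchy dump."""
    root = ET.fromstring(ui_dump_xml)
    return [node.get("text") for node in root.iter("node") if node.get("text")]


def check_features(ui_dump_xml: str, expected: Dict[str, str]) -> Dict[str, bool]:
    """Map each feature name to whether its expected on-screen text is present."""
    texts = visible_texts(ui_dump_xml)
    return {feature: any(needle in text for text in texts)
            for feature, needle in expected.items()}
```

Text presence is a weak signal on its own, which is presumably why the flow also uses taps and screenshots; the sketch only covers the tree-matching part.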
Stage 6: Submit MR
- Confirm change scope (main project + Git submodules).
- Create branch, update .gitmodules.
- Commit and push.
- Create submodule MR → main project MR.
- Persist project learnings: append this requirement’s API changes, database version changes, lessons learned, etc., to project-memory.md.
MCP Integration & Toolchain
| MCP Server | Function | Applied Stage |
|---|---|---|
| DingTalk Docs MCP | Read online DingTalk documents (PRD/API docs), return Markdown | Stage 1 |
| Framelink MCP for Figma | Read Figma design layers/styles/layouts | Stages 2, 5 |
| Android MCP | Real-device UI dump, tap, screenshot | Stage 5 |
| Code Review MCP | Remote MR diff reading + comment submission | Optional (Stage 5.5) |
| Context7 | Query third-party library docs (Fresco, RxJava, etc.) | On demand |
| Sequential Thinking | Step-by-step reasoning for complex problems | On demand |
Design Reflections
1. Why Not Parallel Agents?
Requirements development is inherently serial: technical design determines coding direction, coding determines verification content, verification determines MR scope. Parallel agents have no independent work units to allocate in this scenario and would instead cause code conflicts and context fragmentation. Cursor’s /multitask is suited for truly independent parallel tasks (like running lint and tests simultaneously), not for pipelines with dependencies.
2. Why Gate Every Stage?
Two reasons: first, quality calibration — AI decision quality in long chains degrades with step count, and each confirmation is a “realignment”; second, responsibility boundaries — after user confirmation, if issues arise, the problematic stage’s decision can be precisely located.
3. Why Does AI Only Write to Supplementary Files?
Letting AI directly modify Skill files carries two risks. First, instruction contamination: if AI writes incorrect patterns into a Skill, all subsequent requirements are affected. Second, knowledge degradation: accumulated “learnings” may overwrite carefully human-crafted specifications. The supplementary file convention-learnings.md lets humans review and delete incorrect entries at any time, and clearing it “resets learning”, which is far lower risk than modifying Skills themselves.
4. Design Constraints from Token Economics
According to figures shared in the Claude Code community, reducing always-injected rule content from 800 tokens to 500 tokens raised the prompt cache hit rate from 12% to 61%. The likely mechanism is that Anthropic’s prompt caching depends on prefix matching: the more stable the system prompt, the higher the cache hit rate and the lower the cost. This is why we keep the core constraint layer extremely thin and stable.
5. Final Reflection: Process Automation vs. General Intelligence
The current workflow isn’t “AI replacing humans” but a hybrid model where humans define processes and AI executes tasks. Gating decisions are human-made; AI handles repetitive, rule-clear execution tasks (reading docs, writing code, running verification). This division of labor is the most pragmatic for now — AI’s generation and reasoning capabilities still have boundaries, but process-oriented orchestration can maximize its deterministic benefits.
If your team is also experimenting with AI-assisted development, I suggest starting from the weakest link: make AI stop and wait for your confirmation at key decision points, rather than letting it run to completion. This single change often delivers the most direct quality improvement. From there, gradually introduce knowledge persistence, layered loading, and other mechanisms to build a full-process automation system suited to your project.