Josh Estrada
Enterprise SaaS · Sole Product Designer

AI Voice Assisted Task Creation

Field teams dictate work orders instead of typing them. The AI handles the rest.

Product UI is proprietary. Diagrams represent the design thinking, not the actual interface.

System Architecture

Designing the AI's behavior

The system prompt is where the real design work lives. It tells the AI how to parse messy speech into validated work orders.

Voice Input
"Fix the leaky faucet at unit 4B on Maple Street, assign it to John, it's pretty urgent"
unstructured · ambiguous names · implicit priority · natural language

↓ processes

System Prompt — the designer's artifact

Parsing Behavior
- Extract intent, entities, and field values from natural speech
- Validate against the database — assignees, properties, categories
- Return a human preview and structured JSON simultaneously

Lookup Disambiguation
- Exact match → populate field, confirm to user
- Fuzzy match → present numbered options for voice selection
- No match → set a default, inform the user, remind them they can edit
- Emergency → route to emergency services immediately

Conversation Rules
- Keyboard fallback available at any point
- Voice edits to the preview before form submission
- Two-stage review: conversational → populated form

↓ outputs

Structured Output

Human Preview
- Title: Fix leaky faucet
- Property: 4B Maple Street ✓
- Assignee: John Lee ✓
- Priority: High (inferred)
- Due: Today (inferred)

Form JSON
{ "title": "Fix leaky faucet", "property_id": 2847, "assignee_id": 156, "priority": "high", "due_date": "2026-03-14" }
System prompt as the design deliverable

Instead of static mockups, the system prompt defined the AI's behavior — parsing, disambiguation, error handling, conversation flow. Engineering used it as their production starting point.
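As a rough illustration, the prompt's output contract can be checked mechanically on the engineering side. This is a minimal Python sketch, not the production code; the field names come from the Form JSON example above, and the validator itself is an assumption:

```python
import json

# Field names taken from the Form JSON example; the validator is illustrative.
REQUIRED_FIELDS = {"title", "property_id", "assignee_id", "priority", "due_date"}

def validate_work_order(raw: str) -> dict:
    """Parse the model's structured output and confirm every field is present."""
    order = json.loads(raw)
    missing = REQUIRED_FIELDS - order.keys()
    if missing:
        raise ValueError(f"model output missing fields: {sorted(missing)}")
    return order

sample = ('{"title": "Fix leaky faucet", "property_id": 2847, '
          '"assignee_id": 156, "priority": "high", "due_date": "2026-03-14"}')
order = validate_work_order(sample)  # raises if the model dropped a field
```

A check like this gives engineering a concrete contract to test against while the prompt itself is still being iterated on.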

Two-stage review for trust

Users see a conversational preview first, then the populated form. Research confirmed people wanted AI to infer fields but wanted final sign-off before anything was created.

Keyboard fallback at any point

The voice-first flow never blocks users. They can switch to keyboard input at any time, so the AI-first approach doesn't become a cage.

Director Scrutiny Passed

Product directors grilled us on the edge cases and the research methodology. They had questions on every scenario; the research held up.

Prompt → Production

Engineering used the system prompt from prototyping as their production starting point

Methodology Shift

The UX team started using the working-prototype workflow for their own AI feature testing

Component anatomy

What the system prompt actually controls

The system prompt is a design artifact. It specifies how the AI parses speech, resolves ambiguous names, handles lookup failures, and manages the conversation.

Voice Input
Transcription

"Fix the leaky faucet at unit 4B on Maple Street, assign it to John, it's pretty urgent"

Voice / Keyboard input toggle
"What did the user say?"
VOICE INPUT
Speech-to-text transcription via API

Users tap to record and describe the task in their own words. The app transcribes the speech, then passes it to the AI as raw text. That text is messy by nature: names are ambiguous, priorities are implied, and phrasing is casual.

Design decision: Keyboard fallback is always visible. Voice-first doesn't mean voice-only — users can switch input modalities at any point without losing context.
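The hand-off from recording to the model can be sketched as below. Every function name here is a placeholder — the actual speech-to-text and model APIs are not part of this write-up — but the shape of the pipeline matches the description above:

```python
def transcribe(audio: bytes) -> str:
    """Placeholder for the speech-to-text API call."""
    return ("Fix the leaky faucet at unit 4B on Maple Street, "
            "assign it to John, it's pretty urgent")

def run_model(system_prompt: str, transcript: str) -> str:
    """Placeholder for the LLM call: system prompt plus raw transcript go in."""
    return f"[model sees prompt + transcript: {transcript!r}]"

def create_task_from_voice(audio: bytes, system_prompt: str) -> str:
    # The transcript is passed to the AI as raw text, with no pre-cleaning:
    # the system prompt is what handles the messiness.
    return run_model(system_prompt, transcribe(audio))
```

The design choice worth noting is that all interpretation lives in the prompt, not in pre-processing code, which is why the prompt is the artifact that gets designed and reviewed.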
Lookup Result
assignee: "John"
1) John Lee (Maintenance Tech)
2) Jon Li (Property Manager)
3) John Lee Jr. (Contractor)
AI presents numbered options: "Did you mean 1) John Lee, 2) Jon Li, or 3) John Lee Jr.?"
"Who did they mean?"
AI BEHAVIOR
Lookup disambiguation decision tree

Assignees, properties, and categories all need to match real database records. When the AI finds an exact match, it populates the field automatically. Fuzzy matches get presented as numbered options so the user can select by voice. If nothing matches, the AI sets a sensible default and tells the user why.

Voice-specific constraint: Phonetically similar names (John Lee vs Jon Li) are indistinguishable in speech. Numbered options let users say "one" or "two" to select — critical for a voice-first interface.
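The decision tree fits in a few lines. This is a minimal Python sketch using difflib's string similarity for the fuzzy branch; the assignee table, the IDs, and the cutoff value are illustrative assumptions, not the production lookup:

```python
import difflib

# Illustrative assignee table; the real lookup hits the database.
ASSIGNEES = {"John Lee": 156, "Jon Li": 201, "John Lee Jr.": 312}

def resolve_assignee(spoken: str) -> dict:
    """Exact match -> populate; fuzzy match -> numbered options; none -> default."""
    if spoken in ASSIGNEES:
        return {"status": "exact", "assignee_id": ASSIGNEES[spoken]}
    close = difflib.get_close_matches(spoken, list(ASSIGNEES), n=3, cutoff=0.4)
    if close:
        # Numbered so the user can answer "one", "two", "three" by voice.
        return {"status": "ambiguous", "options": dict(enumerate(close, start=1))}
    return {"status": "none", "assignee_id": None}  # set default, inform user
```

Calling `resolve_assignee("John")` returns the ambiguous branch with numbered candidates, mirroring the "Did you mean 1) John Lee..." behavior above.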
Review
Stage 1: Conversational Preview
AI summarizes: "I'll create a high-priority task to fix the leaky faucet at 4B Maple Street, assigned to John Lee, due today."
Edit by voice · Looks good →
Stage 2: Populated Form
Title: Fix leaky faucet
Property: 4B Maple St
Assignee: John Lee
Priority: High ⚡
Submit task
"Does this look right?"
AI BEHAVIOR
Two-stage review for user trust

User research confirmed people wanted the AI to infer fields like description, priority, and due date. But they also wanted final sign-off before anything was created. The two-stage review gives users a conversational confirmation first, then the populated form for final review and edits.

Showing the AI's work: Inferred fields are visually distinguished from matched fields. Users can see exactly what the AI assumed vs. what it validated against the database.
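The stage-1 summary can be generated directly from the structured fields, which keeps the conversational preview and the populated form in sync. A minimal Python sketch; the field names and values are taken from the review example above, the function itself is hypothetical:

```python
def conversational_preview(order: dict) -> str:
    """Stage 1 of the review: restate the parsed task in plain language."""
    title = order["title"][0].lower() + order["title"][1:]  # mid-sentence casing
    return (f"I'll create a {order['priority']}-priority task to "
            f"{title} at {order['property']}, "
            f"assigned to {order['assignee']}, due {order['due']}.")

order = {"title": "Fix leaky faucet", "property": "4B Maple Street",
         "assignee": "John Lee", "priority": "high", "due": "today"}
preview = conversational_preview(order)
# Mirrors the stage-1 summary shown before the populated form.
```

Because both review stages render from the same structured record, a voice edit in stage 1 automatically carries through to the form in stage 2.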