Writing Canvas
The design insight story
What: Every AI writing tool on the market makes the same mistake: it pulls the writer out of the document. Chat forces you to leave, re-explain context, wait, and copy-paste back. Autocomplete guesses where you need help and is usually wrong.
Why: The PMs, researchers, and designers I observed all wanted the same thing: AI that appears exactly where they're stuck, works while they keep writing, and stays out of the way otherwise.
How: I designed and built an editor where AI lives inside the content itself — inline nodes placed directly in the document, async background generation via WebSocket streaming, and a companion AI panel. React + TipTap + Express + DynamoDB + BedrockAgentCore.
Overview
Knowledge workers — product managers drafting PRFAQs, researchers synthesizing findings, designers documenting decisions — spend significant time on long-form documents. Most AI writing tools interrupt rather than enhance, pushing users toward chat interfaces or intrusive autocomplete. This prototype was built to test a set of hypotheses about how AI assistance could improve a writer's productivity without disrupting their flow.
Opportunities
Observing how knowledge workers struggled with existing AI tools revealed four opportunities, each of which became a building block for a core feature.
Chat-based AI forces context-switching: leave, reformulate, wait, copy back. The friction often exceeded the benefit. Instead, place AI directly where content is needed and let it draw context from surrounding paragraphs.
30–60 second waits are too short for complex tasks, and users don't want to feel stuck. Instead, trigger a task that keeps generating in the background while you keep writing. Schedule recurring queries for content that needs regular refresh.
Users wanted to improve their writing, not replace it. Highlight a passage, choose a refinement, and AI proposes changes in context. The author's intent is preserved; only the execution improves.
Existing tools either isolate AI in a chat window or merge it into the editor as autocomplete. Both misread how writers work. A full editor alongside a conversational AI panel gives the author parallel spaces and control over when to engage.
The Test Bench
To explore these opportunities, users needed to start from something familiar. We built a block-based document editor — rich text, check-in/check-out locking, drag-to-reorder, and a companion comment panel — reaching feature parity with the best note-taking apps in weeks rather than months thanks to AI-assisted engineering. The baseline had to feel complete on its own so that any AI productivity improvement would be measurable against a credible control.
Executive Summary
The product team delivered three major initiatives this quarter: platform architecture migration, launch of the self-serve analytics dashboard, and expansion of the API partner program. Revenue exceeded targets by 12% while customer retention reached an all-time high of 94%.
| Metric | Q2 | Q3 | Delta |
|---|---|---|---|
| Revenue | $4.2M | $4.7M | +12% |
| Retention | 89% | 94% | +5pp |
| NPS | 62 | 71 | +9 |
| P0 incidents | 8 | 3 | -63% |
Engineering velocity increased 30% after the platform migration completed in July. The team shipped 47 features versus 36 in Q2, while reducing P0 incident count from 8 to 3. Next quarter focus areas include API v2 rollout and enterprise SSO integration.
Feature 1: Inline AI Nodes
How might we bring AI directly to where the user is writing, so they never leave their document to get help?
Prototyping exposed the core tension: sidebar chat worked for general questions but created friction for location-specific requests. Inline nodes eliminated this entirely — insert at a location, and AI reads context from surrounding content, with output appearing where it belongs. Built as a custom TipTap atomic node, it can't be partially selected or inline-edited like text, keeping clear boundaries between AI and user content while rendering a full React component inside the editor.
- Context-aware generation: Section-level context extraction hits the sweet spot between relevance and token cost. AI outputs for "Customer Benefits" focus on benefits rather than generic content; full document context offers diminishing returns at higher cost.
- Preview-then-accept: Three options were tested: auto-insert, preview-then-accept, and inline editing. Auto-insert removed user agency. Inline editing blurred AI-versus-human boundaries. Preview-then-accept gave control without adding workflow weight.
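The section-level extraction described above can be sketched as a walk to the nearest headings on either side of the node. This is a minimal sketch under assumed names: the `Block` shape and `extractSectionContext` are illustrative, not the actual TipTap document schema.

```typescript
// Hypothetical flattened block shape; the real editor uses TipTap's document model.
interface Block {
  id: string;
  type: 'heading' | 'paragraph' | 'ai-node';
  text: string;
}

// Collect the text of the section enclosing an AI node: walk back to the
// nearest heading, forward to the next one, and join everything in between.
function extractSectionContext(blocks: Block[], nodeId: string): string {
  const idx = blocks.findIndex((b) => b.id === nodeId);
  if (idx === -1) return '';
  let start = 0;
  for (let i = idx; i >= 0; i--) {
    if (blocks[i].type === 'heading') { start = i; break; }
  }
  let end = blocks.length;
  for (let i = idx + 1; i < blocks.length; i++) {
    if (blocks[i].type === 'heading') { end = i; break; }
  }
  return blocks
    .slice(start, end)
    .filter((b) => b.id !== nodeId) // exclude the AI node itself
    .map((b) => b.text)
    .join('\n');
}
```

The section heading travels with the body text, which is what keeps a "Customer Benefits" generation on-topic.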
Feature 2: Offline AI Tasks
How might we let users keep working while AI generates, decoupling request from response and delivering results asynchronously?
We inverted the wait model: users submitted requests and returned to their document instead of watching a spinner. Testing showed a strong preference for async on anything beyond simple queries — it felt like an assistant working in the background. This also enabled parallel generation across a document outline, and scheduled queries that re-run on a cadence (daily or weekly), refreshing output automatically using the same inline node pattern extended with a scheduling layer.
Industry Research Notes
Tracking developments in AI tooling and developer productivity. Summarizing key announcements, product launches, and emerging patterns across the ecosystem.
Areas of focus: (1) New model releases and benchmark results. (2) Developer tool integrations and workflow changes. (3) Open-source projects gaining traction.
Feature 3: Text Refinement
Select text and refine it with AI — rewrite, expand, summarize, or adjust tone — without leaving the editor. Selection context (surrounding paragraphs, section heading) feeds the prompt, so refinements are contextually appropriate rather than generic.
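As a sketch, the refinement request might carry the selection plus its context, with the prompt assembled from those parts. All names here (`RefinementRequest`, `buildRefinementPrompt`, the field layout) are illustrative assumptions, not the actual API.

```typescript
type RefinementType = 'rewrite' | 'expand' | 'summarize' | 'adjust-tone';

// Illustrative request shape: the selection plus the context that makes the
// refinement section-aware rather than generic.
interface RefinementRequest {
  type: RefinementType;
  originalText: string;    // the selected text itself
  sectionHeading: string;  // nearest heading above the selection
  surroundingText: string; // paragraphs around the selection
  instruction?: string;    // optional free-form user guidance
}

function buildRefinementPrompt(req: RefinementRequest): string {
  return [
    `Task: ${req.type} the selected text.`,
    req.instruction ? `Instruction: ${req.instruction}` : '',
    `Section heading: ${req.sectionHeading}`,
    `Surrounding context: ${req.surroundingText}`,
    `Selected text:\n${req.originalText}`,
    'Return only the refined text, nothing else.',
  ]
    .filter(Boolean)
    .join('\n');
}
```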
Design Rationale
The writing canvas combines a block-based editor with AI capabilities. Each document is organized into sections containing individually lockable chunks.
AI integration spans four request types — inline generation, text refinement, scheduled queries, and companion chat — each with its own streaming and state-management requirements.
Context extraction operates at the section level, balancing relevance against token cost.
Feature 4: Companion AI Panel
The companion panel runs alongside the editor as a conversational AI assistant. It follows a multi-turn agentic flow: thinking, tool use, human-in-the-loop approval for mutative actions, and streamed responses.
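A minimal sketch of how the panel might fold streamed events into its message list. Only the event names come from the WebSocket catalog in the appendix; the message shape and the reducer itself are assumptions.

```typescript
interface PanelMessage {
  role: 'user' | 'assistant';
  text: string;
  pendingApproval?: boolean;
}

interface StreamEvent {
  action: string;
  payload: { text?: string };
}

// Fold one server event into the panel's message list (immutably, so it
// could back a React useReducer).
function reducePanel(messages: PanelMessage[], event: StreamEvent): PanelMessage[] {
  const last = messages[messages.length - 1];
  switch (event.action) {
    case 'from-server:assistant:stream-chunk':
      // Append the chunk to the in-progress assistant message, or start one.
      if (last && last.role === 'assistant' && !last.pendingApproval) {
        return [
          ...messages.slice(0, -1),
          { ...last, text: last.text + (event.payload.text ?? '') },
        ];
      }
      return [...messages, { role: 'assistant', text: event.payload.text ?? '' }];
    case 'from-server:assistant:hitl-request':
      // The approval prompt is part of the conversation, not a modal.
      return [...messages, { role: 'assistant', text: '', pendingApproval: true }];
    default:
      return messages;
  }
}
```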
Reflections
- Prototype the ideal, then design the fallback: Character-by-character streaming stuttered under jitter, so we settled for token-level. Fluid node expansion caused reflow, so we settled for fixed-height with scroll. The pitch for inline AI was never a slide deck; it was "let me show you both approaches." Starting with the ideal made every compromise deliberate.
- Invert the obvious: Every core insight came from questioning entrenched patterns. Bring AI to users instead of taking users to AI. Let AI wait for users instead of making users wait. Keep AI as a companion instead of merging it into the editor.
Outcomes
Green light from the Metrics Hub team to integrate the Writing Canvas editor as the narrative authoring surface for metric presentations.
Green light from the Inquiry Hub team to integrate the collaborative editing and inline AI features into the investigation workflow.
Adoption and performance metrics expected Q3 2026, after the editor ships as an embedded surface within both downstream products.
Technology
The entire backend is deployed as infrastructure-as-code using AWS CDK. The stack includes API Gateway (REST and WebSocket), Lambda functions, DynamoDB tables, S3 buckets, EventBridge scheduler, and BedrockAgentCore for AI agent hosting, all defined in TypeScript and deployed through a single pipeline.
Appendix
How did we make token-by-token streaming feel smooth?
A critical question: could token-by-token streaming from an AI model feel smooth across the full stack?
AI Model (Claude)
→ BedrockAgentCore Runtime
→ API Gateway WebSocket
→ React State (useStreamingResponse)
→ TipTap Editor (100ms throttled commits)
The constraint was editor transaction frequency. TipTap's ProseMirror model requires atomic commits. Per-token commits caused jank; buffering too many created perceived lag. The solution: a throttled buffer committing every 100ms, found through measured iteration.
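A minimal sketch of that throttled buffer, with `commit()` standing in for a single ProseMirror transaction; the class is illustrative, not the shipped implementation.

```typescript
// Tokens accumulate in a buffer and are committed at most once per interval;
// commit() stands in for one atomic editor transaction.
class ThrottledBuffer {
  private pending = '';
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private commit: (text: string) => void,
    private intervalMs = 100,
  ) {}

  // Called for every streamed token; schedules a flush if none is pending.
  push(token: string): void {
    this.pending += token;
    if (this.timer === null) {
      this.timer = setTimeout(() => this.flush(), this.intervalMs);
    }
  }

  // Commit whatever has accumulated as one transaction.
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.pending) {
      this.commit(this.pending);
      this.pending = '';
    }
  }
}
```

The stream handler calls `push()` per token and `flush()` on completion, so the editor sees bursts of text every 100ms rather than one transaction per token.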
V1: Fixed-height container with scrolling
✗ Users couldn't see full response during generation
V2: Container expands as content grows
✗ Document reflow disrupted other editing
V3: Fixed height, expand on completion
✗ Users couldn't predict final content length
↓
V4 (shipped): Fixed height + internal scroll, smooth expand on complete
✓ Streaming visibility + document stability
How did we replicate API Gateway WebSocket + AgentCore locally?
Production uses managed AWS services with no local equivalent. We stood up a local WebSocket server with an in-memory connection map mimicking API Gateway's connection lifecycle. A local Python agent replaces BedrockAgentCore, streaming responses to Express via HTTP callbacks. DynamoDB and S3 swap for local JSON files and disk paths. A single environment flag switches routing; Express handlers don't know which mode they're in.
// Same code, different backends
export async function invokeAgent(params: AgentInvokeParams): Promise<AgentResponse> {
  const useLocal = process.env.USE_AGENTCORE_LOCAL === 'true';
  if (useLocal) return invokeLocalAgent(params); // HTTP POST to localhost:8080
  return invokeAgentCoreRuntime(params); // AWS SDK → BedrockAgentCore
}
How did we decide on the WebSocket event naming convention?
Every event follows direction:scope:action, e.g., to-server:canvas:subscribe or from-server:assistant:stream-chunk. Direction disambiguates client from server. Scope is a feature namespace (system, canvas, assistant). This gives collision avoidance, prefix-based filtering, and zero-risk extensibility — a new scope like presence can't shadow existing events.
// src/websocket/wsEvents.ts
/** Actions the client sends TO the server */
export const ToServer = {
  System: {
    Ping: 'to-server:system:ping',
  },
  Canvas: {
    Subscribe: 'to-server:canvas:subscribe',
    Unsubscribe: 'to-server:canvas:unsubscribe',
  },
  Assistant: {
    Subscribe: 'to-server:assistant:subscribe',
    Unsubscribe: 'to-server:assistant:unsubscribe',
    SendChat: 'to-server:assistant:send-chat',
    Cancel: 'to-server:assistant:cancel',
    ToolResult: 'to-server:assistant:tool-result',
  },
} as const;
/** Events the server sends TO clients */
export const FromServer = {
  System: {
    Pong: 'from-server:system:pong',
  },
  Canvas: {
    Subscribed: 'from-server:canvas:subscribed',
    Unsubscribed: 'from-server:canvas:unsubscribed',
    ChunkLocked: 'from-server:canvas:chunk-locked',
    ChunkUnlocked: 'from-server:canvas:chunk-unlocked',
    ChunkCheckedOut: 'from-server:canvas:chunk-checked-out',
  },
  Assistant: {
    StreamChunk: 'from-server:assistant:stream-chunk',
    StreamComplete: 'from-server:assistant:stream-complete',
    HitlRequest: 'from-server:assistant:hitl-request',
  },
} as const;
How does human-in-the-loop approval work with AgentCore and Strands?
When the agent selects a write tool (write_section, update_chunk), the system raises an interrupt instead of executing. The approval prompt appears inline within the AI response stream — part of the conversation, not a modal. The agent session persists through the interrupt and resumes with the user's decision as context. Read tools execute freely; writes require explicit consent.
User prompt
→ Agent reasons, selects write tool
→ HITL interrupt raised
→ Streamed to client via WebSocket
→ Inline approval prompt (not a modal)
→ User approves or denies
→ Decision sent back to same session
→ Agent resumes with context
interface HITLRequest {
  type: 'hitl_request';
  requestId: string;
  action: string;  // "Write to section: Introduction"
  details: any;    // proposed content preview
  options: ['approve', 'deny'];
}
interface HITLResponse {
  requestId: string;
  decision: 'approve' | 'deny';
  feedback?: string; // optional user guidance
}
// Rendered inline within the AI response stream
const HITLPrompt: React.FC<{ request: HITLRequest }> = ({ request }) => (
  <Alert type="info" header="Action Requires Approval">
    <p>{request.action}</p>
    <SpaceBetween direction="horizontal" size="xs">
      <Button onClick={() => sendHITLResponse({ requestId: request.requestId, decision: 'approve' })}>Approve</Button>
      <Button variant="link" onClick={() => sendHITLResponse({ requestId: request.requestId, decision: 'deny' })}>Deny</Button>
    </SpaceBetween>
  </Alert>
);
How are AI sessions, conversations, and stores managed?
Three storage layers. Session metadata (DynamoDB): identity, type, canvas association, timestamps. Conversation messages (S3): ordered message objects, append-only. Stream events (S3): raw streaming events for replay, debugging, and analytics. Sessions are typed because each has different lifecycle needs: assistant sessions load history on open, inline AI sessions scope to a node for multi-turn refinement, text refinement sessions are ephemeral and archive after completion.
interface AISession {
  sessionId: string;
  canvasId: string;
  userId: string;
  type: 'assistant' | 'inline-ai' | 'text-refinement';
  createdAt: string;
  lastActivityAt: string;
  messageCount: number;
  metadata?: {
    nodeId?: string;  // for inline-ai: which node
    chunkId?: string; // for text-refinement: which chunk
  };
}
How is the AI agent configured per request type?
The sessionType field tells the Python agent which system prompt, context, tools, and thinking budget to use. The agent has 50+ tools registered as Strands @tool decorated functions, assembled conditionally: core tools (canvas read/write, attachments, knowledge base, web search) are always available; app-specific tools load only if the user has access. A viewer-role user's agent has no write tools — the LLM cannot call tools it was never given.
export type SessionType =
  | 'chat'               // Sidebar assistant
  | 'ai-surface-request' // Inline AI node
  | 'text-refinement'    // Selected text rewrite
  | 'canvas-generation'  // Full canvas bootstrap
  | 'always-on-agent';   // Always-on reviewer
# agentFactory.py — conditional tool assembly
tools = (
    CANVAS_READ_TOOLS      # read canvas, sections, chunks, search
    + CANVAS_WRITE_TOOLS   # create/update/delete (editors only, HITL)
    + FILE_TOOLS           # in-memory file staging + chunk commit
    + ATTACHMENT_TOOLS     # list, search, read uploaded docs
    + KNOWLEDGE_BASE_TOOLS # search research studies (Bedrock KB)
    + WIDGET_TOOLS         # create/validate chart recipes
    + SESSION_TOOLS        # conversation history, prior responses
    + WEB_SEARCH_TOOLS     # internet search
    + DATE_TIME_TOOLS      # current date, date ranges
    + AUTHORIZATION_TOOLS  # permission checks (admin only)
    # App-specific tools — conditionally loaded per user access
    + INQUIRY_HUB_TOOLS    # case management operations
    + METRICS_HUB_TOOLS    # data source queries, visualization
)
# Always-on agents receive NO tools — text analysis only
Chat
→ System prompt: general-purpose assistant for this canvas
→ Context: canvasId only (agent discovers content via tools)
→ Tools: all tools loaded per user permissions
→ HITL: required for write operations
Inline AI node
→ System prompt: generate content for this specific location
→ Context: nodeId + chunkId + sectionId + section heading + preceding text
→ Tools: canvas read tools only
→ Output: raw content, inserted directly into the document
Text refinement
→ System prompt: rewrite the selected text per user instruction
→ Context: originalText + originalHtml + startPos/endPos + refinement type
→ Tools: none
→ Output: refined text only, replaces selection in place
Always-on agent
→ System prompt: review content against writing standards
→ Context: current chunk text + previous chunk text + diff stats
→ Tools: none
→ Output: advisory posted to assistant panel, never edits content
Always-on agents receive no tools at all — text analysis only via prompt and conversation history. Fast, cheap, and safe.
How do inline AI nodes manage their lifecycle?
Each inline AI node is a TipTap atomic block with a strict lifecycle. Node attributes (status, prompt, response, sessionId, timestamps) persist in the document JSON, so state survives page refreshes. Each React component subscribes to WebSocket events scoped to its nodeId, enabling parallel generation across nodes without cross-talk.
Idle: empty block with insertion affordance
→ Composing: prompt input with context controls
→ Submitted: backend accepts the request
→ Generating: streaming response, internal scroll
→ Complete: full response with accept / edit / regenerate / dismiss
✗ Error: retry or dismiss
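The persisted attributes and the legal status transitions can be sketched as follows. The status values mirror the lifecycle above, but the exact attribute schema and transition table are assumptions about what lives in the document JSON.

```typescript
type NodeStatus = 'idle' | 'composing' | 'submitted' | 'generating' | 'complete' | 'error';

// Assumed shape of what an inline AI node persists in the document JSON,
// so state survives page refreshes.
interface InlineAINodeAttrs {
  status: NodeStatus;
  prompt: string;
  response: string;
  sessionId: string | null; // multi-turn refinement reuses this session
  createdAt: string;
  updatedAt: string;
}

// Legal transitions between lifecycle states.
const TRANSITIONS: Record<NodeStatus, NodeStatus[]> = {
  idle: ['composing'],
  composing: ['submitted', 'idle'],
  submitted: ['generating', 'error'],
  generating: ['complete', 'error'],
  complete: ['composing'],      // regenerate
  error: ['composing', 'idle'], // retry or dismiss
};

function canTransition(from: NodeStatus, to: NodeStatus): boolean {
  return TRANSITIONS[from].includes(to);
}
```

Guarding attribute updates with a transition check keeps a node from, say, jumping from idle straight to generating when a stale WebSocket event arrives.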
Nodes can also regenerate on a cron schedule — a research summary refreshing every morning, a competitive analysis updating weekly. The user configures a cron expression (e.g., 0 9 * * 1-5). There is no separate batch system; scheduled runs use the same invocation path as manual requests.
User configures cron on AI node
→ Schedule stored in DynamoDB
→ EventBridge evaluates cron
→ Lambda invokes same agent path as manual request
→ Results stream via WebSocket to connected clients
→ If no clients: events persist to S3 for replay on reconnect
How is context secured across the stack?
Three layers. The frontend sends only identifiers (canvasId, prompt, nodeId). The backend validates IAM identity, checks canvas permissions, and sets userId from the authenticated session — never from the request body. The Python agent wraps every tool call in a CanvasContext that enforces require_read() or require_write(). A viewer's agent physically cannot execute writes.
# canvasContext.py — role-based authorization
class CanvasContext:
    def __init__(self, canvas_id, user_id):
        self.canvas_id = canvas_id
        self.user_id = user_id
        self.role = self._determine_role()

    def require_read(self):
        if self.role == Role.NONE:
            raise PermissionError(
                f"No access to canvas {self.canvas_id}"
            )

    def require_write(self):
        if self.role not in (Role.OWNER, Role.CONTRIBUTOR):
            raise PermissionError(
                f"Cannot write to canvas {self.canvas_id}"
            )

# Every canvas tool checks authorization first
@tool
def get_section_content(section_id: str):
    cache = _requireRead()  # raises if no access
    return cache.get_section(section_id)
What is the Always-on Agent and why do we need it?
An auto-triggered AI session that watches for chunk saves and posts advisories in the assistant panel. Not every save deserves a full review. The pipeline has three stages: coalesce rapid edits at the document level, classify whether accumulated changes are significant, then trigger the review agent only if warranted.
Chunk saved (any user, any chunk)
→ SQS queue (standard)
→ Lambda trigger: BatchSize 100, MaxBatchWindow 10s
→ Single invocation with all saves in the window
→ Group by documentId
→ Per-document classification (3-tier funnel)
→ Skip: no review needed
→ Review: invoke always-on agent
→ Advisory streamed to assistant panel
- Coalesce: the SQS Lambda trigger's MaximumBatchingWindowInSeconds (10s) and BatchSize (100) accumulate save events into a single Lambda invocation. The Lambda groups by documentId, so saves from multiple users and chunks collapse into one batch per document.
- Classify: a three-tier funnel decides whether the accumulated changes warrant a review:
- Tier 1: Rule-based (instant, zero cost): whitespace-only changes, identical content, changes below a character threshold. Pure string comparison; most saves die here.
- Tier 2: Pattern matching (fast, no AI cost): edit distance ratio flags small cosmetic changes. Regex detects high-signal patterns: numbers, dates, percentages, negations. High-signal patterns skip straight to review; low edit distance with no signal → skip.
- Tier 3: LLM gate (Haiku, ~200ms): only changes that survived tiers 1 and 2. Haiku receives the diff + section heading and returns skip or review.
- Review: the always-on agent invokes with chunk context and posts advisories to the assistant panel. Per-chunk sessions mean the agent remembers prior flags: run 1 flags "significantly improved" as vague; run 2 acknowledges the fix. No tools, just plain text analysis via prompt and conversation history. Fast, cheap, safe.
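The first two tiers can be sketched as pure functions. The thresholds, the signal regex, and the cheap change-ratio stand-in for edit distance are all illustrative; the real funnel ends in the Haiku gate, which is not shown.

```typescript
// Illustrative high-signal patterns: numbers, percentages, and negations.
const HIGH_SIGNAL = /\d|%|\bnot\b|\bnever\b|\bno longer\b/i;

// Cheap stand-in for an edit-distance ratio: share of the string that differs
// after the common prefix. The real tier would use a proper edit distance.
function changeRatio(a: string, b: string): number {
  let common = 0;
  const limit = Math.min(a.length, b.length);
  while (common < limit && a[common] === b[common]) common++;
  return 1 - common / Math.max(a.length, b.length, 1);
}

function classify(prev: string, next: string): 'skip' | 'review' | 'llm-gate' {
  const norm = (s: string) => s.replace(/\s+/g, ' ').trim();
  // Tier 1: rule-based, zero cost: identical or whitespace-only changes die here.
  if (norm(prev) === norm(next)) return 'skip';
  // Tier 2: pattern matching: a change in high-signal content goes straight
  // to review; tiny cosmetic edits with no signal are skipped.
  if (HIGH_SIGNAL.test(next) !== HIGH_SIGNAL.test(prev)) return 'review';
  if (changeRatio(prev, next) < 0.1) return 'skip';
  // Tier 3 (not shown): everything else goes to the Haiku gate.
  return 'llm-gate';
}
```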
Future: BYOA. Users create custom always-on agents by providing a name, system prompt, and trigger config, using the same SQS pipeline and session model as built-in agents.