Architecture
Full-stack diagram
The key insight: your React components never know which LLM produced the stream. The hook speaks a three-event protocol (text, done, error) over SSE. Any server that produces those events works — regardless of language, cloud, or LLM provider.
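To make the protocol concrete, here is a minimal sketch of what those events might look like on the wire. The `{type, text}` chunk shape mirrors the `StreamChunk` normalizer described under Packages; the helper name and exact JSON framing are illustrative assumptions, not the library's actual API:

```ts
// Hypothetical helper: serialize one protocol event as an SSE frame.
// The "data:" line plus blank-line terminator is standard SSE framing.
type StreamChunk =
  | { type: 'text'; text: string }
  | { type: 'done' }
  | { type: 'error'; text: string }

function toSSEFrame(chunk: StreamChunk): string {
  return `data: ${JSON.stringify(chunk)}\n\n`
}

// A complete response is just a sequence of frames:
const frames = [
  toSSEFrame({ type: 'text', text: 'Hel' }),
  toSSEFrame({ type: 'text', text: 'lo' }),
  toSSEFrame({ type: 'done' }),
].join('')
```

Any server that emits frames in this spirit satisfies the hook, whatever produced the tokens upstream.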
Provider abstraction
CustomProvider (your /api/chat endpoint) is the recommended production path — your API key stays on the server, provider routing is your business logic, and the React layer is untouched when you switch models.
OpenAIProvider and AnthropicProvider call the vendor API directly from the browser. Safe for local prototypes, not for production.
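What makes the three providers interchangeable is a shared streaming contract. The interface below is a sketch of what that contract plausibly looks like — the name `ChatProvider`, the method signature, and the `EchoProvider` toy implementation are assumptions for illustration, not the library's real API:

```ts
// Assumed provider contract (sketch): every provider yields
// normalized chunks, so the React layer never sees vendor formats.
interface ChatProvider {
  stream(
    messages: { role: 'user' | 'assistant'; content: string }[],
    signal?: AbortSignal,
  ): AsyncIterable<{ type: 'text' | 'done' | 'error'; text?: string }>
}

// A trivial in-memory implementation shows the shape:
class EchoProvider implements ChatProvider {
  async *stream(messages: { role: 'user' | 'assistant'; content: string }[]) {
    // Echo the last message back as a single text chunk, then finish.
    yield { type: 'text' as const, text: messages[messages.length - 1].content }
    yield { type: 'done' as const }
  }
}
```

Swapping `OpenAIProvider` for `AnthropicProvider` (or your own `CustomProvider`) changes only which class fulfills this contract.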
Streaming pipeline
See Streaming lifecycle for full details on callbacks and abort behavior.
Packages
@react-ai-stream/core
No React dependency. Runs in Node.js, Deno, Bun, or any JS environment.
- SSE parser — converts `ReadableStream<Uint8Array>` to `AsyncIterable<string>`, handles buffering on `\n\n` boundaries
- Chunk normalizer — maps provider-specific event shapes to `StreamChunk` (`{type, text}`)
- Providers — `OpenAIProvider`, `AnthropicProvider`, `CustomProvider`
- Message store — Zustand store factory; `createMessageStore()` returns a fully isolated store per call
- Abort utilities — thin wrapper around `AbortController` with `isAbortError` guard
@react-ai-stream/react
Depends on @react-ai-stream/core and React.
- `useAIChat` — subscribes to the message store via `useSyncExternalStore`; orchestrates the streaming lifecycle; resets the client when `endpoint`, `provider`, `apiKey`, or the context client changes
- `AIChatProvider` — React context for sharing a pre-built `AIClient` across a subtree
- `useStableCallback` — stable function reference that always calls the latest closure; used internally to prevent stale closures in async stream loops
@react-ai-stream/ui
Depends on @react-ai-stream/react. Completely optional.
- `Chat` — all-in-one: `MessageList` + `ChatInput`
- `MessageList` — renders messages with typing indicator; auto-scrolls on `messages` or `loading` change
- `MessageBubble` — individual message with role-based styling
- `ChatInput` — auto-resizing textarea with send/stop
- `MarkdownRenderer` — GFM via `react-markdown` + `rehype-highlight`; copy button reads `textContent` to avoid `[object Object]` from syntax-highlighted children
State management
Each useAIChat instance owns a Zustand store created by createMessageStore():
```ts
{
  messages: Message[]
  loading: boolean
  error: string | null
  abortController: AbortController | null
}
```

`useSyncExternalStore` subscribes to this store. Only the component that called `useAIChat` re-renders when its store updates — no context propagation, no unnecessary renders in siblings. Three hook instances → three completely isolated stores.
Why Zustand instead of useReducer
Zustand's createStore (vanilla, no React) lets the store live outside the React tree. This means:
- Store can be created without a component render
- Multiple components can subscribe without a shared context
- Store lifecycle is tied to the hook ref, not to a provider tree
This is what enables truly isolated chat instances without any wrapping <Provider>.
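The vanilla-store pattern is easy to see without Zustand itself. Below is a dependency-free sketch of what `createMessageStore()` plausibly does — a subscribable store living entirely outside React. The state shape matches the one listed above; the implementation details are assumptions:

```ts
// Minimal external-store sketch exposing the same surface as
// Zustand's vanilla createStore: getState / setState / subscribe.
// Each call returns a fully isolated store -- no context, no provider.
type State = {
  messages: { role: string; content: string }[]
  loading: boolean
  error: string | null
}

function createMessageStore() {
  let state: State = { messages: [], loading: false, error: null }
  const listeners = new Set<() => void>()
  return {
    getState: () => state,
    setState: (partial: Partial<State>) => {
      state = { ...state, ...partial } // immutable update, then notify
      listeners.forEach((l) => l())
    },
    subscribe: (l: () => void) => {
      listeners.add(l)
      return () => listeners.delete(l) // unsubscribe
    },
  }
}

// Two calls -> two isolated stores; updating one never touches the other:
const a = createMessageStore()
const b = createMessageStore()
a.setState({ loading: true })
```

On the React side, `useSyncExternalStore(store.subscribe, store.getState)` is all that is needed to bridge such a store into a component.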
Abort semantics
Every sendMessage call creates a new AbortController. The signal is passed to fetch. When stop() is called or the component unmounts, abort() fires:
```
user calls stop()
→ abortController.abort()
→ fetch rejects with AbortError
→ stream loop catches isAbortError() → true
→ loading → false (no error surfaced)
→ partial response preserved in messages
```

On the server side, the request's signal becomes aborted too (`req.signal` in Next.js edge/Node.js handlers). Passing it to the upstream fetch cancels the LLM call, avoiding wasted token generation:
```ts
const upstream = await fetch(LLM_URL, {
  signal: req.signal, // forward the abort
  ...
})
```

SSE buffering
Network chunks don't align with SSE event boundaries. A single reader.read() call may return half an event or three events concatenated. The parser buffers and splits correctly:
```ts
let buf = ''
while (true) {
  const { done, value } = await reader.read()
  if (done) break
  buf += decoder.decode(value, { stream: true })
  const parts = buf.split('\n\n')
  buf = parts.pop() ?? '' // keep the incomplete tail
  for (const part of parts) {
    // process complete events
  }
}
```

The critical invariant: `buf = parts.pop()` always preserves the incomplete trailing event. Setting `buf = ''` inside the loop (a common mistake) silently drops buffered content mid-chunk.
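The invariant is easy to demonstrate in isolation. Here is a standalone sketch of just the buffering step (the function name is illustrative, not the library's API), fed fragments that deliberately split an event mid-JSON:

```ts
// Stateful splitter: feed it arbitrary network fragments, get back
// only complete SSE events; the incomplete tail stays buffered.
function makeSSESplitter() {
  let buf = ''
  return (chunk: string): string[] => {
    buf += chunk
    const parts = buf.split('\n\n')
    buf = parts.pop() ?? '' // keep the incomplete tail
    return parts
  }
}

const feed = makeSSESplitter()
const first = feed('data: {"type":"text","te')  // mid-event: nothing complete yet
const second = feed('xt":"Hi"}\n\ndata: {"ty')  // completes event 1, buffers event 2
```

After the second call, exactly one intact event comes out; the partial `data: {"ty` remains buffered for the next chunk.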