
WebSocket Architecture Diagram: Real-Time System Design Patterns (2026)

How to design and diagram WebSocket architectures for real-time apps. Covers connection management, pub/sub fanout, horizontal scaling with Redis, Server-Sent Events, and long-polling fallback — with AI prompt templates.

Ryan · Senior AI Engineer

WebSocket architecture diagrams visualize real-time communication systems where a persistent, bidirectional connection between client and server replaces the traditional request/response cycle. Real-time features — live collaboration, chat, push notifications, streaming AI responses, live dashboards, and multiplayer games — all require architectural decisions that standard HTTP diagrams don't capture: how connections are managed at scale, how messages are fanned out to multiple clients, how stateful connections survive server restarts, and how the system degrades gracefully under load. A clear WebSocket architecture diagram makes these design decisions visible and reviewable.

Core components of a WebSocket architecture

  • WebSocket gateway / connection manager: The layer that accepts and maintains WebSocket connections. This is either a dedicated WebSocket server (Socket.IO, ws) or a managed service (AWS API Gateway WebSocket, Ably, Pusher), often fronted by a general-purpose reverse proxy (nginx, Envoy) configured to pass the WebSocket upgrade through
  • Connection registry: The store that maps connection IDs to user/session identifiers — typically Redis or DynamoDB — enabling the application tier to look up which connections belong to a given user or room
  • Pub/sub layer: The mechanism for broadcasting messages to multiple connected clients — Redis Pub/Sub, Kafka, or a dedicated channel service — essential when WebSocket servers are horizontally scaled
  • Application / business logic tier: REST or gRPC services that process actions from WebSocket clients, mutate state, and publish events back through the pub/sub layer to the WebSocket gateway
  • Authentication and authorization: How the initial WebSocket handshake is authenticated (JWT in query param or cookie, token exchange), and how fine-grained authorization is enforced for room or channel access
  • Presence and heartbeat tracking: How the system detects disconnected clients (server-side ping/pong or client heartbeats), cleans up connection state, and broadcasts presence events to other users
  • Fallback transport: Server-Sent Events (SSE) for server-to-client-only flows, long-polling for environments where WebSockets are blocked, or a transport abstraction library (Socket.io) that negotiates the best available transport
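
The connection registry and heartbeat tracking described in the list above can be sketched in a few lines. This is a minimal in-memory stand-in, not a Redis or DynamoDB client: the class name, method names, and the 30-second timeout are all assumptions made for illustration.

```python
import time
from collections import defaultdict


class ConnectionRegistry:
    """In-memory stand-in for a Redis- or DynamoDB-backed connection registry."""

    def __init__(self, heartbeat_timeout: float = 30.0, clock=time.monotonic):
        self._timeout = heartbeat_timeout  # assumed value, tune per deployment
        self._clock = clock                # injectable so tests can control time
        self._user_by_conn = {}            # connection ID -> user ID
        self._last_seen = {}               # connection ID -> last heartbeat time
        self._conns_by_room = defaultdict(set)

    def register(self, conn_id: str, user_id: str, room: str) -> None:
        self._user_by_conn[conn_id] = user_id
        self._last_seen[conn_id] = self._clock()
        self._conns_by_room[room].add(conn_id)

    def heartbeat(self, conn_id: str) -> None:
        """Record a pong or client heartbeat for a known connection."""
        if conn_id in self._user_by_conn:
            self._last_seen[conn_id] = self._clock()

    def connections_in(self, room: str) -> set:
        """Return a copy of the connection IDs currently in a room."""
        return set(self._conns_by_room.get(room, ()))

    def remove(self, conn_id: str) -> None:
        self._user_by_conn.pop(conn_id, None)
        self._last_seen.pop(conn_id, None)
        for members in self._conns_by_room.values():
            members.discard(conn_id)

    def reap_stale(self) -> list:
        """Remove connections whose last heartbeat is older than the timeout."""
        now = self._clock()
        stale = [c for c, seen in self._last_seen.items() if now - seen > self._timeout]
        for conn_id in stale:
            self.remove(conn_id)
        return stale
```

In a real deployment the same operations would likely map onto key expiry or TTL attributes in the backing store, so that stale entries disappear even if the server that owned them crashes.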

Prompt examples for common real-time patterns

Live collaborative document editor

"Users open a shared document in a React frontend. On connect, the client sends a JWT (stored in an HTTP-only cookie) to a Socket.io server cluster (3 nodes behind an AWS NLB with sticky sessions). The server validates the JWT, looks up the document room, and adds the connection ID → user ID mapping to Redis. Document edits are sent as operational transform (OT) deltas over WebSocket. The server applies the delta, persists it to PostgreSQL, and broadcasts to all connections in the room via Redis Pub/Sub (the io.in(room).emit() pattern). Presence events (user joined/left) are also broadcast. Disconnects trigger a cleanup job that removes the connection from Redis and broadcasts a 'user left' event after a 5-second grace period for reconnects."

Streaming LLM responses (AI chat interface)

"A Next.js frontend sends a chat message to a FastAPI backend via HTTP POST. The backend calls the OpenAI Responses API with streaming enabled and uses Server-Sent Events (SSE) to stream each token chunk back to the client. The client renders tokens incrementally as they arrive. The complete response is stored in PostgreSQL with a conversation_id. For concurrent users, each SSE stream is a separate HTTP connection — no shared state is needed. Rate limiting is enforced at the API Gateway layer (10 requests per minute per user). The frontend shows a typing indicator while the stream is active and cancels the stream if the user navigates away."

Real-time multiplayer game (AWS API Gateway WebSocket)

"Players connect via AWS API Gateway WebSocket (manages connections natively). On $connect, a Lambda authorizer validates the player's session token and a Lambda stores the connection ID and player ID in DynamoDB. Game actions (move, shoot, chat) are routed to dedicated Lambda functions via the $default route. The game state Lambda reads/writes from a DynamoDB game-state table and uses the @connections API to push updates to all players in the room by querying DynamoDB for their connection IDs. Disconnects are handled by the $disconnect Lambda which removes the connection from DynamoDB and broadcasts a player-left event. CloudWatch monitors connection count and Lambda p99 latency."

Live dashboard with push updates

"An operations dashboard in React subscribes to real-time metric updates. The backend uses Server-Sent Events (SSE) — not WebSocket — since the data flow is server-to-client only. A Node.js Express server maintains an in-memory registry of SSE connections keyed by user ID. A separate metrics collector service polls Datadog and Prometheus APIs every 10 seconds, computes diffs, and publishes changed metrics to Redis Pub/Sub. A Redis subscriber in the Express server receives the diffs and writes them to all active SSE connections. The SSE endpoint requires Bearer token authentication. Nginx is configured with proxy_buffering off and keepalive_timeout 300s to support long-lived SSE connections."

Real-time transport comparison

| Transport | Direction | Best for | Limitations |
| --- | --- | --- | --- |
| WebSocket | Bidirectional | Chat, collaboration, multiplayer games | Stateful connections complicate horizontal scaling |
| Server-Sent Events (SSE) | Server → Client only | Live feeds, notifications, streaming AI | No client-to-server channel |
| Long polling | Simulated push | Fallback for restrictive networks | Higher latency, more overhead |
| WebRTC | Peer-to-peer | Video/audio calls, P2P file transfer | Requires STUN/TURN, complex signaling |
| HTTP/2 Server Push | Server → Client (proactive) | Preloading static resources, limited use cases | Pushes resources rather than application messages; disabled by default in Chrome since version 106 |

Horizontal scaling challenges to show in your diagram

WebSocket architectures require explicit thought about horizontal scaling because connections are stateful. Your diagram should show how you solve each of these:

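  • Sticky sessions: An established WebSocket stays on one server for its lifetime, but multi-request transports (such as Socket.IO's long-polling handshake) and reconnects that depend on server-local state need the load balancer to route the same client to the same server; otherwise, keep that state in a shared connection registry. Show whether your load balancer uses IP hashing, cookie-based affinity, or consistent hashing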
  • Cross-node message fanout: When clients connected to different servers must receive the same message, show the pub/sub layer (Redis, Kafka) that enables the fanout across nodes
  • Connection registry: Show where active connection IDs are stored (Redis, DynamoDB) so the application tier can look up which connections to push to
  • Graceful shutdown: Show how in-flight connections are drained during deploys — whether clients reconnect to a new server, and how in-progress state is preserved
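
Cross-node fanout is the item in this list that most often goes missing from diagrams, so here is a minimal simulation of it. The `Broker` class is an in-memory stand-in for Redis Pub/Sub or a Kafka topic, and `GatewayNode` is a hypothetical WebSocket server that records deliveries instead of writing to sockets.

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """In-memory stand-in for Redis Pub/Sub or a Kafka topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> callbacks

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> None:
        for callback in list(self._subscribers[channel]):
            callback(message)


class GatewayNode:
    """One WebSocket server; it only knows its own local connections."""

    def __init__(self, name: str, broker: Broker):
        self.name = name
        self._broker = broker
        self._rooms = defaultdict(set)  # room -> local connection IDs
        self.delivered = []             # (connection ID, message) pairs

    def join(self, conn_id: str, room: str) -> None:
        if room not in self._rooms:
            # First local member: subscribe this node to the room's channel.
            self._broker.subscribe(room, lambda msg, r=room: self._deliver(r, msg))
        self._rooms[room].add(conn_id)

    def send_to_room(self, room: str, message: str) -> None:
        """Publish instead of writing locally, so every node sees the message."""
        self._broker.publish(room, message)

    def _deliver(self, room: str, message: str) -> None:
        for conn_id in sorted(self._rooms[room]):
            self.delivered.append((conn_id, message))
```

The design choice worth showing in the diagram is that a node never writes directly to its own clients on send: it publishes to the broker and delivers only what comes back, which gives every node, including the sender, the same code path.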

Related guides: streaming data architecture, API gateway architecture, microservice patterns, and SaaS architecture diagrams.
