Back to blog

AWS Bedrock Architecture Diagram: The Complete Visual Guide (2026)

How to draw an AWS Bedrock architecture diagram. Covers Bedrock Runtime, Knowledge Bases, Agents, Guardrails, and the most common enterprise AI deployment patterns — with prompt templates to generate diagrams in seconds.

R
Ryan·Senior AI Engineer
·

An AWS Bedrock architecture diagram shows how an enterprise GenAI application is built on top of Amazon Bedrock — the managed service that provides access to foundation models from Anthropic, Amazon, Meta, Mistral, Cohere, and others through a single unified API. Bedrock has become the default AI infrastructure layer for teams already operating on AWS: it eliminates the need to manage model infrastructure, provides native integrations with S3, Lambda, and IAM, and handles enterprise requirements like VPC endpoints, PrivateLink, CloudTrail logging, and data residency.

Diagramming your Bedrock setup is essential for architecture reviews, cost governance (different models have dramatically different per-token pricing), security audits (data never leaves your AWS account with Bedrock's private API), and for onboarding engineers who need to understand which model handles which workload and how the retrieval and agent layers fit together.

The core components of an AWS Bedrock architecture

Bedrock Runtime API

The Bedrock Runtime is the core inference layer. It exposes two primary operations: InvokeModel for synchronous, single-turn completions and InvokeModelWithResponseStream for streaming output. Every model on Bedrock shares the same API surface — your application code doesn't change when you swap Claude for Llama or Titan. In your diagram, show the Bedrock Runtime as the central API layer that your application services call, with the specific model IDs annotated (e.g., anthropic.claude-opus-4-8-20260801-v1:0 for the highest-capability tasks, anthropic.claude-haiku-4-5-20251001-v1:0 for high-volume classification).

Amazon Bedrock Knowledge Bases

Knowledge Bases for Amazon Bedrock is the managed RAG layer. It handles document ingestion, chunking, embedding (using Amazon Titan Embeddings or Cohere Embed), vector storage (OpenSearch Serverless, Aurora pgvector, or Pinecone), and retrieval — all without you managing the pipeline. When a user query arrives, Knowledge Bases retrieves the top-K relevant chunks from your vector store, injects them into the prompt as context, and invokes the foundation model. In your diagram, show Knowledge Bases as a component that receives user queries from your application, retrieves from the connected vector store, and passes augmented prompts to the Bedrock Runtime. Connect it to your S3 data sources and show the sync schedule for keeping the index current.

Agents for Amazon Bedrock

Agents for Amazon Bedrock implements the ReAct (Reason + Act) loop for agentic workflows. An agent receives a user goal, reasons about what actions to take, calls Lambda functions or Bedrock Knowledge Bases as tools, observes the results, and iterates until it produces a final response. You define the agent's capabilities through an action group — an OpenAPI schema that describes available tools mapped to Lambda function ARNs. Agents handle conversation memory natively through session state. Your diagram should show: the agent receiving user input from your app, the ReAct loop with the foundation model, the action group with each Lambda tool, and any Knowledge Bases the agent can query for context.

Guardrails for Amazon Bedrock

Guardrails applies content filtering, PII detection and redaction, grounding checks (hallucination detection), topic blocking, and word filtering across both input and output. Guardrails sits between your application and the Bedrock Runtime — apply it to every model invocation by passing a guardrailIdentifier in your API call. In your diagram, represent Guardrails as a bidirectional policy layer on the path between your application and the Bedrock Runtime, with annotations for which policies are active (content filter threshold, denied topics, PII types to redact).

Model evaluation and customization

For teams that need custom behavior, Bedrock supports fine-tuning (continued pre-training and instruction fine-tuning on Amazon Nova and Titan models) and model evaluation (automated metrics and human review across a test dataset). The fine-tuned model is stored as a custom model in your account and invoked through the same Runtime API. Your diagram should show custom models as a separate node in the model layer, connected to the training data in S3 and the evaluation job outputs in CloudWatch.

Networking and security

By default, Bedrock API calls traverse the public internet using HTTPS. For compliance-sensitive workloads, use a VPC endpoint (AWS PrivateLink) to route all Bedrock traffic through your VPC without touching the public internet. All Bedrock API calls are logged to CloudTrail, and model invocation logging (optional) writes input/output to CloudWatch Logs and S3. IAM policies control which principals can invoke which models — you can restrict teams to specific model families or deny invocations above a token threshold. Your architecture diagram must show: the VPC boundary if using PrivateLink, the IAM role used by your application services, and the CloudTrail/CloudWatch observability path.

Common AWS Bedrock architecture patterns

Pattern 1: Direct API integration for application teams

The simplest pattern: an application service (Lambda, ECS container, or EC2) calls the Bedrock Runtime directly with an IAM role. No Knowledge Bases, no agents — just raw model inference wrapped in your own prompt engineering logic. Use this for: summarization, classification, translation, code generation, and any task where your application already provides all necessary context in the prompt. Guardrails should always be applied even in this simple pattern.

Pattern 2: Managed RAG with Knowledge Bases

The most common enterprise pattern: a document corpus lives in S3 (PDFs, Word docs, HTML pages, Confluence exports), Knowledge Bases indexes it using Titan Embeddings into an OpenSearch Serverless collection, and your application calls the RetrieveAndGenerate API. No custom vector pipeline to manage, no chunking logic to tune. Best for internal knowledge bases, customer support copilots, and documentation assistants where the corpus updates weekly or less.

Pattern 3: Agentic workflows with Agents for Bedrock

For tasks that require multi-step reasoning and taking actions in external systems: an Agents for Bedrock agent receives a user request, reasons about a plan using Claude or Nova, calls Lambda tools (CRM lookup, inventory check, order creation), and synthesizes a final answer. The agent session handles conversation context automatically. Best for: customer service automation, IT helpdesk workflows, data retrieval from heterogeneous systems, and approval routing.

Pattern 4: Multi-model routing for cost and quality optimization

Large-scale deployments route different request types to different models based on complexity, latency, and cost requirements. A routing Lambda classifies incoming requests — simple queries go to Haiku or Nova Micro (cents per million tokens), complex analysis goes to Opus or Nova Pro (dollars per million tokens). The routing layer logs the classification decision and the final token cost to CloudWatch for FinOps visibility. Model fallback handles quota throttling.

Prompt templates for AWS Bedrock diagrams

Basic Bedrock RAG application

"A user submits a question through a React web app. An API Gateway endpoint routes the request to a Lambda function with an IAM execution role that has bedrock:InvokeModel and bedrock:Retrieve permissions. The Lambda calls Amazon Bedrock Knowledge Bases RetrieveAndGenerate API, which retrieves relevant chunks from an OpenSearch Serverless vector index (indexed from S3 documents using Titan Embeddings) and invokes Claude claude-sonnet-4-6 to synthesize a grounded answer. The response passes through Bedrock Guardrails (PII redaction enabled, content filter at medium threshold) before returning to the user. All API calls are logged to CloudTrail. The S3 data source syncs to the Knowledge Base nightly via an EventBridge rule."

Bedrock Agent for customer service automation

"A customer service portal routes support requests to an Agents for Amazon Bedrock agent using Claude Sonnet. The agent has an action group with three Lambda tools: GetOrderStatus (queries DynamoDB orders table), InitiateReturn (calls internal returns API), and EscalateToHuman (creates a Zendesk ticket). The agent also has access to a Bedrock Knowledge Base containing the product FAQ and return policy documents indexed from S3. Guardrails block PII from appearing in agent responses. Session state is maintained for multi-turn conversations. The agent invocation is triggered through API Gateway from the frontend; session IDs are stored in ElastiCache for conversation continuity. All invocations and tool calls log to CloudWatch Logs."

Multi-model cost-optimized architecture

"Incoming AI requests hit a classification Lambda that uses Claude Haiku to score complexity (low/medium/high) and intent type (summarization, code generation, analysis, chat). Low-complexity summarization goes to Amazon Nova Micro. Medium chat and summarization go to Claude claude-sonnet-4-6. Complex multi-step analysis and code generation go to Claude claude-opus-4-8. All paths go through Bedrock Guardrails before returning to the calling service. Token usage, model ID, latency, and estimated cost per request are emitted as CloudWatch metrics. A CloudWatch dashboard shows daily spend by model family. The classification Lambda and model invocations share an IAM role with fine-grained bedrock:InvokeModel conditions restricting each service to its allowed model set."

Private Bedrock deployment with VPC endpoint

"All Bedrock traffic from our application tier (ECS Fargate containers in private subnets) routes through a VPC Interface Endpoint for Amazon Bedrock — no public internet path. The ECS task execution role has bedrock:InvokeModel permission. A VPC endpoint policy restricts invocations to approved model IDs (Claude and Nova only). Security groups on the VPC endpoint only allow inbound HTTPS from the application security group. Bedrock model invocation logging is enabled: inputs and outputs write to a CloudWatch Log Group with a 90-day retention policy, and a copy writes to an S3 bucket in the logging account (via CloudWatch cross-account log delivery) for long-term compliance archive. A Service Control Policy at the organization level denies Bedrock API calls from any principal that isn't using the VPC endpoint."

AWS Bedrock component reference

ComponentWhat it doesKey diagram annotation
Bedrock RuntimeModel inference API (InvokeModel / streaming)Model ID + region
Knowledge BasesManaged RAG: ingest → embed → retrieve → generateData source (S3), vector store type, sync schedule
Agents for BedrockMulti-step ReAct agent with Lambda tool callingAction group name, Lambda ARNs, session state
GuardrailsContent filter, PII redaction, grounding checksApplied to: input / output / both
Model EvaluationAutomated + human evaluation of model outputsEval dataset in S3, metric type
Custom ModelsFine-tuned Nova/Titan models stored in your accountBase model ID, training data S3 path
VPC EndpointPrivateLink for Bedrock API — no public internetEndpoint policy (model allow-list)
Invocation LoggingInput/output log to CloudWatch + S3Retention policy, destination bucket

What a good Bedrock architecture diagram must show

  • Model selection: Label every model invocation with the specific model ID. Different models have 10–100× cost and capability differences — the model choice is a first-class architectural decision, not an implementation detail.
  • IAM trust boundaries: Show which IAM roles have bedrock:InvokeModel permission and which model IDs they're restricted to. Overly broad IAM on Bedrock can allow any service to invoke any model, creating unexpected cost exposure.
  • Guardrails placement: Make it explicit whether Guardrails is applied on the input, the output, or both — and which policies are active. This is the primary safety audit surface.
  • Data residency: Show the AWS region of each Bedrock component. Not all models are available in all regions, and data residency requirements determine which region you can use.
  • Observability path: Show where CloudTrail logs, model invocation logs, and CloudWatch metrics flow. This is required for cost attribution and compliance audits.
  • Knowledge Base sync: Show the data source (S3 bucket), the sync trigger (EventBridge schedule or manual), and the vector store. Stale indexes are a common source of incorrect AI responses.

Frequently asked questions about AWS Bedrock architecture

What is an AWS Bedrock architecture diagram?

An AWS Bedrock architecture diagram is an architecture diagram that shows how an application uses Amazon Bedrock to access foundation models and managed AI capabilities. It depicts the Bedrock Runtime API, Knowledge Bases, Agents, Guardrails, and the AWS services they integrate with (Lambda, S3, OpenSearch, IAM, CloudTrail, VPC endpoints). It is the standard documentation artifact for enterprise teams building GenAI applications on AWS.

What is the difference between Amazon Bedrock and SageMaker for AI architecture?

Amazon Bedrock provides access to pre-trained foundation models through a managed API — no infrastructure to provision or models to deploy. SageMaker is a full MLOps platform for training, deploying, and serving custom models you own. In practice: use Bedrock when you want to build applications on top of existing foundation models (Claude, Nova, Llama, Mistral); use SageMaker when you need to fine-tune models on your own data at scale, run custom training jobs, or serve open-source models that aren't available on Bedrock. Many architectures use both: Bedrock for general-purpose LLM calls, SageMaker for domain-specific models.

How do I diagram AWS Bedrock Agents?

For a Bedrock Agents diagram, show the agent as a reasoning loop: the user sends a goal → the agent invokes the foundation model with the goal and available action descriptions → the model returns a tool call → the agent executes the Lambda tool → the result is fed back to the model → the loop repeats until a final answer is produced. Connect action groups (Lambda functions) and any Knowledge Bases the agent can query as separate nodes. Annotate the IAM role the agent assumes when invoking Lambda. Show session state storage if you're persisting conversation history beyond a single turn.

Related guides: RAG architecture diagrams, AWS architecture diagrams, LLM architecture diagrams, and AI agent architecture diagrams.

Ready to try it yourself?

Start Creating - Free