
GraphRAG Architecture Diagram: Knowledge Graph-Enhanced AI Systems (2026)

How to draw a GraphRAG architecture diagram. Covers the full pipeline — entity extraction, knowledge graph construction, community detection, global and local search — with prompt templates for generating accurate GraphRAG diagrams.

Ryan·Senior AI Engineer

A GraphRAG architecture diagram visualizes a Retrieval-Augmented Generation pipeline that uses a knowledge graph instead of a flat vector index to store and retrieve information. Introduced by Microsoft Research in 2024, GraphRAG addresses a fundamental weakness of standard vector RAG: the inability to answer questions that require synthesizing information across the entire corpus rather than retrieving a handful of similar chunks. By extracting entities and relationships into a structured graph and summarizing communities of related concepts, GraphRAG enables “global” queries that flat RAG cannot handle.

Diagramming a GraphRAG system is more complex than diagramming standard RAG — the pipeline has two distinct phases (indexing and querying), each with multiple stages, and the graph data model requires its own representation. This guide walks through every component, explains the two query modes, and provides prompt templates you can use to generate accurate GraphRAG architecture diagrams in seconds.

GraphRAG vs. standard RAG: the architectural difference

Standard RAG architecture works by chunking documents, embedding chunks into a vector database, and at query time retrieving the top-k chunks most similar to the question embedding. This works well for “local” queries — questions whose answers live in a small number of document sections. But for “global” queries — “What are the main themes across all these documents?” or “How do these 50 entities relate to each other?” — retrieving a few similar chunks is insufficient.

GraphRAG solves this by running an LLM-powered indexing pipeline over the entire corpus during ingest. Rather than just embedding text chunks, it extracts named entities (people, organizations, concepts, events), detects relationships between them, builds a knowledge graph, and then runs community detection to cluster related entities into hierarchical summaries. At query time, the system can answer global questions by reasoning over community summaries without touching the raw documents.

The GraphRAG indexing pipeline

The indexing pipeline is the most distinctive part of GraphRAG architecture. Your diagram should represent it as a sequence of processing stages, each transforming the data into progressively more structured form:

1. Document ingestion and chunking

Raw documents (PDFs, web pages, transcripts, codebases) are loaded and split into text chunks, exactly as in standard RAG. GraphRAG chunks are often larger than vector RAG chunks, commonly in the range of 1,000 to 2,400 tokens. The reason is that the LLM extraction step needs enough context to identify complete entity mentions and relationship statements. Larger chunks also mean fewer extraction calls, though very long chunks make it easier for the LLM to miss entities. Overlap between consecutive chunks is recommended so that an entity mention cut at a chunk boundary still appears whole in at least one chunk.
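The sketch below shows fixed-size windows with overlap. It is a minimal example that splits on whitespace as a stand-in for a real tokenizer; production pipelines count model tokens instead. The function name `chunk_tokens` is hypothetical, and the example sizes are kept tiny so the shared tokens are easy to see.

```python
def chunk_tokens(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split a token list into windows of `size` tokens, each sharing
    `overlap` tokens with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Whitespace splitting stands in for a real tokenizer here.
text = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_tokens(text.split(), size=4, overlap=1):
    print(chunk)
```

Each window starts `size - overlap` tokens after the previous one. A mention cut off at the end of one chunk therefore reappears at the start of the next, provided it is no longer than the overlap.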

2. Entity and relationship extraction

Each chunk is passed to an LLM with a structured extraction prompt. The LLM identifies:

  • Entities: Named entities with a type label (person, organization, technology, concept, event) and a description synthesized from the chunk
  • Relationships: Directed edges between entity pairs, with a description of the relationship and a confidence/weight score
  • Claims (optional): Covariate claims associated with entities — facts, assertions, or status information (e.g., “Company X acquired Company Y in 2025”)

This step is the most expensive part of the pipeline — every chunk requires one or more LLM calls. Production GraphRAG deployments often use a cheaper model for extraction and a more capable model for summarization.
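The next sketch turns one chunk's extraction response into typed records. Several pieces are assumptions made for illustration: the JSON schema, the `Entity` and `Relationship` dataclasses, and `parse_extraction`. Microsoft's reference implementation prompts for its own delimited output format instead. The response is also hard-coded in place of a real LLM call.

```python
import json
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    type: str
    description: str


@dataclass
class Relationship:
    source: str
    target: str
    description: str
    weight: float


def parse_extraction(raw: str) -> tuple[list[Entity], list[Relationship]]:
    """Parse one chunk's extraction response (hypothetical JSON schema)."""
    data = json.loads(raw)
    entities = [Entity(e["name"], e["type"], e["description"]) for e in data.get("entities", [])]
    names = {e.name for e in entities}
    relationships = []
    for r in data.get("relationships", []):
        # Skip edges whose endpoints were not extracted as entities in this chunk.
        if r["source"] in names and r["target"] in names:
            relationships.append(
                Relationship(r["source"], r["target"], r["description"], float(r.get("weight", 1.0)))
            )
    return entities, relationships


# Stand-in for an LLM response; a real pipeline gets this from the extraction call.
raw_response = json.dumps({
    "entities": [
        {"name": "Company X", "type": "organization", "description": "Acquirer."},
        {"name": "Company Y", "type": "organization", "description": "Acquired firm."},
    ],
    "relationships": [
        {"source": "Company X", "target": "Company Y", "description": "acquired in 2025", "weight": 0.9},
        {"source": "Company X", "target": "Company Z", "description": "partner", "weight": 0.4},
    ],
})
entities, relationships = parse_extraction(raw_response)
print(len(entities), len(relationships))  # 2 1
```

Dropping relationships whose endpoints were never extracted keeps the graph free of dangling edges. An alternative design is to create placeholder entities for those endpoints.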

3. Entity deduplication and merging

The same real-world entity often appears with different surface forms across chunks (“OpenAI”, “Open AI”, “the company behind ChatGPT”). A deduplication step merges entity records that refer to the same underlying entity, consolidating their descriptions and incoming/outgoing relationships. Approaches range in sophistication:

  • Simplest: exact name matching, with the merged descriptions summarized by an LLM.
  • More robust: embedding similarity combined with LLM-based resolution. This is needed for aliases like “the company behind ChatGPT”, which share no spelling with the canonical name.

The output is a deduplicated entity table stored in a graph database or in-memory graph structure.
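The following sketch covers the string-similarity half of deduplication. It compares names after lowercasing them and stripping punctuation. Python's `difflib.SequenceMatcher` ratio stands in for embedding similarity, the 0.9 threshold is an arbitrary assumption, and `merge_aliases` is a hypothetical helper. The sketch merges “OpenAI” and “Open AI” but could never link “the company behind ChatGPT”. That limitation is why LLM-based resolution is still needed.

```python
import difflib
import re


def normalize(name: str) -> str:
    """Lowercase a name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def merge_aliases(entities: dict[str, str], threshold: float = 0.9) -> dict[str, str]:
    """Map every entity name to a canonical (first-seen) name.

    Names whose normalized forms have a difflib similarity ratio of at least
    `threshold` are treated as aliases. The ratio is a cheap stand-in for
    embedding similarity; it cannot resolve aliases with no shared spelling.
    """
    canonical: dict[str, str] = {}
    representatives: list[str] = []
    for name in entities:
        key = normalize(name)
        match = next(
            (rep for rep in representatives
             if difflib.SequenceMatcher(None, key, normalize(rep)).ratio() >= threshold),
            None,
        )
        if match is None:
            representatives.append(name)
            match = name
        canonical[name] = match
    return canonical


entities = {
    "OpenAI": "AI research lab.",
    "Open AI": "Developer of GPT models.",
    "Anthropic": "AI safety company.",
}
canonical = merge_aliases(entities)

# Consolidate descriptions under each canonical name.
merged: dict[str, list[str]] = {}
for name, description in entities.items():
    merged.setdefault(canonical[name], []).append(description)

# Rewrite relationship endpoints so edges attach to canonical nodes.
relationships = [("Open AI", "Anthropic", "competes with")]
relationships = [(canonical[s], canonical[t], d) for s, t, d in relationships]

print(merged)
print(relationships)
```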

4. Knowledge graph construction

Deduplicated entities become graph nodes; extracted relationships become directed edges. The graph is stored in a format suitable for graph algorithms — common choices include in-memory NetworkX graphs (for smaller corpora), Neo4j or Apache AGE (for production deployments), or file-based Parquet tables (as used by Microsoft's reference implementation). The graph should be depicted in your diagram as the central data store that all downstream steps read from.
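Graph construction can be sketched with only the standard library. The sketch assumes entities arrive as a name-to-description map and relationships as `(source, target, description, weight)` tuples; `build_graph` is a hypothetical helper. Repeated edges between the same ordered pair are merged by summing their weights, which is one reasonable policy among several. With NetworkX, the same data would go into an `nx.Graph` or `nx.DiGraph` via `add_node` and `add_edge`.

```python
from collections import defaultdict


def build_graph(entities: dict[str, str], relationships: list[tuple[str, str, str, float]]):
    """Build node and edge tables plus an undirected adjacency index.

    Repeated (source, target) pairs are merged by summing weights and
    collecting their descriptions.
    """
    nodes = dict(entities)  # name -> description
    edges: dict[tuple[str, str], dict] = {}
    for source, target, description, weight in relationships:
        if source not in nodes or target not in nodes:
            continue  # skip dangling edges
        edge = edges.setdefault((source, target), {"descriptions": [], "weight": 0.0})
        edge["descriptions"].append(description)
        edge["weight"] += weight
    # Undirected neighbor index, convenient for traversal and community detection.
    neighbors: dict[str, set[str]] = defaultdict(set)
    for source, target in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)
    return nodes, edges, neighbors


nodes, edges, neighbors = build_graph(
    {"OpenAI": "AI lab", "Microsoft": "Tech company", "Azure": "Cloud platform"},
    [("Microsoft", "OpenAI", "invested in", 1.0),
     ("Microsoft", "OpenAI", "partners with", 0.5),
     ("OpenAI", "Azure", "runs on", 1.0)],
)
print(edges[("Microsoft", "OpenAI")]["weight"])  # 1.5
print(sorted(neighbors["OpenAI"]))  # ['Azure', 'Microsoft']
```

The node and edge dictionaries map directly onto the two tables a Parquet-based store would hold.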

5. Community detection

A graph community detection algorithm partitions the entity graph into hierarchical clusters of closely related entities. The reference implementation uses hierarchical Leiden, which optimizes modularity. The result is a tree of communities at multiple granularity levels: large top-level communities (broad topics) subdivided into smaller sub-communities (specific themes). Each community is assigned a numeric level. In the reference implementation, level 0 holds the broadest communities, and each higher level number is a finer-grained subdivision containing fewer entities.
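Leiden itself is too involved for a short example, but the modularity score it optimizes is easy to compute. This sketch uses a hypothetical `modularity` helper for unweighted, undirected graphs. It scores two partitions of a toy graph made of two triangles joined by one edge. Splitting at the bridge scores higher than lumping everything together, and that is the kind of partition Leiden searches for.

```python
from collections import Counter


def modularity(edges: list[tuple[str, str]], community: dict[str, int]) -> float:
    """Newman modularity of a partition of an unweighted, undirected graph.

    Q = sum over communities c of (L_c / m - (d_c / 2m)^2), where m is the
    total edge count, L_c the number of edges inside c, and d_c the sum of
    degrees of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = Counter()
    degree = Counter()
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
lumped = {n: 0 for n in "abcdef"}
print(round(modularity(edges, split), 3))   # 0.357
print(round(modularity(edges, lumped), 3))  # 0.0
```

For the split partition, each triangle has 3 internal edges and total degree 7 out of m = 7 edges. That gives Q = 2 × (3/7 − (7/14)²) = 5/14.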

6. Community report generation

For each community at each level, an LLM synthesizes a structured summary report covering: the community's main themes and entities, key findings, notable claims, and impact ratings. These reports are the unit of retrieval for global queries. They are stored alongside their community IDs and level metadata. Generating reports for all communities is a second major LLM-cost component of GraphRAG indexing.
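Before the report LLM is called, each community's entities and relationships have to be serialized into a prompt that fits the context window. The sketch below shows one assumed way to do that. It ranks relationships by the combined degree of their endpoints and stops at a budget. `build_report_context` is hypothetical, and word counts stand in for token counts.

```python
def build_report_context(edges, degree, max_words):
    """Serialize a community's relationships for a report-generation prompt.

    Edges are ranked by the combined degree of their endpoints so the most
    central relationships come first; serialization stops before the word
    budget (a stand-in for a token budget) would be exceeded.
    """
    # sorted() is stable, so ties keep their input order.
    ranked = sorted(edges, key=lambda e: degree[e[0]] + degree[e[1]], reverse=True)
    lines, used = [], 0
    for source, target, description in ranked:
        line = f"{source} -> {target}: {description}"
        cost = len(line.split())
        if used + cost > max_words:
            break
        lines.append(line)
        used += cost
    return "\n".join(lines)


# Hypothetical community: a chain A - B - C - D.
edges = [("A", "B", "partners with"), ("B", "C", "supplies"), ("C", "D", "competes with")]
degree = {"A": 1, "B": 2, "C": 2, "D": 1}
print(build_report_context(edges, degree, max_words=9))
```

The resulting text would then go to the summarization LLM, with instructions to return the report fields: title, summary, findings, and impact rating.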

7. Embedding and vector index construction

In parallel with the graph pipeline, entities, relationships, text chunks, and community reports are embedded into a vector index. This supports local queries that benefit from semantic similarity search. The vector store is a supplementary index alongside the graph — not a replacement for it.
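The similarity search this index supports can be sketched as brute-force cosine ranking. The three-dimensional vectors are toy stand-ins for real embeddings. The `(kind, id)` keys are an assumed way to keep entities, chunks, and reports in one index. A vector database replaces the linear scan with an approximate nearest-neighbor index.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: dict, k: int) -> list:
    """Return the k keys whose vectors are most similar to the query."""
    scored = sorted(index.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [key for key, _ in scored[:k]]


# Toy 3-d vectors standing in for real embeddings; keys record the artifact kind.
index = {
    ("entity", "OpenAI"): [0.9, 0.1, 0.0],
    ("chunk", "c17"): [0.8, 0.2, 0.1],
    ("report", "community-4"): [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.0, 0.0], index, k=2))
```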

The GraphRAG query pipeline

GraphRAG supports two fundamentally different query modes, which your diagram should represent as separate retrieval paths:

Global search

Global search answers questions that require reasoning across the entire corpus. It works as a map-reduce over the community reports at a chosen level of the hierarchy:

  • Map step: the reports are grouped into batches that fit the context window. The LLM answers the query against each batch, producing intermediate points that each carry a helpfulness score.
  • Reduce step: the highest-scoring points are collected and passed to a final LLM call, which synthesizes a comprehensive answer.

Global search is expensive (one LLM call per batch plus the reduce call), but it can answer questions that flat RAG cannot. Its coverage scales with the number of community reports at the selected level, not with how many chunks happen to resemble the query.
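Here is a minimal sketch of global search as a map-reduce over community reports. So that the example runs offline, the LLM calls are replaced by a hypothetical keyword-overlap scorer, `keyword_map`. Only the control flow is meant to be representative: batch, score, rank, and keep the top points for a final synthesis call.

```python
def keyword_map(query: str, batch: list[str]) -> list[tuple[int, str]]:
    """Hypothetical stand-in for the map-step LLM call.

    A real map step asks the LLM to answer `query` using the batch of reports
    and to rate each resulting point; here each report is one "point" and its
    score is the number of query words it contains.
    """
    terms = set(query.lower().split())
    points = []
    for report in batch:
        score = len(terms & set(report.lower().split()))
        if score > 0:  # drop points with no relevance, as a map step would
            points.append((score, report))
    return points


def global_search(query: str, reports: list[str], batch_size: int, top_n: int) -> list[str]:
    # Map: process reports batch by batch (in practice these calls run in parallel).
    points: list[tuple[int, str]] = []
    for i in range(0, len(reports), batch_size):
        points.extend(keyword_map(query, reports[i:i + batch_size]))
    # Reduce: rank all points and keep the best as context for the final answer call.
    points.sort(key=lambda p: p[0], reverse=True)
    return [text for _, text in points[:top_n]]


reports = [
    "supply chain risk in asia",
    "executive turnover risk",
    "product launch calendar",
    "currency risk and supply costs",
]
print(global_search("main supply risk", reports, batch_size=2, top_n=2))
```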

Local search

Local search answers questions about specific entities or narrow topics. The query is used to retrieve relevant entities from the vector index, then the graph is traversed outward from those entities to collect related entities, relationships, text chunks, and community reports. The collected context is assembled within the LLM's token budget and passed to the LLM for answer generation. Local search is more efficient than global search and is appropriate for specific factual questions.
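Local-search context assembly can be sketched as follows. The sketch assumes the seed entities have already been found by vector search and that the graph is an adjacency map. Several parts are illustrative assumptions: `local_context`, the one-hop expansion, and the item-count cap that stands in for a token budget. A fuller version would also collect relationships and community reports in the same way.

```python
def local_context(
    seeds: list[str],
    neighbors: dict[str, set[str]],
    chunks_by_entity: dict[str, list[str]],
    max_items: int,
) -> list[tuple[str, str]]:
    """Collect seed entities, their one-hop neighbors, then text chunks that
    mention any collected entity, truncated to `max_items` entries."""
    entities = list(seeds)
    for seed in seeds:
        # Sorted for a deterministic order; a real system might rank by edge weight.
        for neighbor in sorted(neighbors.get(seed, set())):
            if neighbor not in entities:
                entities.append(neighbor)
    chunks: list[str] = []
    for entity in entities:
        for chunk_id in chunks_by_entity.get(entity, []):
            if chunk_id not in chunks:
                chunks.append(chunk_id)
    context = [("entity", e) for e in entities] + [("chunk", c) for c in chunks]
    # Entities come first, so the cap trims text chunks before entity records.
    return context[:max_items]


neighbors = {"OpenAI": {"Microsoft", "Azure"}, "Microsoft": {"OpenAI", "LinkedIn"}}
chunks_by_entity = {"OpenAI": ["c1", "c2"], "Azure": ["c2", "c5"], "Microsoft": ["c3"]}
# "OpenAI" is assumed to be the entity the vector search matched for the query.
print(local_context(["OpenAI"], neighbors, chunks_by_entity, max_items=5))
```

"LinkedIn" is two hops from the seed, so it is not collected. A deeper traversal trades precision for recall.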

Prompt templates for GraphRAG architecture diagrams

Full GraphRAG pipeline

"GraphRAG indexing pipeline. Input: document corpus stored in Azure Blob Storage. Stage 1: document loader splits documents into 1,500-token chunks with 100-token overlap. Stage 2: entity extraction LLM (GPT-4o-mini) processes each chunk and outputs entities (name, type, description) and relationships (source entity, target entity, description, weight). Stage 3: entity deduplication merges aliases using embedding similarity. Stage 4: knowledge graph stored in Neo4j — entity nodes with type labels, relationship edges with weight scores. Stage 5: Leiden community detection partitions the graph into a 3-level hierarchy. Stage 6: community report generation LLM (GPT-4o) writes structured summaries for each community. Stage 7: all artifacts (chunks, entities, relationships, community reports) are embedded and stored in Azure AI Search. Query pipeline: incoming queries are classified as global (routed to community report map-reduce) or local (routed to entity-centric graph traversal + vector search). Final answer generation by GPT-4o."

GraphRAG vs. vector RAG side-by-side comparison

"Side-by-side comparison diagram of standard vector RAG vs GraphRAG. Left side — vector RAG: document chunks → embedding model → vector DB → similarity search → LLM answer. Right side — GraphRAG: document chunks → entity extraction LLM → knowledge graph + community detection → community reports → global query: map-reduce over reports / local query: graph traversal + vector search → LLM answer. Annotate that vector RAG handles local queries well, GraphRAG handles both local and global queries. Annotate the cost difference: GraphRAG requires 10-100x more LLM tokens during indexing."

Enterprise GraphRAG with hybrid retrieval

"Enterprise GraphRAG deployment on Azure. Indexing pipeline runs as an Azure Batch job triggered nightly. Document sources: SharePoint, Confluence, and a SQL database accessed via Azure Data Factory. Entity extraction and community summarization use Azure OpenAI GPT-4o. Knowledge graph stored in an Azure Cosmos DB for Apache Gremlin account. Vector index in Azure AI Search. Query service is a containerized FastAPI app in Azure Container Apps. Query routing: a classifier model determines global vs local vs hybrid query type. Hybrid queries combine community reports (global context) with entity graph traversal (local precision). Auth via Microsoft Entra ID; row-level security enforced at the graph query layer so users only see entities from documents they have access to."

GraphRAG component reference

| Component | Role | Common implementation |
| --- | --- | --- |
| Document loader | Ingest and chunk raw documents | LlamaIndex, LangChain, custom |
| Extraction LLM | Extract entities, relationships, claims | GPT-4o-mini, Claude Haiku (cost-optimized) |
| Knowledge graph store | Persist entity nodes and relationship edges | Neo4j, Apache AGE, NetworkX, Parquet |
| Community detection | Partition graph into topic clusters | Leiden algorithm (graspologic library) |
| Summarization LLM | Generate community report summaries | GPT-4o, Claude Sonnet (quality-optimized) |
| Vector index | Embed and index entities, chunks, reports | Azure AI Search, Qdrant, Pinecone |
| Query router | Classify queries as global, local, or hybrid | Rule-based classifier or small LLM |
| Answer LLM | Synthesize final response from retrieved context | GPT-4o, Claude Sonnet/Opus |

When to use GraphRAG vs. standard RAG

GraphRAG's richer indexing pipeline comes at significant cost — both in LLM API spend during indexing and in operational complexity. It is not the right choice for every use case. Use this decision framework when designing your RAG pipeline architecture:

  • Choose standard vector RAG when queries are specific and factual (“What does this contract say about termination clauses?”), when the corpus is small or homogeneous, or when indexing latency and cost are primary constraints.
  • Choose GraphRAG when users ask thematic or global questions (“What are the main risk factors across these 500 earnings calls?”), when the corpus contains a rich entity ecosystem that benefits from relationship modeling, or when you need to surface connections between documents that share no common terms.
  • Choose hybrid GraphRAG when your user base asks both types of questions and you can afford the indexing cost. Hybrid architectures route to global or local retrieval based on query classification and combine both result sets for complex mixed queries.
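Query classification for a hybrid deployment can start as simple rules. This sketch assumes a hand-picked list of cue phrases for global questions and a set of known entity names from the graph. `route` and `GLOBAL_CUES` are hypothetical. A small LLM classifier would catch phrasing that keyword rules miss.

```python
import re

# Hypothetical cue phrases that suggest a corpus-wide (global) question.
GLOBAL_CUES = ("main themes", "across", "overall", "summarize", "trends", "common")


def route(query: str, known_entities: set[str]) -> str:
    """Return 'global', 'local', or 'hybrid' for a query."""
    q = query.lower()
    is_global = any(cue in q for cue in GLOBAL_CUES)
    # Whole-word match on entity names already present in the knowledge graph.
    mentions_entity = any(re.search(rf"\b{re.escape(e.lower())}\b", q) for e in known_entities)
    if is_global and mentions_entity:
        return "hybrid"
    if is_global:
        return "global"
    return "local"


entities = {"Acme"}
print(route("What are the main risk factors across these 500 earnings calls?", entities))  # global
print(route("What does the Acme contract say about termination clauses?", entities))  # local
print(route("How does Acme compare across all regions?", entities))  # hybrid
```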

Frequently asked questions about GraphRAG architecture

What is a GraphRAG architecture diagram?

A GraphRAG architecture diagram visualizes the full pipeline of a Graph-based Retrieval-Augmented Generation system — from document ingestion through entity extraction, knowledge graph construction, community detection, and report generation (the indexing phase) to query routing, global/local retrieval, and answer synthesis (the query phase). It is the primary documentation artifact for teams building RAG systems that need to answer global or thematic questions across a large document corpus.

How is GraphRAG different from standard RAG?

Standard RAG retrieves semantically similar text chunks from a vector index. GraphRAG augments this with a knowledge graph that captures entities and relationships across the entire corpus, plus community summaries that distill thematic clusters. This lets GraphRAG answer global questions (“summarize the main themes”, “how do these entities relate?”) that standard RAG cannot. The cost is a significantly more expensive and complex indexing pipeline. A common pattern is to use GraphRAG's local search mode for specific queries and global search for thematic queries.

What databases are used in GraphRAG architecture?

GraphRAG typically combines two stores. A graph database (Neo4j, Apache AGE, or Amazon Neptune) holds the entity-relationship knowledge graph. A vector database (Azure AI Search, Qdrant, Pinecone, or similar) handles embedding-based similarity search. The Microsoft reference implementation writes its index artifacts (entities, relationships, communities, and reports) to Parquet files. It stores embeddings in a configurable vector store such as LanceDB or Azure AI Search. Neo4j is a popular choice for teams that need rich Cypher query capabilities on top of the graph.

How much does GraphRAG indexing cost?

GraphRAG indexing is significantly more expensive than standard RAG. It requires LLM calls for entity extraction on every chunk and for community report generation on every detected community, whereas standard RAG indexing needs only embedding calls. As a rough order of magnitude, GraphRAG indexing can use 10-100x more LLM tokens than standard RAG chunking and embedding for an equivalent corpus. The actual multiplier depends heavily on chunk size, model choice, and how many extraction passes are run. Teams typically mitigate the cost in three ways:

  • Using smaller, cheaper models (GPT-4o-mini, Claude Haiku) for extraction and reserving larger models for community summarization.
  • Processing only the most relevant subset of documents.
  • Caching the knowledge graph between indexing runs.
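The extraction share of that cost can be estimated with simple arithmetic. In this sketch, the per-call prompt-template overhead and output size are assumed figures, not measurements. Community report generation is not included, and `estimate_extraction_tokens` is a hypothetical helper.

```python
import math


def estimate_extraction_tokens(
    corpus_tokens: int,
    chunk_size: int,
    overlap: int,
    prompt_overhead: int,
    output_tokens: int,
    passes: int = 1,
) -> tuple[int, int]:
    """Return (number of chunks, total LLM tokens) for entity extraction.

    Chunks advance by chunk_size - overlap tokens. Each call sends one chunk
    plus a fixed prompt template and receives `output_tokens` back; `passes`
    models repeated extraction calls over the same chunk. Every chunk is
    counted as full-size, so the estimate slightly overcounts the last one.
    """
    step = chunk_size - overlap
    num_chunks = max(1, math.ceil((corpus_tokens - overlap) / step))
    per_call = chunk_size + prompt_overhead + output_tokens
    return num_chunks, num_chunks * per_call * passes


# Assumed figures: 2,000-token prompt template, 500 output tokens per call.
chunks, tokens = estimate_extraction_tokens(1_000_000, 1_200, 100, 2_000, 500)
print(chunks, tokens)  # 909 3363300
```

Take a 1,000,000-token corpus split into 1,200-token chunks with 100-token overlap. That yields 909 chunks. Under the assumed per-call sizes, extraction alone consumes about 3.4 million tokens, before any re-prompting passes or report generation.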

Related guides: RAG architecture diagrams, vector database architecture, LLM architecture diagrams, and RAG pipeline use cases.
