Standard RAG (rag-standard)

Documents are chunked, embedded, and indexed twice — once as dense vectors, once as a BM25 keyword index. At query time both retrievers run in parallel, their results are merged, reranked by a cross-encoder, and the best chunks are handed to the LLM as grounded context.
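The overview says the two result lists are "merged" without naming a method. One common choice is reciprocal rank fusion (RRF), sketched below under that assumption. The function name, the sample chunk IDs, and the constant k=60 are illustrative and not taken from this pipeline.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of chunk IDs into one ranked list.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


dense_hits = ["c3", "c1", "c7"]  # hypothetical dense-retriever output
bm25_hits = ["c1", "c9", "c3"]   # hypothetical BM25 output
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

Chunks that both retrievers return rise to the top, and only ranks are compared, so the dense and BM25 scores never need to be put on a common scale.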


Phases: Ingestion, Retrieval, Generation


Phase summary: Ingestion

Load documents, split them into overlapping chunks, embed each chunk, and store the vectors alongside a BM25 keyword index.
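The "overlapping chunks" step can be sketched as a sliding window over the text. This is a minimal character-based version; the function name and the default sizes are assumptions, and a real pipeline may split on tokens or sentence boundaries instead.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
```

The overlap means a sentence cut at one chunk boundary still appears whole in the neighbouring chunk, at the cost of storing some text twice.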


[ROLE]: Reads source documents from files, repositories, or URLs, parses the binary content, and encodes it as clean UTF-8 text passages.

[TECH STACK]: PyPDF2 / docx2txt / LangChain WebBaseLoader / pdfplumber

[INPUT]: Raw binary data stream (PDF, DOCX, TXT, HTML, JSON)

[OUTPUT]: Normalized plaintext string of the document content, with structure metadata

[RAW DATA STREAM] (mock diagnostics):

> INGEST_STREAM: "financial_report_2026.pdf" (Size: 2.4 MB)
> DECODING_META: { mime: "application/pdf", pages: 12 }
> READOUT: "Ragiment Corp Annual Report 2026. EBITDA grew 18% to $4.6M. Product lines expanded by..."

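The loader's input and output contract described above can be sketched with the standard library alone. This version handles only plain text and HTML; PDF and DOCX would need one of the listed parsers and are left out. The function name, the returned dict shape, and the whitespace policy are assumptions made for illustration.

```python
import re
import unicodedata
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping script and style bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def load_document(raw: bytes, mime: str) -> dict:
    """Decode raw bytes into normalized text plus minimal metadata."""
    # Undecodable bytes become U+FFFD instead of raising an error.
    text = raw.decode("utf-8", errors="replace")
    if mime == "text/html":
        parser = _TextExtractor()
        parser.feed(text)
        parser.close()
        text = " ".join(parser.parts)
    text = unicodedata.normalize("NFC", text)
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    return {"text": text, "meta": {"mime": mime, "chars": len(text)}}


doc = load_document(b"<p>Hello &amp;  world</p>", "text/html")
print(doc["text"])
```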
Best suited for

The production default — general question-answering over a medium-to-large document corpus.

Corpus: 10K – 1M docs
Queries: Lookup · summary
Infra: Vector DB
Latency: ~200–500 ms
Complexity: Low

A handful of well-understood stages — chunk, embed, retrieve, rerank, generate. The easiest pipeline to operate, debug, and reason about.
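One reason the stages are easy to reason about is that each can be an ordinary function passed into a linear pipeline. The sketch below shows that shape only. Every name in it is hypothetical, and the word-overlap scorer is a toy stand-in for a cross-encoder, not a real model.

```python
def run_pipeline(query, retrievers, merge, rerank_score, generate, top_k=3):
    """Retrieve with every retriever, merge, rerank, then generate."""
    candidate_lists = [retrieve(query) for retrieve in retrievers]
    candidates = merge(candidate_lists)
    # Rerank: score each (query, chunk) pair and keep the best top_k.
    ranked = sorted(candidates, key=lambda c: rerank_score(query, c), reverse=True)
    context = ranked[:top_k]
    return generate(query, context), context


corpus = [
    "BM25 is a keyword ranking function.",
    "Dense vectors capture meaning.",
    "Paris is in France.",
]


def all_chunks(query):
    return list(corpus)  # toy retriever: returns every chunk


def dedupe_merge(lists):
    merged = []
    for lst in lists:
        for chunk in lst:
            if chunk not in merged:
                merged.append(chunk)
    return merged


def overlap_score(query, chunk):
    # Toy stand-in for a cross-encoder: count shared lowercase words.
    return len(set(query.lower().split()) & set(chunk.lower().rstrip(".").split()))


def echo_generate(query, context):
    return f"{len(context)} chunks used for: {query}"  # stand-in for the LLM call


answer, context = run_pipeline(
    "what is bm25", [all_chunks, all_chunks], dedupe_merge, overlap_score,
    echo_generate, top_k=2,
)
print(answer)
```

Because every stage is injected, each one can be swapped or tested in isolation.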

Relevance today

The most widely deployed RAG architecture and the right default for almost every project — the baseline everything else is measured against.

Where it's used

Documentation & support Q&A

Answer questions over product docs, manuals, and help centres with grounded citations.

Internal knowledge bases

Search Confluence, Notion, or SharePoint and return sourced answers instead of links.

Customer-facing chatbots

Ground every response in a known corpus, reducing the chance that the model invents facts.
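Grounded, cited answers in these use cases usually come down to how the retrieved chunks are placed in the prompt. The sketch below numbers each chunk so the model can cite it. The function name, the chunk fields `source` and `text`, and the instruction wording are all assumptions, not this pipeline's actual prompt.

```python
def build_grounded_prompt(question, chunks):
    """Format retrieved chunks as numbered sources the model is asked to cite."""
    sources = "\n".join(
        f"[{i}] ({chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the sources below and cite them as [n]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )


chunks = [  # hypothetical retrieved chunks
    {"source": "manual.pdf", "text": "Hold the reset button for ten seconds."},
    {"source": "faq.html", "text": "A reset restores factory settings."},
]
print(build_grounded_prompt("How do I reset the device?", chunks))
```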

Why it matters

Hybrid dense + BM25 retrieval catches both semantic matches and exact keywords, codes, and names.
Cross-encoder reranking raises the precision of the context before the LLM sees it, so fewer irrelevant passages are available to trigger hallucinations.
Provider- and database-agnostic; scales from a laptop to millions of vectors.
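To illustrate why the keyword side of hybrid retrieval catches exact codes and names, here is a minimal Okapi BM25 scorer. It is a sketch for illustration, not the index this pipeline uses. The tokenizer, the parameter defaults (k1=1.5, b=0.75), and the sample documents are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [  # hypothetical support snippets
    "Reset the router when error ERR-4012 appears.",
    "Restart the device if the connection drops.",
    "Billing questions go to the finance team.",
]
print(bm25_scores("ERR-4012", docs))
```

Only the first document contains the code, so it is the only one with a non-zero score. A dense retriever may place all three "device trouble" passages close together, which is the gap the BM25 side is there to cover.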

Trade-offs & considerations

Struggles with multi-hop questions that connect facts across documents — reach for GraphRAG.
Requires a vector database to run, unlike the zero-infra PageIndex pipeline.
Chunking strategy materially affects quality — tune chunk size and overlap for your documents.

Alternatives to consider

When Standard RAG isn't the right fit, reach for one of these instead.

GraphRAG (rag-graph)

When answers require connecting facts across documents (multi-hop, relational).

PageIndex RAG (rag-vectorless)

When you can't run a vector database, or the corpus is small enough for BM25.

More architectures

Explore the other pipelines

GraphRAG (rag-graph): Knowledge-graph retrieval for multi-hop reasoning.
Agentic RAG (rag-agentic): A ReAct agent decides when, and how, to retrieve.
PageIndex RAG (rag-vectorless): Zero vector DB. BM25 + pickle persistence.
LLM Wiki (rag-wiki): Compounding knowledge, where the corpus becomes a wiki.
Multi-modal RAG (rag-multimodal): Text + images retrieved together. Vision-grounded answers.