Retrieval-augmented generation, explained without jargon. Why your chatbot vendor keeps saying it, and what to actually evaluate.

RAG stands for retrieval-augmented generation. Vendors pitch it as a feature. It's really just a sensible architecture: a way to let a language model answer questions about content it wasn't trained on, such as your website.

The whole pattern in one paragraph

Take a body of text (your website). Cut it into chunks. Convert each chunk to an embedding (a list of numbers that captures its meaning). Store those embeddings, each linked back to the page it came from. When a user asks a question, embed the question, find the most similar chunks, and stuff them into the prompt sent to the model. The model sees both the question and your relevant content, then writes the answer.
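To make those steps concrete, here is a minimal sketch in Python. It assumes a toy "embedding" (word counts compared by cosine similarity) in place of a real embedding model, splits pages into chunks at blank lines, and stops once the prompt is built; the model call itself is left out because it depends on your vendor. The function names and example URLs are hypothetical.

```python
import math
import re
from collections import Counter

# Toy "embedding" (an assumption): a bag-of-words count vector.
# A real system would call a neural embedding model here instead.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

# Cosine similarity: how closely two vectors point the same way (0 = unrelated).
def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Step 1: cut a page into chunks (here, one chunk per paragraph).
def chunk(url: str, page_text: str) -> list[dict]:
    paragraphs = [p.strip() for p in page_text.split("\n\n") if p.strip()]
    return [{"url": url, "text": p} for p in paragraphs]

# Step 2: embed every chunk and store the vector next to its text and URL.
def build_index(pages: dict[str, str]) -> list[dict]:
    index = []
    for url, text in pages.items():
        for c in chunk(url, text):
            c["vector"] = embed(c["text"])
            index.append(c)
    return index

# Step 3: embed the question and return the k most similar chunks.
def retrieve(index: list[dict], question: str, k: int = 2) -> list[dict]:
    q = embed(question)
    ranked = sorted(index, key=lambda c: cosine(q, c["vector"]), reverse=True)
    return ranked[:k]

# Step 4: stuff the retrieved chunks, with their URLs, into the prompt.
def build_prompt(question: str, chunks: list[dict]) -> str:
    context = "\n\n".join(f"[{c['url']}]\n{c['text']}" for c in chunks)
    return (
        "Answer using only the sources below and cite the URL you used.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

# Usage with two made-up pages.
pages = {
    "https://example.com/shipping": "We ship to the US and Canada.\n\nStandard shipping takes 5 business days.",
    "https://example.com/returns": "Returns are accepted within 30 days of delivery.",
}
index = build_index(pages)
top = retrieve(index, "How long does shipping take?", k=1)
print(top[0]["url"])  # → https://example.com/shipping
prompt = build_prompt("How long does shipping take?", top)
```

Swapping `embed` for a real embedding model and the Python list for a vector database changes the quality of the results, not the shape of the pipeline. Notice that the URL travels with each chunk; that is what makes citing sources possible.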

Why this matters to you

Hallucination (the model confidently making things up) drops sharply when the model has the right source paragraph in front of it.
Updating the bot is a re-crawl, not a fine-tune. Cheap and fast.
You can cite the page the answer came from — visitors trust answers with sources.

What to actually evaluate

Chunk quality: is each chunk semantically coherent, or does it cut sentences in half?
Retrieval recall: when you ask a question whose answer is on page 4, does it actually surface that chunk?
Citation: does the bot show the source URL?
Update cadence: how often is your site re-crawled and re-indexed, and what does a refresh cost? (With RAG there should be no model retraining involved.)
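The first two checks can be run yourself during a trial. Here is a minimal Python sketch, assuming you can get the chunk text and, for each test question, the IDs of the chunks the bot retrieved (whether a given vendor exposes these is an assumption). The sentence-ending test is a rough heuristic, not a standard metric, and the chunk IDs such as `shipping-2` are made-up examples.

```python
# Chunk quality (heuristic): flag chunks that don't end at sentence-ending
# punctuation, which usually means the splitter cut a sentence in half.
def cut_mid_sentence(chunks: list[str]) -> list[str]:
    flagged = []
    for c in chunks:
        tail = c.rstrip().rstrip("\"')")  # ignore trailing quotes and brackets
        if not tail.endswith((".", "!", "?")):
            flagged.append(c)
    return flagged

# Retrieval recall@k: the share of test questions whose answer-bearing chunk
# shows up among the first k chunks the retriever returned.
def recall_at_k(results: dict[str, list[str]], expected: dict[str, str], k: int = 3) -> float:
    if not expected:
        return 0.0
    hits = sum(1 for q, answer_id in expected.items() if answer_id in results.get(q, [])[:k])
    return hits / len(expected)

chunks = ["Standard shipping takes 5 business days.", "Returns are accepted within 30"]
print(cut_mid_sentence(chunks))  # → ['Returns are accepted within 30']

# You write down where each answer lives; the retriever's output is hypothetical here.
expected = {"How long is shipping?": "shipping-2", "Can I return an item?": "returns-1"}
results = {
    "How long is shipping?": ["shipping-2", "faq-7", "about-1"],
    "Can I return an item?": ["faq-3", "shipping-1", "about-2"],
}
print(recall_at_k(results, expected, k=3))  # → 0.5
```

A recall of 0.5 here means half your test questions never got their answer in front of the model, so no amount of clever prompting downstream can fix those answers.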

What Is RAG? A Plain-English Guide for Marketers