RAG (Retrieval-Augmented Generation)

RAG (Retrieval-Augmented Generation) is an architectural pattern in which a retrieval system fetches relevant documents or passages from an external knowledge base and injects them into the LLM's context window before the model generates a response. The retrieval step typically uses dense vector search (embedding the query and the corpus, then finding nearest neighbors), sparse keyword search, or a hybrid of both. In AI-tutor systems for e-learning, the knowledge base is the course content (transcripts, slides, PDFs, and documentation), so the model answers from the actual course material rather than from its general training data. This directly reduces hallucination: when the model has the right passage in front of it, it is far less likely to invent an answer. RAG also makes the system's knowledge updatable without retraining: once new content is added to the index, the tutor can draw on it in the next query. A practical nuance is chunking strategy. How course content is split into indexed passages affects retrieval quality: chunks that are too small lose context, while chunks that are too large dilute relevance. RAG does not eliminate hallucination entirely, because the model can still misread or ignore the retrieved passage, so human-reviewed response guardrails remain important. The quality of the retrieval step is often the binding constraint on system accuracy, which makes embedding model choice and index maintenance as important as the choice of LLM.
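The chunk, retrieve, and inject flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: it uses word-count vectors with cosine similarity as a stand-in for a real embedding model or keyword index, and the function names (chunk_words, retrieve, build_prompt), the chunk sizes, and the prompt wording are all hypothetical. The call to an LLM is left out; the sketch stops at the assembled prompt.

```python
import math
import re
from collections import Counter


def chunk_words(text, size=40, overlap=10):
    """Split text into overlapping word windows (one simple chunking strategy)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def term_counts(text):
    """Toy sparse representation: lowercase word counts (assumption: no real embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = term_counts(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, term_counts(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, passages):
    """Inject the retrieved passages into the prompt ahead of the question."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the course excerpts below. "
        "If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    # Hypothetical course snippets standing in for indexed chunks.
    course_chunks = [
        "Photosynthesis converts light energy into chemical energy in plants.",
        "The French Revolution began in 1789.",
        "Mitochondria produce ATP through cellular respiration.",
    ]
    question = "When did the French Revolution begin?"
    print(build_prompt(question, retrieve(question, course_chunks, k=1)))
```

The size and overlap parameters are the knobs behind the chunking trade-off: smaller windows give more precise matches but carry less surrounding context, and larger windows do the reverse. In a real tutor, term_counts and cosine would typically be replaced by an embedding model and a vector index, which is where most of the retrieval quality is decided.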

Related terms

AI tutor

LLM (Large Language Model)

AI hallucination