Retrieval-augmented generation (RAG) systems first retrieve relevant documents using embeddings or search APIs, then pass the results to an LLM as context. Coupling retrieval with generation grounds responses in verifiable sources and reduces hallucination.
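As a rough sketch of that loop, the snippet below embeds a small corpus, retrieves the top-k documents by cosine similarity, and splices them into a grounded prompt. The `embed` function is a toy hashing embedding standing in for a real embedding model, and the final LLM call is left as a hypothetical `call_llm`; both are assumptions for illustration, not a specific library's API.

```python
import hashlib
import math

def embed(text: str, dim: int = 256) -> list[float]:
    # Toy hashing embedding: each token is hashed into a bucket, then the
    # vector is L2-normalized. A stand-in for a real embedding model.
    vec = [0.0] * dim
    for tok in text.lower().split():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    # Rank every document against the query and keep the top k.
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    # Splice retrieved sources into the prompt so the answer stays grounded.
    context = "\n---\n".join(retrieve(query, corpus))
    return (
        "Answer using only the sources below.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )

docs = [
    "RAG retrieves documents before generation.",
    "Vector databases store embeddings for similarity search.",
    "Chunk size affects retrieval quality.",
]
prompt = build_prompt("What does RAG do?", docs)
# response = call_llm(prompt)  # hypothetical call to your LLM of choice
```

A production system would precompute and index the corpus embeddings rather than re-embedding on every query, but the retrieve-then-generate shape stays the same.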
Architectures range from a lightweight embedding model paired with a vector database to a custom search stack built on open-source frameworks. Chunk size and ranking function deserve particular care: overly large chunks dilute relevance, overly small ones strip away context, and a weak ranking function surfaces noise, so both directly determine the quality of the context that reaches the prompt.
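To make the chunking trade-off concrete, here is a minimal sliding-window splitter. The window and overlap defaults are illustrative assumptions; real values should be tuned to the embedding model's input limit and the retrieval task.

```python
def chunk(text: str, size: int = 400, overlap: int = 80) -> list[str]:
    """Sliding-window chunker over whitespace tokens.

    `size` and `overlap` are in words; the defaults here are
    illustrative assumptions, not tuned values.
    """
    words = text.split()
    step = size - overlap
    return [
        " ".join(words[i:i + size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both neighboring chunks, at the cost of some redundancy in the index.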