RAG.
Retrieval-Augmented Generation. Fetching relevant documents into the model's context before it answers. Reduces (but does not eliminate) hallucinations.
RAG (Retrieval-Augmented Generation) is an architecture in which relevant documents are retrieved from an external knowledge base and added to an LLM's context before it generates a response. This lets the model work with fresh or proprietary information without retraining. The classic RAG pipeline is: user question → embedding → vector DB search → top-k chunks → injected into the LLM prompt → answer. Modern variants turn this single pass into a multi-step process: hybrid retrieval combines keyword and vector search, rerankers re-score retrieved chunks for precision, and agentic RAG lets the model decide when and what to retrieve. As a rule of thumb, about 80% of answer quality comes from retrieval and only 20% from generation: if the right chunks never reach the prompt, the model cannot use them.
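The classic pipeline can be sketched end to end in plain Python. This is a minimal sketch, not a production setup: `embed` is a toy bag-of-words counter standing in for a real embedding model, `VectorStore` is an in-memory list standing in for a vector DB, and all names (`embed`, `VectorStore`, `build_prompt`), the stopword list, and the prompt wording are hypothetical. It stops at the assembled prompt; the LLM call itself is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list so filler words don't dominate the toy scores.
STOPWORDS = {"a", "an", "the", "to", "is", "of", "how", "many", "do", "i", "have", "from", "our"}


def embed(text: str) -> Counter:
    """Toy 'embedding': a sparse bag-of-words vector (term -> count).

    A real system would call an embedding model and get a dense float vector.
    """
    return Counter(t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """In-memory stand-in for a vector DB: chunks stored with their vectors."""

    def __init__(self, chunks: list[str]):
        self.items = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, query: str, k: int = 3) -> list[str]:
        """Embed the query and return the top-k chunks by cosine similarity."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Inject the retrieved chunks into the prompt ahead of the question."""
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


docs = [
    "Our refund window is 30 days from the delivery date.",
    "Support is available Monday to Friday, 9am to 5pm.",
    "Shipping to Europe takes 5 to 7 business days.",
]
store = VectorStore(docs)
question = "How many days do I have to request a refund?"
top_chunks = store.search(question, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

Swapping in a real embedding model and vector DB changes `embed` and `VectorStore.search`, but the shape of the pipeline (embed the query, take the top-k chunks, inject them into the prompt) stays the same.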
- embed docs → store in pgvector → top-k cosine similarity search on the query embedding → stuff the top chunks into the prompt
- run a reranker (e.g. Cohere's) over the retrieved chunks for higher precision
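The second bullet describes a two-stage retrieve-then-rerank pattern: a cheap first stage casts a wide net, then a more precise scorer reorders the candidates. Below is a minimal sketch, assuming toy scorers throughout: `first_stage` uses bag-of-words cosine similarity in place of a pgvector query, and `rerank_score` is a hypothetical placeholder for a real reranker such as Cohere's (whose API is not shown here). All function names and the stopword list are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for the toy scorers.
STOPWORDS = {"a", "an", "the", "to", "in", "for", "of"}


def tokens(text: str) -> list[str]:
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def first_stage(query: str, chunks: list[str], k: int) -> list[str]:
    """Stage 1 (recall): top-k chunks by cosine similarity."""
    q = Counter(tokens(query))
    return sorted(chunks, key=lambda c: cosine(q, Counter(tokens(c))), reverse=True)[:k]


def rerank_score(query: str, chunk: str) -> float:
    """HYPOTHETICAL placeholder for a reranker model.

    A real reranker (e.g. a cross-encoder or a hosted rerank API) scores the
    query and chunk jointly. Here: the fraction of distinct query terms that
    appear in the chunk.
    """
    q = set(tokens(query))
    return len(q & set(tokens(chunk))) / len(q) if q else 0.0


def retrieve_and_rerank(query: str, chunks: list[str], k: int = 10, top_n: int = 3) -> list[str]:
    """Stage 2 (precision): rerank the k candidates and keep the best top_n."""
    candidates = first_stage(query, chunks, k)
    reranked = sorted(candidates, key=lambda c: rerank_score(query, c), reverse=True)
    return reranked[:top_n]


chunks = [
    "Cosine similarity vs cosine distance",
    "To speed up search, create an HNSW index in pgvector with the "
    "vector_cosine_ops operator class on the embedding column",
    "Invoices are sent on the first of each month.",
]
query = "pgvector index for cosine"
print(first_stage(query, chunks, k=2))
print(retrieve_and_rerank(query, chunks, k=2, top_n=1))
```

In this toy run the first stage ranks the short "Cosine similarity vs cosine distance" chunk first because its repeated term inflates the cosine score, while the reranker promotes the pgvector/HNSW chunk, which covers every query term. That reordering is the precision gain the bullet refers to.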
Related terms.
Embedding (AI / LLM)
Turning text (or images) into numeric vectors that represent meaning. The foundation of similarity search and RAG.
Context Window (AI / LLM)
The maximum number of tokens an LLM can "see" at once. When the window fills up, older content gets dropped or the conversation gets compacted (summarized).
Hallucination (AI / LLM)
When a model confidently makes up information: nonexistent libraries, fake APIs, imaginary function signatures.