RAG 📚🔍🤖
Learn how RAG helps stop AI from making things up by retrieving real, fresh context before generating answers, like giving a genius student the right textbook pages during an open book exam.
Day 91: Retrieval-Augmented Generation — Retrieve First, Hallucinate Never
Why Should I Care?
Ever asked ChatGPT about something recent and got a completely wrong answer delivered with full confidence? That is called hallucination. RAG is built to reduce it. It makes the AI retrieve real facts first, then answer from them. Less guessing. Less making things up. This is the technique behind many AI assistants that actually know what they are talking about.
Core Concept
RAG stands for Retrieval-Augmented Generation. Think of it as an open book exam. A standard AI is like a student in a closed book exam: brilliant, but it only knows what it memorised up to its training cutoff (say, 2021). It guesses for anything newer. A RAG-powered AI is the same student but with the open book right in front of them. Before answering, it finds the right page. Then it writes the answer based on what it actually read. Informed answer, not a guess.
How It Works
RAG has three steps. Step 1 is Retrieve: search a knowledge base and pull the top 2 to 3 most relevant documents for the user query. Step 2 is Augment: inject those retrieved documents into the prompt as context alongside the user question. Step 3 is Generate: the LLM now answers using the real context, not just its memory. Precision beats quantity here. Only the few most relevant docs are passed in (the example below uses top_k=2). Too many docs flood the context and can confuse the model.
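import re

documents = [
    "RohithBuilds offers Python, AI and AI Agent Bootcamp courses.",
    "Rohi AI is the AI tutor built into RohithBuilds platform.",
    "RohithBuilds is a learning platform for Indian CS students."
]

def tokenize(text):
    # Lowercase and drop punctuation so "courses." matches "courses"
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, docs, top_k=2):
    query_words = tokenize(query)
    # Score each document by how many words it shares with the query
    scored = [(len(query_words & tokenize(d)), d) for d in docs]
    # Sort by score only, so tied documents keep their original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]

def rag_answer(query):
    # Step 1 - RETRIEVE
    retrieved = retrieve(query, documents)
    # Step 2 - AUGMENT
    context = "\n".join(retrieved)
    prompt = f"Context:\n{context}\nQuestion: {query}"
    print(f"Retrieved: {retrieved[0][:60]}...")
    print("Augmented prompt built:")
    print(prompt)
    # Step 3 - GENERATE (a real system would send the prompt to an LLM here)
    print("LLM now answers from real context, not memory")
    return prompt

rag_answer("What courses does RohithBuilds offer?")
Real World Connection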
When you use Swiggy customer support chat and the AI actually knows your order status, that is the RAG pattern at work: the AI retrieved your real order data before replying. When you ask a company chatbot about current pricing and it gives the right answer, it most likely retrieved the latest pricing page first. Without retrieval, the AI would just guess based on old training data. PhonePe support bots, Amazon product Q and A, and any AI that knows things that changed after its training date are plausibly using RAG-style retrieval under the hood.
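The order-status case above can be sketched in a few lines. This is only an illustration of the pattern, not how Swiggy or any real company implements it: the `ORDERS` dictionary, the order IDs, and the function names `retrieve_order` and `build_support_prompt` are all made up for this example. It shows that the Retrieve step does not have to be a text search; it can be a plain lookup of live data.

```python
# Hypothetical order store. A real support bot would query a database or API.
ORDERS = {
    "ORD123": {"status": "out for delivery", "eta_minutes": 12},
    "ORD456": {"status": "delivered", "eta_minutes": 0},
}

def retrieve_order(order_id):
    """Step 1 - RETRIEVE: look up live data instead of relying on model memory."""
    return ORDERS.get(order_id)

def build_support_prompt(order_id, question):
    """Step 2 - AUGMENT: put the retrieved record into the prompt as context."""
    order = retrieve_order(order_id)
    if order is None:
        context = f"No order found with ID {order_id}."
    else:
        context = (
            f"Order {order_id}: status={order['status']}, "
            f"eta_minutes={order['eta_minutes']}"
        )
    return f"Context: {context}\nQuestion: {question}"

# Step 3 - GENERATE would send this prompt to an LLM; here we just print it.
prompt = build_support_prompt("ORD123", "Where is my order?")
print(prompt)
```

Because the status comes from the lookup and not from the model, the answer stays correct even when the order changes state a minute later.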
Examples
-- Without RAG (Closed Book LLM):
User: "What courses does RohithBuilds offer?"
AI: "I think it might be... probably some programming courses?"
-- Hallucination: guessing from old memory
-- With RAG (Open Book LLM):
-- Step 1 RETRIEVE: relevant_docs = vector_db.search("RohithBuilds courses")
-- Returns: "RohithBuilds offers Python, AI and AI Agent Bootcamp courses."
-- Step 2 AUGMENT:
-- prompt = f"Context: {relevant_docs}\nQuestion: {query}"
-- Step 3 GENERATE:
-- response = llm.generate(prompt)
-- Output: "Based on retrieved context: RohithBuilds offers
-- Python, AI and AI Agent Bootcamp courses."
-- Accurate. Grounded. Not a guess.
-- (vector_db and llm are placeholder names, not a specific library.)
Common Mistakes
Two mistakes beginners make with RAG. First — dumping the entire knowledge base into the prompt. That floods the model with irrelevant text and wastes tokens. Second — using RAG when you actually need fine-tuning. They solve completely different problems. Always know which one your situation actually needs.
# WRONG: Dumping entire knowledge base into prompt
# (all_500_documents, vector_db and query are placeholders, not a real library)
prompt = f"Context: {all_500_documents}\nQuestion: {query}"
# Result: context flooded, model confused, tokens wasted

# CORRECT: Retrieve only top 2-3 most relevant docs
retrieved = vector_db.search(query, top_k=2)
context = "\n".join(retrieved)
prompt = f"Context: {context}\nQuestion: {query}"
# Tight, focused, accurate

# WRONG: Using RAG instead of fine-tuning for behaviour change
# RAG cannot teach the model a new tone or skill

# CORRECT: Know the difference
# RAG = inject fresh knowledge at query time (facts change)
# Fine-tune = bake new behaviour into weights (style changes)
# Rule: RAG updates knowledge. Fine-tuning updates behaviour.
Mini Challenge
Build a mini RAG system in Python without any AI library. Create a list of 5 facts about your favourite app, like Zomato or PUBG. Write a simple retrieve function that scores each fact by how many words it shares with the user query. Take a query like "how does Zomato delivery work" and print the top 2 retrieved facts. Then print a fake augmented prompt showing Context plus Question. You just built the Retrieve and Augment steps of RAG from scratch.
Quick Quiz
Q: What are the three steps of RAG in order?
A: Retrieve, Augment, Generate.
Q: Why does a standard LLM hallucinate on recent questions?
A: Because it only knows what it memorised during training and guesses for anything newer.
Q: What is the difference between RAG and fine-tuning?
A: RAG updates knowledge at query time. Fine-tuning updates the model behaviour permanently.
Bonus Knowledge
RAG is not just about reducing hallucination. It is also about cost and control over your data. Instead of retraining an entire model on your company data (which can cost crores), you just build a knowledge base and let RAG retrieve from it. Your data never needs to enter the model weights, although the retrieved text is still sent to the model at query time. This is why many startups building AI products today start with RAG as their first architecture choice. Tools like LangChain and LlamaIndex are frameworks that, among other things, make building RAG pipelines easier. Learn RAG and you understand a core building block of modern AI product development.
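The "build a knowledge base, skip the retraining" point can be sketched with the same word-overlap idea used earlier in this lesson. This is a minimal sketch under stated assumptions: the `knowledge_base` list, the helper names, and the "Data Science course" fact are invented purely for illustration, and a production system would typically use a proper search index or vector store instead of word counting.

```python
import re

def tokenize(text):
    # Lowercase and drop punctuation so words compare cleanly
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, docs, top_k=1):
    # Score each document by how many words it shares with the query
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(d)), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]

knowledge_base = [
    "RohithBuilds offers Python, AI and AI Agent Bootcamp courses.",
    "Rohi AI is the AI tutor built into RohithBuilds platform.",
]

query = "Is there a Data Science course?"
before = retrieve(query, knowledge_base)

# Made-up new fact, added at runtime. No retraining step is involved.
knowledge_base.append("RohithBuilds now has a Data Science course.")
after = retrieve(query, knowledge_base)

print("Before update:", before[0])
print("After update: ", after[0])
```

The model itself never changes here. Only the list of documents does, which is why updating a RAG system is usually far cheaper than retraining.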
Key Takeaways
- RAG stands for Retrieval-Augmented Generation — retrieve first, then generate.
- A standard LLM is a closed book exam student — brilliant but uninformed about recent events.
- RAG gives the LLM the open book — real, fresh context before every answer.
- The three steps are: Retrieve relevant docs, Augment the prompt, Generate the answer.
- Only pass top 2 to 3 docs — flooding the context with too many docs confuses the model.
- RAG updates knowledge at query time. Fine-tuning updates model behaviour. They are different tools.
- Many AI products, such as support bots, product Q and A, and company chatbots, plausibly rely on RAG-style retrieval.