DAY 90

AI Chatbot 🤖💬🚀

Put every AI and LLM concept from Days 81-89 to work by building a real multi-turn AI chatbot powered by LLaMA 3 via the Groq API, with memory, personality, token tracking, and a context window guard.

⏱ 15 mins
⚡ +50 XP
Day 90: AI Chatbot — Build a Real Thinking Chatbot with LLaMA 3

Why Should I Care?

You spent 9 days learning how AI works: transformers, attention, embeddings, tokens, context windows, prompt engineering. Today all of that becomes one real product. A chatbot that thinks, remembers your conversation, stays in character, and tracks its own cost. This is the same basic pattern behind chat products like ChatGPT, Claude, and Gemini. You are not watching it from outside anymore. You are building it.

Core Concept

An AI chatbot is a loop. Every turn, you send the full conversation history plus the new message to the LLM. The model reads everything and replies. That reply gets added to history. Next turn, repeat. Memory is not magic. It is just appending messages to a list and sending the whole list every time. The system prompt sits at the top of that list and gives the bot its personality. Because it is sent with every request, the model rereads it on every turn. That is what makes Rohi sound like Rohi and not like a random AI.
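To make the "memory is just a list" idea concrete, here is a minimal sketch that needs no API key. `fake_model` and `turn` are hypothetical stand-ins invented for this illustration, not part of the Groq SDK. The fake model only reports how many messages it received.

```python
# Hypothetical stand-in for a real LLM call: it only reports how many
# messages it was given, which makes the "memory" visible.
def fake_model(messages):
    return f"I can see {len(messages)} messages, including the system prompt."

history = [{"role": "system", "content": "You are Rohi, a friendly tutor."}]

def turn(user_input):
    history.append({"role": "user", "content": user_input})
    reply = fake_model(history)   # the whole list goes in every time
    history.append({"role": "assistant", "content": reply})
    return reply

first = turn("What is a list?")
second = turn("Give me an example.")
print(first)
print(second)
print([m["role"] for m in history])
```

The second reply sees more messages than the first only because the list grew between calls. The model itself stores nothing.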

How It Works

Five pieces work together. First, the system prompt defines personality and is sent on every turn. Second, the conversation history stores every user and assistant message in order. Third, the Groq API sends that history to LLaMA 3 and gets a reply back. Fourth, the token estimator tracks roughly how much context is being used after every turn. Fifth, the context window guard drops the oldest non-system messages when the conversation gets too long. All five together make a working multi-turn chatbot.

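from groq import Groq

client = Groq()  # reads the API key from the GROQ_API_KEY environment variable
MAX_TURNS = 10   # max messages kept besides the system prompt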

conversation = [
    {"role": "system",
     "content": "You are Rohi, AI tutor by RohithBuilds. "
                "Explain concepts simply with fun real-world examples."}
]

def estimate_tokens(messages):
    # Rough heuristic (about 1.3 tokens per word), not a real tokenizer count
    return sum(len(m["content"].split()) * 1.3 for m in messages)

def chat(user_input):
    conversation.append({"role": "user", "content": user_input})

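    # Context window guard: drop the oldest non-system messages.
    # `while` rather than `if`, because each turn adds two messages
    # (user + assistant) and a single pop would let the list keep growing.
    while len(conversation) > MAX_TURNS + 1:
        conversation.pop(1)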

    response = client.chat.completions.create(
        model="llama3-8b-8192",  # model IDs change over time; check Groq's current model list
        messages=conversation
    )
    reply = response.choices[0].message.content
    conversation.append({"role": "assistant", "content": reply})

    tokens = int(estimate_tokens(conversation))
    print(f"Rohi: {reply}")
    print(f"Context: {len(conversation)-1} turns | ~{tokens} tokens")

print("RohithBuilds — Rohi AI Chatbot")
print("Type 'quit' to exit\n")

while True:
    user = input("You: ")
    if user.lower() == "quit":
        print("Rohi: Keep building! See you tomorrow.")
        break
    chat(user)
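The guard in the lesson code counts messages. A variation is to trim by estimated tokens instead, which is closer to what the context window actually limits. The sketch below is only an illustration: `MAX_CONTEXT_TOKENS` and `trim_to_budget` are hypothetical names, the budget is set tiny so the demo trims quickly, and the estimate is the same rough words-times-1.3 heuristic rather than a real tokenizer count.

```python
MAX_CONTEXT_TOKENS = 50   # hypothetical budget, tiny so the demo trims quickly

def estimate_tokens(messages):
    # Same rough heuristic as the lesson: about 1.3 tokens per word.
    return sum(len(m["content"].split()) * 1.3 for m in messages)

def trim_to_budget(messages, budget=MAX_CONTEXT_TOKENS):
    # Work on a copy. Always keep index 0 (system prompt) and the newest message.
    trimmed = list(messages)
    while len(trimmed) > 2 and estimate_tokens(trimmed) > budget:
        trimmed.pop(1)   # drop the oldest non-system message
    return trimmed

demo = [{"role": "system", "content": "You are Rohi."}]
for i in range(6):
    demo.append({"role": "user", "content": "word " * 10})

trimmed = trim_to_budget(demo)
print(len(demo), "messages before,", len(trimmed), "after")
print("estimated tokens after trimming:", round(estimate_tokens(trimmed), 1))
```

Returning a trimmed copy leaves the original list untouched, so you could log the full conversation while sending only the part that fits the budget.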

Real World Connection

When you chat with an LLM-based support bot, such as the ones in apps like Swiggy, Zomato, or PhonePe, the app typically sends the conversation history to the model on every message, just like this. When the Rohi AI tutor on RohithBuilds remembers what you asked two messages ago, it is because the entire history is in the API call. WhatsApp is building AI assistants too. Production systems add more on top, but the core is this same loop: system prompt, conversation history, API call, append reply, repeat. You now know how it works inside.

Examples

-- Real chatbot session output:

You: What is a function in Python?

Rohi: A function is like a coffee machine —
      press a button (call it), it runs, gives you output.
      def greet(name): return f"Hello {name}!"
Context: 2 turns | ~87 tokens

You: Can you give me an example?

Rohi: def add(a, b): return a + b
      print(add(3, 4)) outputs 7
      Think of it as a reusable recipe.
Context: 4 turns | ~143 tokens

-- What is happening behind the scenes each turn:
-- conversation = [system, user1, assistant1, user2, assistant2]
-- Entire list sent to LLaMA 3 every single time
-- Model reads full history -- replies with context
-- Token count grows -- guard drops oldest when limit hit
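The behind-the-scenes steps above can be traced without calling the API. This is a sketch under assumptions: `simulated_chat` is a hypothetical helper that echoes the input instead of calling LLaMA 3, `MAX_TURNS` is set to 3 so the guard triggers early, and the guard uses a `while` loop so the list stays bounded even though each turn adds two messages.

```python
MAX_TURNS = 3   # small on purpose; messages kept besides the system prompt

conversation = [{"role": "system", "content": "You are Rohi."}]
sent_sizes = []   # how many messages each simulated API call would receive

def simulated_chat(user_input):
    conversation.append({"role": "user", "content": user_input})
    while len(conversation) > MAX_TURNS + 1:
        conversation.pop(1)                # index 0 (system) is never dropped
    sent_sizes.append(len(conversation))   # what the API would receive
    conversation.append({"role": "assistant", "content": f"echo: {user_input}"})

for n in range(1, 5):
    simulated_chat(f"question {n}")
    print(n, [m["role"] for m in conversation])

print("messages sent per turn:", sent_sizes)
```

From the third turn on, the number of messages sent stops growing, and the earliest questions are no longer in the list the model would see.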

Common Mistakes

Beginners make two common mistakes when building chatbots. First, sending only the latest message to the API instead of the full history. The model then has no context and gives disconnected replies. Second, never dropping old messages, so the history eventually exceeds the context window and the API call fails with an error. The window guard is not optional.

# WRONG: sending only the latest message each turn
response = client.chat.completions.create(
    model="llama3-8b-8192",
    messages=[{"role": "user", "content": user_input}]
)
# Result: the model has zero memory, so every reply is disconnected

# CORRECT: always send the full conversation history
response = client.chat.completions.create(
    model="llama3-8b-8192",
    messages=conversation   # full list every time
)

# WRONG: never trimming conversation history
conversation.append({"role": "user", "content": user_input})
# grows forever and eventually fails with a context-length error

# CORRECT: drop the oldest non-system messages once the limit is hit
while len(conversation) > MAX_TURNS + 1:
    conversation.pop(1)   # index 0 is the system prompt, never touch it
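The first mistake is easy to see with a toy stand-in for the model. `toy_model` below is hypothetical and written only for this illustration: it can "know" the user's name only if an earlier message in the list it receives contains it.

```python
def toy_model(messages):
    # Hypothetical stand-in for an LLM: it can only answer from what is
    # inside the list it was handed on this call.
    prefix = "My name is "
    for m in messages:
        if m["role"] == "user" and m["content"].startswith(prefix):
            return "Your name is " + m["content"][len(prefix):].rstrip(".")
    return "I don't know your name."

history = [
    {"role": "system", "content": "You are Rohi."},
    {"role": "user", "content": "My name is Asha."},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "What is my name?"},
]

latest_only = toy_model(history[-1:])   # mistake: only the newest message
full_history = toy_model(history)       # correct: the whole list

print("latest only :", latest_only)
print("full history:", full_history)
```

A real LLM is far more capable than this toy, but the constraint is the same: anything not in the `messages` list for this call is invisible to it.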

Mini Challenge

Build your own version of Rohi using the code above. Change the system prompt to give the bot a different personality — a cricket coach, a Zomato food expert, or a PUBG strategy guide. Run a 5-turn conversation and watch the token count grow each turn. Then set MAX_TURNS to 3 and watch the context guard kick in and drop old messages. Add one print statement that shows the full conversation list after each turn so you can see the memory in action.

Quick Quiz

Q: Why must you send the full conversation history to the API on every turn?
A: Because LLMs have no memory between calls — the history list is the memory.

Q: What does the system prompt do in a chatbot?
A: It defines the bot's personality and is sent at the top of the message list on every conversation turn.

Q: What happens when conversation history exceeds the context window?
A: The API call fails with an error, so a context guard drops the oldest non-system messages to keep the history within the limit.

Bonus Knowledge

Phase 6 is now complete. In 10 days you went from asking "what is AI?" to building a real LLM-powered chatbot from scratch. You now understand what lives inside every AI product — embeddings on Day 83, transformers on Day 84, attention on Day 85, how ChatGPT works on Day 86, prompt engineering on Day 87, context windows on Day 88, tokens on Day 89, and today you assembled it all. The model is just an API call now. You understand everything underneath it. Phase 7 starts next — RAG, AI Agents, and Vector Databases. The serious stuff begins.

Key Takeaways

  • A chatbot is a loop — send full history plus new message, get reply, append, repeat.
  • Memory is just a list of messages sent to the API on every single turn.
  • The system prompt defines personality and is sent with every conversation turn.
  • Always send the complete conversation history — not just the latest message.
  • A context window guard drops the oldest non-system message when history gets too long.
  • Token estimation lets you track cost and context usage in real time after every reply.
  • Every concept from Days 81 to 89 is alive and working inside this one chatbot.
