DAY 93

Semantic Search 🔍🧠⚡

Learn how semantic search understands the meaning behind your words — not just exact letters — using vectors and cosine similarity to return results that actually make sense.

⏱ 15 mins
⚡ +50 XP

Day 93: Semantic Search — Find What You Mean, Not What You Type

Why Should I Care?

You type "good food near me" on Zomato. It finds "best restaurants nearby", not because the words match, but because it understood what you meant. That is semantic search. Keyword search breaks down as soon as your wording differs from the listing's wording. Semantic search can still find the match.

Core Concept

Semantic search finds results based on meaning, not exact words. Think of a wise librarian. You say "people who changed the world despite impossible odds." She does not search for those exact words. She hands you books on Nelson Mandela, Bhagat Singh, Marie Curie. That is semantic search. A regular search bar? It just checks whether the same words appear in the text. If none of them do, you get zero results.

How It Works

Every word, sentence, or document gets converted into a list of numbers called a vector (also known as an embedding). Think of it as giving every sentence a GPS location in meaning space. Sentences with similar meanings get GPS points that are close together. Your search query also becomes a vector. The system then finds which document vectors are closest to your query vector. That closeness is measured using cosine similarity: the smaller the angle between two vectors, the more similar they are. The score ranges from -1 to 1. A score near 1 means the two vectors point in almost the same direction, so the document is very relevant. A score near 0 means they are unrelated. Cosine similarity looks only at direction, not at how long the vectors are.

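import numpy as np

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up 3-number vectors for illustration; real embeddings are much longer
corpus = {
    "Python programming tutorial"   : np.array([0.9, 0.2, 0.1]),
    "Learn to code from scratch"    : np.array([0.8, 0.3, 0.2]),
    "SQL database management"       : np.array([0.3, 0.2, 0.9]),
    "AI and machine learning basics": np.array([0.7, 0.9, 0.9]),
    # Must point in a different direction from the coding vectors.
    # [0.1, 0.1, 0.1] would not work: it points the same way as [1, 1, 1]
    # and would score about 0.80 against the query below.
    "Cooking recipes for beginners" : np.array([0.05, 0.9, 0.05])
}

# Made-up vector standing in for a coding-related query
query = np.array([0.85, 0.25, 0.15])

scored = [(cosine_similarity(query, vec), title)
          for title, vec in corpus.items()]
scored.sort(reverse=True)  # highest similarity first

print("-- Semantic Search Results --")
for score, title in scored[:3]:
    print(f"{score:.4f} -- {title}")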
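One detail is easy to miss: cosine similarity compares direction only, so making a vector longer or shorter does not change its score. The sketch below is an illustrative addition with made-up vectors. It shows that a vector of small equal numbers is not "far away" from anything, because it points the same way as [1, 1, 1].

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

query = np.array([0.85, 0.25, 0.15])
doc = np.array([0.9, 0.2, 0.1])

same_direction = cosine_similarity(query, doc)
scaled = cosine_similarity(query, 10 * doc)  # ten times longer, same direction

# Equal numbers point along [1, 1, 1] no matter how small they are
tiny = cosine_similarity(query, np.array([0.1, 0.1, 0.1]))
big = cosine_similarity(query, np.array([1.0, 1.0, 1.0]))

print(f"{same_direction:.4f} {scaled:.4f}")  # the two scores are identical
print(f"{tiny:.4f} {big:.4f}")               # these two are identical as well
```

So when you make up vectors for an unrelated item, change which numbers are large, not just how large they are.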

Real World Connection

YouTube applies the same idea when it recommends videos similar to what you just watched, even if the titles are completely different. Search "how to earn money online" on YouTube and you may also see videos titled "passive income ideas" and "freelancing tips", because the meanings are closely related. Netflix recommendations and Google Search rely on similar meaning-based matching. AI assistants such as ChatGPT can use a related technique called RAG (Retrieval-Augmented Generation), where the system fetches the most meaning-similar documents before answering you.

Examples

-- Query: "learn coding"

-- Keyword Search (SQL LIKE) result:
SELECT * FROM courses WHERE title LIKE '%learn coding%';
-- Returns: 0 results (no title contains the phrase "learn coding")

-- Semantic Search result:
-- Converts the query to a vector, then scores each course by cosine similarity
-- Scores (using the toy vectors from How It Works):
-- 0.9960 -> Python programming tutorial    (very close)
-- 0.9955 -> Learn to code from scratch     (very close)
-- 0.7316 -> AI and machine learning basics (related)
-- 0.5050 -> SQL database management        (less related)

-- Winner: Semantic search finds what you MEANT, not what you TYPED
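The comparison above can be sketched in Python. This is a rough illustration, not a production search: the substring check stands in for SQL LIKE '%...%', the vectors are made up, and the names keyword_search and semantic_search are chosen here for the example.

```python
import numpy as np

# Made-up vectors; a real system would get them from an embedding model
courses = {
    "Python programming tutorial": np.array([0.9, 0.2, 0.1]),
    "Learn to code from scratch": np.array([0.8, 0.3, 0.2]),
    "SQL database management": np.array([0.3, 0.2, 0.9]),
}

def keyword_search(query, titles):
    # Case-insensitive substring match, roughly what LIKE '%query%' does
    return [t for t in titles if query.lower() in t.lower()]

def semantic_search(query_vec, vectors, k=2):
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    # Rank titles by similarity to the query, highest first
    ranked = sorted(vectors, key=lambda t: cos(query_vec, vectors[t]), reverse=True)
    return ranked[:k]

query_text = "learn coding"
query_vec = np.array([0.85, 0.25, 0.15])  # assumed vector for the query text

keyword_hits = keyword_search(query_text, courses)
semantic_hits = semantic_search(query_vec, courses)
print(keyword_hits)   # empty: no title contains "learn coding"
print(semantic_hits)  # the two coding courses
```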

Common Mistakes

Two big mistakes beginners make with semantic search. First: using SQL LIKE and expecting it to match by meaning. Second: using different embedding models for queries and documents. LIKE misses anything that is worded differently, and mixed models produce scores that cannot be compared.

-- WRONG: Using SQL LIKE for semantic search
SELECT * FROM courses WHERE content LIKE '%python%';
-- Misses: "coding tutorial", "programming", "software engineering"

# CORRECT: Convert query and docs to embeddings, use cosine similarity
# (embed_model and find_top_k_similar are placeholders, not a real library API)
query_vector = embed_model.encode("learn coding")
results = find_top_k_similar(query_vector, document_vectors)

# WRONG: Mixing embedding models
# Query embedded with Model A
# Documents embedded with Model B
# Result: the vectors live in different spaces, so similarity scores are meaningless

# CORRECT: Always use the same embedding model for BOTH queries and documents
# (same_model is a placeholder for whichever model you choose)
query_vector = same_model.encode(query)
doc_vectors  = [same_model.encode(doc) for doc in documents]
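Here is one possible way to write the find_top_k_similar placeholder, plus a small demonstration of the model mismatch problem. Treat it as a sketch under assumptions: the vectors are made up, and "Model B" is simulated by simply reversing the order of the dimensions. Real models differ far more than that and often do not even produce vectors of the same length.

```python
import numpy as np

def find_top_k_similar(query_vector, document_vectors, k=2):
    """Return (index, score) pairs for the k most similar documents."""
    docs = np.asarray(document_vectors, dtype=float)
    q = np.asarray(query_vector, dtype=float)
    # Cosine similarity of the query against every document row at once
    scores = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
    top = np.argsort(scores)[::-1][:k]  # indices of the highest scores first
    return [(int(i), float(scores[i])) for i in top]

# Toy "Model A" vectors for three documents
docs_model_a = np.array([
    [0.9, 0.2, 0.1],  # 0: Python programming tutorial
    [0.8, 0.3, 0.2],  # 1: Learn to code from scratch
    [0.3, 0.2, 0.9],  # 2: SQL database management
])
query_model_a = np.array([0.85, 0.25, 0.15])  # coding-related query

# Stand-in for "Model B": same information, dimensions in a different order
docs_model_b = docs_model_a[:, ::-1]

same_model = find_top_k_similar(query_model_a, docs_model_a)
mixed_models = find_top_k_similar(query_model_a, docs_model_b)
print(same_model)    # documents 0 and 1 rank first
print(mixed_models)  # document 2 (SQL) now ranks first, which is wrong
```

Nothing crashes in the mixed case, which is what makes this mistake dangerous: you still get confident-looking scores, just for the wrong documents.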

Mini Challenge

Create a tiny semantic search in Python. Make a list of 5 sentences (like food items on a Zomato menu). Represent each as a simple numpy array (make up the numbers). Write a query array. Use cosine similarity to find the top 2 most similar items to your query. Print the results with their scores. Can you find the closest match?

Quick Quiz

Q: What does semantic search find results based on?
A: Meaning — not exact word matching.

Q: What is a vector in the context of semantic search?
A: A list of numbers that represents the meaning of a word, sentence, or document as a point in meaning space.

Q: What is cosine similarity used for?
A: To measure how close two vectors are — a small angle means high similarity, a large angle means low similarity.

Bonus Knowledge

RAG stands for Retrieval-Augmented Generation. It is a common technique behind AI assistants that answer questions from a set of documents. When you ask a question, the system first retrieves the most semantically similar documents from a knowledge base. Then it augments your prompt with those documents. Then it generates an answer. If retrieval brings back the wrong documents, the answer suffers. That is why semantic search sits at the heart of many modern AI apps. Vector databases like Pinecone, Weaviate, and Chroma are built around storing and searching these vectors at scale.
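The three steps above (retrieve, augment, generate) can be sketched in a few lines. Everything here is an assumption made for illustration: the knowledge base sentences and their vectors are invented, the function names are chosen for this example, and generate is only a placeholder where a real app would call a language model.

```python
import numpy as np

# Tiny knowledge base with made-up vectors (a real system would use an embedding model)
knowledge_base = [
    ("Refunds are processed within 5 working days.", np.array([0.9, 0.1, 0.1])),
    ("Delivery is free on orders above 500 rupees.", np.array([0.1, 0.9, 0.2])),
    ("Support is available from 9 am to 6 pm.", np.array([0.2, 0.2, 0.9])),
]

def retrieve(query_vec, k=1):
    # Step 1: rank documents by cosine similarity to the query
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    ranked = sorted(knowledge_base, key=lambda item: cos(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def augment(question, documents):
    # Step 2: put the retrieved text into the prompt
    context = "\n".join(f"- {d}" for d in documents)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

def generate(prompt):
    # Step 3: placeholder; a real app would send the prompt to a language model here
    return f"[model answer based on a prompt of {len(prompt)} characters]"

question = "When will I get my money back?"
question_vec = np.array([0.8, 0.2, 0.1])  # assumed vector for the question

docs = retrieve(question_vec)
prompt = augment(question, docs)
print(prompt)
print(generate(prompt))
```

Notice that the question never uses the word "refund", yet the refund sentence is the one retrieved, because its vector points in nearly the same direction as the question's vector.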

Key Takeaways

  • Semantic search finds results by meaning, not by matching exact letters or words.
  • Every sentence is converted into a vector — a list of numbers representing its meaning.
  • Cosine similarity measures how close two vectors are — smaller angle means more similar.
  • SQL LIKE only matches literal text patterns, so it misses synonyms, intent, and related concepts.
  • Always use the same embedding model for both your query and your documents.
  • RAG (Retrieval-Augmented Generation) powers many AI assistants and relies on semantic search.
  • Apps like YouTube, Netflix, Google, and Zomato use this kind of meaning-based matching in search and recommendations.
