DAY 64

Regression Explained 📈🏠🔢

Give it inputs. It gives you a number. That's regression. Predict house prices, exam scores, temperatures — any continuous number!

⏱ 15 mins
⚡ +50 XP
Day 64: Regression — Machines That Predict Numbers!

Why Should I Care?

The world is full of questions with numerical answers. What will this house cost? How long will this delivery take? What score will I get if I study 9 hours? These are not yes/no questions. They need a specific number as the answer. Regression is how machines answer them: with data, not guesswork. Give it features, and it gives back a numeric estimate based on the patterns in that data!

Regression vs Classification

Regression outputs a number — 45.7, 82.3, 102.3. House price, exam score, temperature in degrees. Classification outputs a category — Pass or Fail, Yes or No, Spam or Not Spam. Different output types need different models. If your answer is a continuous number, use regression. If your answer is a category, use classification (coming tomorrow)!

How Regression Works — The Best Fit Line

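Linear regression fits one straight line to your training data: the best fit line. The line does not have to pass through every point. It is the line that keeps the overall gap between the points and the line as small as possible (technically, the smallest sum of squared errors). This line captures the pattern between your features and your labels. When you give it a new input like 9 hours studied, it reads the score off the line at that point and returns it. The line is the pattern it learned. Every prediction follows that line!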
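To see what "best fit" means, here is a minimal sketch that finds the line by hand in plain Python. It uses the study-hours numbers from this lesson, but it fits on all 8 points with no train/test split, so its result can differ slightly from a model trained on a split. The helper name fit_line is made up for this illustration.

```python
# Sketch: fit a least-squares line y = slope * x + intercept by hand.
# Data copied from the lesson's study-hours example (all 8 points, no split).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [45, 55, 62, 71, 80, 85, 91, 97]


def fit_line(xs, ys):
    """Return (slope, intercept) of the line that minimises squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y move together, divided by how much x varies, gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The best fit line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(hours, scores)
prediction_9h = slope * 9 + intercept

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"prediction for 9 hours: {prediction_9h:.1f}")
# slope=7.40, intercept=39.93
# prediction for 9 hours: 106.6
```

Read the slope as "each extra hour of study adds about 7.4 marks" for this small dataset. Predicting is nothing more than plugging the new input into the line.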

Your First Regression Model


from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import numpy as np

hours  = np.array([[1],[2],[3],[4],[5],[6],[7],[8]])
scores = np.array([45, 55, 62, 71, 80, 85, 91, 97])

X_train, X_test, y_train, y_test = train_test_split(
    hours, scores, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

predicted = model.predict([[9]])
print(f"Predicted score for 9 hours: {predicted[0]:.1f}")

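Output: Predicted score for 9 hours: a value a little above 100 (roughly 106 with this data; the exact decimals depend on which rows the split puts in the training set). A specific number, not a category. The model learned the relationship between hours studied and scores from training data. It never saw 9 hours during training. It extended the line it found to make the prediction. Notice that the prediction is above 100: a straight line keeps rising past the range of your data, so treat predictions far outside your training inputs with care. That is regression!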
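If you want to look inside the model, the sketch below refits the same data and reads back the line it learned, then checks it on the 2 held-out rows. The variable names (slope, mae, manual_9h) are just for this illustration, and mean absolute error is one reasonable way to score a regression model, not the only one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
scores = np.array([45, 55, 62, 71, 80, 85, 91, 97])

X_train, X_test, y_train, y_test = train_test_split(
    hours, scores, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# The learned line: score = slope * hours + intercept
slope = model.coef_[0]
intercept = model.intercept_

# How far off is the model, on average, on rows it never trained on?
test_predictions = model.predict(X_test)
mae = mean_absolute_error(y_test, test_predictions)

# predict() is just the line evaluated at the new input.
manual_9h = slope * 9 + intercept
model_9h = model.predict([[9]])[0]

print(f"Learned line: score = {slope:.2f} * hours + {intercept:.2f}")
print(f"Mean absolute error on test rows: {mae:.2f}")
print(f"By hand: {manual_9h:.1f}, via predict: {model_9h:.1f}")
```

The last line should print the same number twice. With only 2 test rows the error figure is a rough check, not a reliable score.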

Real World Regression Examples

  • House price prediction. Features: location, size, floor, age, amenities. Label: price in lakhs. Not "expensive" or "cheap", but a figure like 45 Lakhs.
  • Zomato delivery time. Features: distance, restaurant prep time, traffic, time of day. Label: delivery time in minutes. Not "fast" or "slow", but a figure like 32 minutes.
  • IPL player auction price. Features: batting average, strike rate, wickets. Label: price in crores.

Regression answers numerical questions with data, not guesswork!
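Real problems like these usually have more than one feature. Here is a small sketch of a delivery-time model with two features. The data is completely invented for illustration: the labels were generated from a made-up rule (5 + 3 × distance + 1 × prep time), so the model has a clean pattern to find. Real delivery data would be much noisier.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# HYPOTHETICAL data, invented for illustration only.
# Columns: distance in km, restaurant prep time in minutes.
features = np.array([
    [1, 10],
    [2, 15],
    [3, 10],
    [5, 20],
    [6, 12],
    [8, 25],
])
# Labels built from a made-up rule: 5 + 3 * distance + 1 * prep_time.
delivery_minutes = np.array([18, 26, 24, 40, 35, 54])

model = LinearRegression()
model.fit(features, delivery_minutes)

# One new order: 4 km away, 15 minutes of prep.
# Still a 2D array: 1 row (sample), 2 columns (features).
new_order = np.array([[4, 15]])
predicted_minutes = model.predict(new_order)[0]

print(f"Weight per km: {model.coef_[0]:.2f}")
print(f"Weight per prep minute: {model.coef_[1]:.2f}")
print(f"Predicted delivery time: {predicted_minutes:.1f} minutes")
```

Because the invented labels follow the rule exactly, the model recovers weights of about 3 and 1 and predicts about 32 minutes for the new order. With two features the "line" becomes a flat plane, but the idea is the same.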

Common Mistakes

Mistake 1 — Using regression for yes/no or category predictions.


# WRONG: these labels are categories, so this is a classification problem.
y = ["Pass", "Fail", "Pass", "Fail"]
model = LinearRegression()    # cannot fit text labels; use a classifier instead

# CORRECT: these labels are continuous numbers, so regression fits.
y = [45, 55, 65, 75]
model = LinearRegression()

Mistake 2 — Passing a single value to predict instead of a 2D array.


model.predict(9)      # WRONG: raises ValueError, sklearn expects a 2D array
model.predict([[9]])  # CORRECT: double brackets give shape (1, 1), one sample with one feature
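If the double brackets feel like magic, this short sketch prints the shapes involved. It only uses NumPy, and the variable names are made up for this illustration. The shape sklearn wants is (number of samples, number of features).

```python
import numpy as np

scalar = np.array(9)            # 0-D: what predict(9) effectively receives
flat = np.array([7, 9, 10])     # 1-D: three values, but no feature axis
one_sample = np.array([[9]])    # 2-D: 1 sample with 1 feature
column = flat.reshape(-1, 1)    # 2-D: 3 samples with 1 feature each

print(scalar.ndim, flat.shape, one_sample.shape, column.shape)
# 0 (3,) (1, 1) (3, 1)
```

The reshape(-1, 1) trick turns any flat list of values into one column. Passing an array shaped like column to model.predict returns one prediction per row.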

Mini Challenge

Create a dataset of 10 apartments: size in square feet and monthly rent in rupees. Split 80/20 with random_state=42. Train a LinearRegression model. Predict the rent for a 900 sq ft apartment. Print the prediction as a clean number. Then predict for 1500 sq ft. Do the predictions make real-world sense compared to your training data? You just built a simple version of the kind of rent estimator that property sites like Housing.com and NoBroker offer!

Quick Quiz

Q: When should you use regression instead of classification? A: When the output is a continuous number — price, temperature, score, time — not a category!

Q: What is the best fit line in regression? A: The straight line that sits as close as possible to the training data points overall. It does not need to touch every point, and it captures the pattern between features and labels!

Q: Why use model.predict([[9]]) instead of model.predict(9)? A: sklearn expects the input as a 2D array of samples and features. A bare number raises a ValueError. Double brackets give one sample with one feature!

Key Takeaways

  • Regression predicts a continuous number — price, score, temperature, time.
  • Linear regression fits a best fit line as close as possible to the training data and follows it for predictions.
  • Use regression for numbers. Use classification for categories. Never mix them up.
  • Always pass a 2D array to model.predict() — use double brackets [[value]].
  • Regression is how machines answer numerical questions with data, not guesswork!
