k-Nearest Neighbors (k-NN) classifier – Supervised Learning

What is k-Nearest Neighbors (k-NN)?

k-Nearest Neighbors (k-NN) is one of the simplest and most intuitive supervised learning algorithms.

  • It’s used for classification (predicting categories) and regression (predicting continuous values).

  • The idea (a minimal code sketch follows this list):

    1. Store all training data.

    2. To predict a new point, look at its k closest neighbors (using distance, usually Euclidean).

    3. For classification: take a majority vote of neighbors’ classes.

    4. For regression: take the average of neighbors’ values.
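
A minimal sketch of these four steps in plain NumPy (the function name and its arguments are illustrative, not a library API):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3, task="classification"):
    # Steps 1–2: distances from the new point to every stored training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]          # indices of the k closest points
    if task == "classification":
        # Step 3: majority vote among the neighbors' classes
        return Counter(y_train[nearest]).most_common(1)[0][0]
    # Step 4: average of the neighbors' values (regression)
    return y_train[nearest].mean()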

Example with Iris 🌸

  • Suppose k=3.

  • A new flower is measured: [5.1, 3.5, 1.4, 0.2].

  • The algorithm finds the 3 closest flowers in training data.

  • If 2 are Setosa and 1 is Versicolor, prediction = Setosa.

👉 k-NN is called a “lazy learner” because it doesn’t build a mathematical model; it just stores the training set and uses it when making predictions.

Choosing k

  • Small k (like 1) → very sensitive to noise (overfits).

  • Large k → smoother, but may miss details (underfits).

  • An odd k (3, 5, 7) is usually chosen to avoid tied votes (this guarantee only holds for two classes); a quick way to compare candidate values of k is sketched below.
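
As a rough sketch (assuming scikit-learn and its built-in Iris dataset; exact scores will vary), candidate values of k can be compared with 5-fold cross-validation:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 3, 5, 7, 9, 11):
    # Average accuracy over 5 folds for this value of k
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"k={k}: mean CV accuracy = {scores.mean():.3f}")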

Example: Classifying a New Flower with k-NN

Training Data (simplified)

Suppose we only have 6 flowers in our training set:

Sepal Length (cm) | Sepal Width (cm) | Petal Length (cm) | Petal Width (cm) | Species (Label)
5.1 | 3.5 | 1.4 | 0.2 | Setosa (0)
4.9 | 3.0 | 1.4 | 0.2 | Setosa (0)
5.8 | 2.7 | 4.1 | 1.0 | Versicolor (1)
6.0 | 2.7 | 5.1 | 1.6 | Versicolor (1)
6.3 | 3.3 | 6.0 | 2.5 | Virginica (2)
5.8 | 2.7 | 5.1 | 1.9 | Virginica (2)

🌸 New Flower to Classify

Features = [5.7, 3.0, 4.2, 1.2]
(We don’t know the species; that’s what we want to predict.)

Step 1: Compute Distances

We use Euclidean distance:

 d(\text{point1}, \text{point2}) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + \dots}

For example, distance from new flower to the first Setosa:

 d = \sqrt{(5.7 - 5.1)^2 + (3.0 - 3.5)^2 + (4.2 - 1.4)^2 + (1.2 - 0.2)^2}

Simplifying step by step:

 = \sqrt{(0.6)^2 + (-0.5)^2 + (2.8)^2 + (1.0)^2}

 = \sqrt{0.36 + 0.25 + 7.84 + 1.00}

 = \sqrt{9.45} \approx 3.07

👉 Doing the same for all 6 training samples gives the full set of distances used in the next step.
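
A quick NumPy check of all six distances (rounded to two decimals):

import numpy as np

X_train = np.array([
    [5.1, 3.5, 1.4, 0.2],   # Setosa
    [4.9, 3.0, 1.4, 0.2],   # Setosa
    [5.8, 2.7, 4.1, 1.0],   # Versicolor
    [6.0, 2.7, 5.1, 1.6],   # Versicolor
    [6.3, 3.3, 6.0, 2.5],   # Virginica
    [5.8, 2.7, 5.1, 1.9],   # Virginica
])
x_new = np.array([5.7, 3.0, 4.2, 1.2])

# Euclidean distance from the new flower to every training flower
distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
print(np.round(distances, 2))   # [3.07 3.08 0.39 1.07 2.32 1.18]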

Step 2: Find Nearest Neighbors

Computing all six distances, the 3 smallest (k=3) are to:

  • A Versicolor (distance ≈ 0.39)

  • Another Versicolor (distance ≈ 1.07)

  • A Virginica (distance ≈ 1.18)

Step 3: Majority Voting

  • Versicolor = 2 votes

  • Virginica = 1 vote

  • Setosa = 0 votes

✅ Predicted Class = Versicolor (1)
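
Continuing the NumPy sketch, the vote can be taken with a Counter (distances from Step 1, labels from the training table):

from collections import Counter
import numpy as np

distances = np.array([3.07, 3.08, 0.39, 1.07, 2.32, 1.18])
labels = np.array([0, 0, 1, 1, 2, 2])        # 0=Setosa, 1=Versicolor, 2=Virginica
nearest = np.argsort(distances)[:3]          # indices of the 3 closest flowers
votes = Counter(labels[nearest])             # two Versicolor votes, one Virginica
print(votes.most_common(1)[0][0])            # 1 → Versicolor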

Why Scaling Matters in k-NN

1. k-NN is based on distance

  • Prediction is made by finding the closest neighbors in feature space.

  • So, features with larger numeric ranges contribute more to the distance.

2. Example of Imbalance

Suppose we have two features to classify fruits:

  • Weight (in grams) → ranges from 100g to 1000g.

  • Color (encoded 0=green, 1=red).

Now compare two fruits:

  • Fruit A = [150, 0] (150g, green)

  • Fruit B = [900, 1] (900g, red)

  • New Fruit = [160, 1] (160g, red)

Distances:

  • To A:

     d_A = \sqrt{(160 - 150)^2 + (1 - 0)^2}

     = \sqrt{100 + 1}

     = \sqrt{101} \approx 10.05

  • To B:

     d_B = \sqrt{(160 - 900)^2 + (1 - 1)^2}

     = \sqrt{547600}

     = 740.0

👉 The “color” difference (0 vs 1) barely matters compared to the huge “weight” difference.
Even though color may be very important for classifying fruits, it is effectively ignored.
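
A short sketch of this effect using the three fruits above (fitting the scaler on just three points is purely for illustration):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

fruits = np.array([[150, 0], [900, 1]])      # Fruit A (green), Fruit B (red)
new_fruit = np.array([[160, 1]])             # new fruit (red)

def euclidean_distances(train, x):
    return np.sqrt(((train - x) ** 2).sum(axis=1))

print(euclidean_distances(fruits, new_fruit))   # ~[10.05, 740.0] → A looks closest

# After Min-Max scaling, weight no longer drowns out color
scaler = MinMaxScaler().fit(np.vstack([fruits, new_fruit]))
print(euclidean_distances(scaler.transform(fruits), scaler.transform(new_fruit)))
# ~[1.00, 0.99] → now B, the other red fruit, is the nearer neighbor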

3. Effect: Bias in Distance

  • Features with large scales dominate.

  • Features with small scales are ignored.

  • The model may perform badly because the distances reflect the “wrong” feature importances.

Solution: Feature Scaling

Two common preprocessing techniques:

🔹 Normalization (Min-Max Scaling)

Rescales values into the range [0,1].

 x' = \frac{x - \min(x)}{\max(x) - \min(x)}

Example: If weight ranges from 100–1000, then

 x' = \frac{160 - 100}{1000 - 100} = \frac{60}{900} \approx 0.067

Now both weight and color are in comparable ranges.

🔹 Standardization (Z-score scaling)

Centers features around 0 with standard deviation 1:

 x' = \frac{x - \mu}{\sigma}

Example: If petal length mean = 3.7 cm and std = 1.7, then

 x' = \frac{4.2 - 3.7}{1.7} \approx 0.29

After scaling, each feature contributes equally in distance calculations.

Visual Example (Iris)

Without scaling, suppose:

  • Sepal length (cm) ranges from 4–8.

  • Petal length (cm) ranges from 1–7.

Petal length has a larger spread, so distance is mostly determined by petal length.
After scaling, both features are equally important.

Summary

  • k-NN is distance-based, so scaling is critical.

  • Without scaling, large-valued features dominate.

  • Normalization and standardization put features on the same “footing.”

  • Always scale features when using k-NN, SVM, clustering, PCA, etc.
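
Code Example: k-NN Prediction with scikit-learn

The worked flower example above, reproduced with scikit-learn: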

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Training data (features)
X_train = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [5.8, 2.7, 4.1, 1.0],
    [6.0, 2.7, 5.1, 1.6],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9]
])
y_train = np.array([0, 0, 1, 1, 2, 2])  # Labels: 0=Setosa, 1=Versicolor, 2=Virginica

# New flower
X_new = np.array([[5.7, 3.0, 4.2, 1.2]])

# Train k-NN with k=3
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Prediction
prediction = knn.predict(X_new)
print("Predicted class:", prediction)

# Output: Predicted class: [1] → Versicolor

Normalization and Standardization in scikit-learn

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Example data: weight (100–1000), color (0 or 1)
X = np.array([[150, 0], [900, 1], [160, 1]])

# Min-Max Normalization
scaler = MinMaxScaler()
X_norm = scaler.fit_transform(X)
print("Normalized:\n", X_norm)

# Standardization
scaler = StandardScaler()
X_std = scaler.fit_transform(X)
print("Standardized:\n", X_std)
