Last updated on November 10th, 2017
There are so many ways to normalize vectors… A common preprocessing step in machine learning is to normalize a vector before passing it into a machine learning algorithm, e.g., before training a support vector machine (SVM).
One way to normalize a vector is to apply l2-normalization, which scales the vector to have a unit norm. “Unit norm” essentially means that if we squared each element in the vector and summed them, the result would equal 1. (A vector normalized this way is also often described as having unit norm, as a vector of length 1, or as a unit vector.)
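Written out for a nonzero vector $\mathbf{x} = (x_1, \dots, x_n)$, l2-normalization divides every element by the vector's Euclidean length, which is exactly why the squared elements of the result sum to 1:

$$
\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}, \qquad
\hat{\mathbf{x}} = \frac{\mathbf{x}}{\|\mathbf{x}\|_2}, \qquad
\sum_{i=1}^{n} \hat{x}_i^2 = \frac{\sum_{i=1}^{n} x_i^2}{\|\mathbf{x}\|_2^2} = 1
$$

The formula assumes $\mathbf{x} \neq \mathbf{0}$; the all-zero vector has no direction and cannot be scaled to length 1.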
So, given a matrix X, where the rows represent samples and the columns represent the features of each sample, you can apply l2-normalization to normalize each row to a unit norm. This can be done easily in Python using sklearn.
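To make the row-wise operation concrete, here is a minimal NumPy-only sketch of the same idea. The helper name `l2_normalize_rows` is hypothetical (it is not part of sklearn or NumPy); it divides each row by its Euclidean length and, as an assumption of this sketch, leaves all-zero rows as zeros instead of dividing by zero.

```python
import numpy as np

def l2_normalize_rows(X):
    """Hypothetical helper: scale each row of X to unit L2 norm."""
    X = np.asarray(X, dtype=float)
    # Euclidean length of each row, kept as a column (shape (n, 1))
    # so that it broadcasts across the features of that row.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Avoid division by zero: an all-zero row simply stays all zeros.
    norms[norms == 0] = 1.0
    return X / norms

X = np.array([[-1, 0, 1],
              [0, 1, 2]])
print(l2_normalize_rows(X))
```

The `keepdims=True` argument is the key design choice: without it, `norms` would have shape `(n,)` and NumPy would try to divide along the wrong axis.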
Here’s how to l2-normalize vectors to a unit vector in Python
```python
import numpy as np
from sklearn import preprocessing

# Two samples, with 3 dimensions.
# The 2 rows indicate 2 samples,
# and the 3 columns indicate 3 features for each sample.
X = np.asarray([[-1, 0, 1],
                [0, 1, 2]], dtype=float)  # Store the values as floats.

# Before normalization.
print(X)
# Output:
# [[-1.  0.  1.]
#  [ 0.  1.  2.]]

# l2-normalize the samples (rows).
X_normalized = preprocessing.normalize(X, norm='l2')

# After normalization.
print(X_normalized)
# Output:
# [[-0.70710678  0.          0.70710678]
#  [ 0.          0.4472136   0.89442719]]
```
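Since the motivation above is normalizing data before training an SVM, it may help to see the same step wrapped as a transformer. The sketch below uses sklearn's `Normalizer` class (which applies `normalize` row by row) inside a pipeline with `SVC`; the toy data and labels are hypothetical and only there so the example runs.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer, normalize
from sklearn.svm import SVC

# Hypothetical toy data: 4 samples, 3 features, 2 classes.
X_train = np.array([[-1.0, 0.0, 1.0],
                    [0.0, 1.0, 2.0],
                    [2.0, 1.0, 0.0],
                    [3.0, 0.0, -1.0]])
y_train = np.array([0, 0, 1, 1])

# Normalizer learns nothing in fit(); transform() l2-normalizes each row.
normalizer = Normalizer(norm="l2")
X_t = normalizer.fit_transform(X_train)
print(np.allclose(X_t, normalize(X_train, norm="l2")))  # True

# In a pipeline, new samples are normalized the same way before prediction.
model = make_pipeline(Normalizer(norm="l2"), SVC(kernel="linear"))
model.fit(X_train, y_train)
predictions = model.predict(X_train)
print(predictions)
```

Because each row is normalized independently, the transformer has no fitted state, but using it in a pipeline still guards against forgetting to normalize test data.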
Now what did this do?
It normalized each sample (row) in the X matrix so that the squared elements sum to 1.
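Working through the two rows of X by hand gives the same numbers as the output above:

$$
\begin{aligned}
\mathbf{x}_1 &= (-1, 0, 1), & \|\mathbf{x}_1\|_2 &= \sqrt{1 + 0 + 1} = \sqrt{2}, & \hat{\mathbf{x}}_1 &= \left(-\tfrac{1}{\sqrt{2}}, 0, \tfrac{1}{\sqrt{2}}\right) \approx (-0.7071, 0, 0.7071), & \tfrac{1}{2} + 0 + \tfrac{1}{2} &= 1 \\
\mathbf{x}_2 &= (0, 1, 2), & \|\mathbf{x}_2\|_2 &= \sqrt{0 + 1 + 4} = \sqrt{5}, & \hat{\mathbf{x}}_2 &= \left(0, \tfrac{1}{\sqrt{5}}, \tfrac{2}{\sqrt{5}}\right) \approx (0, 0.4472, 0.8944), & 0 + \tfrac{1}{5} + \tfrac{4}{5} &= 1
\end{aligned}
$$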
We can check that this is the case:
```python
# Square all the elements/features.
X_squared = X_normalized ** 2
print(X_squared)
# Output:
# [[0.5 0.  0.5]
#  [0.  0.2 0.8]]

# Sum over each row (axis=1 sums across the columns).
X_sum_squared = np.sum(X_squared, axis=1)
print(X_sum_squared)
# Output:
# [1. 1.]
# Yay! Each row sums to 1 after being normalized.
```
As we can see, squaring each element and then summing along each row gives the expected value of 1 for every row.
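A slightly more direct check is to compute each row's length with `np.linalg.norm`, which takes the square root of the sum of squares for you. This sketch recreates the same X so it runs on its own, and it compares with a tolerance because floating-point sums can land a hair away from exactly 1.

```python
import numpy as np
from sklearn import preprocessing

X = np.asarray([[-1, 0, 1],
                [0, 1, 2]], dtype=float)
X_normalized = preprocessing.normalize(X, norm='l2')

# L2 norm (Euclidean length) of each row; each should be 1 after normalization.
row_norms = np.linalg.norm(X_normalized, axis=1)
print(row_norms)

# Use a tolerance instead of ==, since results may differ from 1 by about 1e-16.
print(np.allclose(row_norms, 1.0))  # True
```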