There are so many ways to normalize vectors… A common preprocessing step in machine learning is to normalize a vector before passing it into a machine learning algorithm, e.g., before training a support vector machine (SVM).
One way to normalize a vector is to scale it so that it has a length of 1, i.e., a unit norm. There are different ways to define “length”, such as the l1 norm or the l2 norm. If you use l2-normalization, “unit norm” essentially means that if we squared each element in the vector and summed the squares, the result would equal 1. (Note that a vector normalized this way is also often referred to as a unit vector or a vector of length 1.)
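As a minimal sketch of that definition using only NumPy (the vector `v` and the variable names are made up for illustration), the l2 norm is the square root of the sum of the squared elements, and dividing a vector by its norm gives a unit vector:

```python
import numpy as np

# A made-up example vector (not part of the article's data).
v = np.array([3.0, 4.0])

# l2 norm: square root of the sum of the squared elements.
l2_norm = np.sqrt(np.sum(v ** 2))
print(l2_norm)  # 5.0

# Dividing by the norm scales the vector to length 1.
unit = v / l2_norm
print(unit)  # [0.6 0.8]

# The squared elements of the unit vector sum to 1 (up to floating-point rounding).
print(np.sum(unit ** 2))
```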
So given a matrix X, where the rows represent samples and the columns represent features of each sample, you can apply l2-normalization to normalize each row to a unit norm. This can be done easily in Python using sklearn.
Here’s how to l2-normalize vectors to a unit vector in Python
    import numpy as np
    from sklearn import preprocessing

    # 2 samples, with 3 dimensions.
    # The 2 rows indicate 2 samples.
    # The 3 columns indicate 3 features for each sample.
    # Note: the np.float alias was removed in recent NumPy releases; the builtin float works.
    X = np.asarray([[-1, 0, 1],
                    [0, 1, 2]], dtype=float)  # Float is needed.

    # Before normalization.
    print(X)
    # Output,
    # [[-1.  0.  1.]
    #  [ 0.  1.  2.]]

    # l2-normalize the samples (rows).
    X_normalized = preprocessing.normalize(X, norm='l2')

    # After normalization.
    print(X_normalized)
    # Output,
    # [[-0.70710678  0.          0.70710678]
    #  [ 0.          0.4472136   0.89442719]]
Now what did this do?
It normalized each sample (row) in the X matrix so that the squared elements sum to 1.
We can check that this is the case:
    # Square all the elements/features.
    X_squared = X_normalized ** 2
    print(X_squared)
    # Output,
    # [[0.5 0.  0.5]
    #  [0.  0.2 0.8]]

    # Sum over the rows.
    X_sum_squared = np.sum(X_squared, axis=1)
    print(X_sum_squared)
    # Output,
    # [1. 1.]

    # Yay! Each row sums to 1 after being normalized.
As we see, if we square each element, and then sum along the rows, we get the expected value of “1” for each row.
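To make it clearer what the l2-normalization computes, here is a rough NumPy-only sketch that divides each row by its own l2 norm. The names `row_norms` and `X_manual` are just for illustration, and the sketch assumes no row is all zeros (otherwise the division would be by zero).

```python
import numpy as np

# Same X as in the example above.
X = np.asarray([[-1, 0, 1],
                [0, 1, 2]], dtype=float)

# l2 norm of each row. keepdims=True keeps the shape (2, 1),
# so the division below broadcasts across the columns of each row.
row_norms = np.linalg.norm(X, ord=2, axis=1, keepdims=True)

# Divide every row by its own norm.
X_manual = X / row_norms
print(X_manual)
```

For this X, the result should match the sklearn output shown earlier.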
How to l1-normalize vectors to a unit vector in Python
Now you might ask yourself, well that worked for L2 normalization. But what about L1 normalization?
In L2 normalization we normalize each sample (row) so the squared elements sum to 1, while in L1 normalization we normalize each sample (row) so the absolute values of the elements sum to 1.
Let’s do another example for L1 normalization (where X is the same as above)!
    X_normalized_l1 = preprocessing.normalize(X, norm='l1')
    print(X_normalized_l1)
    # Output,
    # [[-0.5         0.          0.5       ]
    #  [ 0.          0.33333333  0.66666667]]
Okay looks promising! Let’s do a quick sanity check.
    # Absolute value of all elements/features.
    X_abs = np.abs(X_normalized_l1)
    print(X_abs)
    # Output,
    # [[0.5        0.         0.5       ]
    #  [0.         0.33333333 0.66666667]]

    # Sum over the rows.
    X_sum_abs = np.sum(X_abs, axis=1)
    print(X_sum_abs)
    # Output,
    # [1. 1.]

    # Yay! Each row sums to 1 after being normalized.
We can now see that taking the absolute value of each element, and then summing across each row, gives the expected value of “1” for each row.
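The same calculation can be sketched by hand with NumPy only: divide each row by the sum of the absolute values of its elements. The names `row_abs_sums` and `X_manual_l1` are just for illustration, and the sketch again assumes no row is all zeros.

```python
import numpy as np

# Same X as in the examples above.
X = np.asarray([[-1, 0, 1],
                [0, 1, 2]], dtype=float)

# l1 norm of each row: the sum of the absolute values of its elements.
# For this X the sums are 2 and 3.
row_abs_sums = np.sum(np.abs(X), axis=1, keepdims=True)

# Divide every row by its own l1 norm.
X_manual_l1 = X / row_abs_sums
print(X_manual_l1)
```

So the first row is divided by 2 and the second row by 3, which is where the values 0.5, 1/3 and 2/3 come from.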
The full code for this example is here.
More reading and references:
Official Python documentation
Official Python example
Can you please also explain the L1 calculation? I am a 75-year-old guy learning AI just for fun and to be able to explain it to my granddaughters. When I saw the math formula for L2 I could not make any sense of it, but your example is crystal clear (and I thought, is that all?). Why the heck do they always come up with these complex formulas instead of a simple example? Thank you for that.
Dear Hans van der Waal, I’m glad to hear that you found this helpful! I also have a hard time linking math equations to the often simple concepts. So these simple examples help clarify the ideas for me too.
I just added a section with an example for L1 normalization. Hope it helps!
Just wondering! Why do we need to convert vectors to unit norm in ML? What is the reason behind this? Also, I was looking at an example of preprocessing on a stock movement dataset and the author used normalizer(norm='l2'). Any particular reason behind this? Does it have anything to do with the sparsity of the data? Sorry for too many questions.
Thanks for your questions Saurabh!
> why do we need to convert vectors to unit norm in ML?
We don’t have to. For some machine learning approaches (e.g., random forests), this may not be needed. The intuition for normalizing the vectors is that a vector whose elements have large magnitudes is not necessarily more important, so normalizing puts all vectors on roughly the same scale.
> the author used normalizer(norm='l2'). Any particular reason behind this? Does it have anything to do with the sparsity of the data?
Was this normalization applied to the trainable weights during the training phase? In that setting it is usually called regularization: L2 regularization penalizes weights that have a large magnitude, whereas L1 regularization encourages weights to be sparse (i.e., it pushes many weights to exactly 0).
You can also preprocess the data using L2 normalization, which rescales each sample to unit length so that samples with large elements do not dominate.
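As a rough illustration of that intuition (the vectors `a` and `b` are made up, and this uses plain NumPy rather than sklearn), two samples that point in the same direction but differ in overall magnitude become identical once each is L2-normalized:

```python
import numpy as np

# Two made-up samples: b is just a scaled-up copy of a.
a = np.array([1.0, 2.0, 2.0])
b = 10 * a

# L2-normalize each sample by dividing it by its own length.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

# After normalization only the direction is left, so the two samples agree.
print(np.allclose(a_unit, b_unit))  # True
```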
Hope that helps!