There are many ways to normalize vectors. A common preprocessing step in machine learning is to normalize a vector before passing it to a learning algorithm, e.g., before training a support vector machine (SVM).

One way to normalize the vector is to apply `l2-normalization` to scale the vector to have a `unit norm`. “Unit norm” essentially means that if we **squared** each element in the vector and **summed** them, it would equal `1`.

(Note: a vector normalized this way is often said to have `unit norm`, or is called a `vector of length 1` or a `unit vector`.)
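To make the definition concrete, here is a minimal sketch in plain Python. The example vector `[3, 4]` and the variable names are illustrative and not from the original post. It divides each element by the vector's l2 norm, then checks that the squared elements sum to approximately `1`.

```python
import math

# Illustrative example vector (an assumption, chosen so the norm is a whole number).
v = [3.0, 4.0]

# The l2 norm is the square root of the sum of squared elements.
l2_norm = math.sqrt(sum(x * x for x in v))

# Dividing each element by the l2 norm gives a unit vector.
v_unit = [x / l2_norm for x in v]

# The squared elements of the unit vector should sum to (approximately) 1.
sum_of_squares = sum(x * x for x in v_unit)
print(l2_norm, v_unit, math.isclose(sum_of_squares, 1.0))
```

The check uses `math.isclose` rather than `==` because floating-point rounding can leave the sum slightly away from exactly `1`.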

So given a matrix `X`, where the `rows` represent samples and the `columns` represent features of the sample, you can apply `l2-normalization` to normalize each row to a unit norm. This can be done easily in Python using sklearn.

## Here’s how to l2-normalize vectors to a unit vector in Python

    import numpy as np
    from sklearn import preprocessing

    # Two samples, with 3 dimensions.
    # The 2 rows indicate 2 samples,
    # and the 3 columns indicate 3 features for each sample.
    # dtype=float replaces the np.float alias, which newer NumPy releases removed.
    X = np.asarray([[-1, 0, 1],
                    [0, 1, 2]], dtype=float)  # Float is needed.

    # Before normalization.
    print(X)
    # Output,
    # [[-1.  0.  1.]
    #  [ 0.  1.  2.]]

    # l2-normalize the samples (rows).
    X_normalized = preprocessing.normalize(X, norm='l2')

    # After normalization.
    print(X_normalized)
    # Output,
    # [[-0.70710678  0.          0.70710678]
    #  [ 0.          0.4472136   0.89442719]]


Now what did this do?
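One way to see what happened is to reproduce the result by hand. The following is a minimal sketch using only NumPy, not necessarily how sklearn implements it internally: each row is divided by its own l2 norm. The names `norms` and `X_manual` are illustrative, and the guard for all-zero rows is an assumption of this sketch to avoid dividing by zero.

```python
import numpy as np

# Same two samples (rows) with three features (columns) as above.
X = np.asarray([[-1, 0, 1],
                [0, 1, 2]], dtype=float)

# l2 norm of each row; keepdims=True keeps shape (2, 1) so it broadcasts over columns.
norms = np.linalg.norm(X, axis=1, keepdims=True)

# Assumption of this sketch: leave all-zero rows unchanged instead of dividing by zero.
norms[norms == 0] = 1.0

# Divide every element by the norm of its row.
X_manual = X / norms
print(X_manual)

# Each row's squared elements now sum to (approximately) 1.
print(np.sum(X_manual ** 2, axis=1))
```

For the first row the norm is the square root of 2, and for the second it is the square root of 5, which is where values such as `0.70710678` and `0.4472136` come from.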
