What does the L2 or Euclidean norm mean?

Last updated on May 15th, 2017

Here’s a quick tutorial on the L2 or Euclidean norm.

First of all, the terminology is not clear. So let’s start with that.

Many equivalent names

All these names mean the same thing:
Euclidean norm == Euclidean length == L2 norm == L2 distance == l^2 norm

Although they are often used interchangable, we will use the phrase “L2 norm” here.

Many equivalent symbols

Now also note that the symbol for the L2 norm is not always the same.

Let’s say we have a vector, \vec{a} = [3,1,4,3,1].

The L2 norm is sometimes represented like this, ||\vec{a}||

Or sometimes this, ||\vec{a}||_2

Other times the L2 norm is represented like this, |\vec{a}|

Or even this, |\vec{a}|_2

To help distinguish from the absolute value sign, we will use the ||\vec{a}|| symbol.


Now that we have the names and terminology out of the way, let’s look at the typical equations.
||\vec{a}|| = \sqrt{\sum_i^n (a_i)^2} = \sqrt{(a_1)^2 + (a_2)^2 + \dots + (a_n)^2}
where n is the number of elements in \vec{a} (in this case n=5).

In words, the L2 norm is defined as, 1) square all the elements in the vector together; 2) sum these squared values; and, 3) take the square root of this sum.

A quick example

Let’s use our simple example from earlier, \vec{a} = [3,1,4,3,1].

We compute the L2 norm of the vector \vec{a} as,
||\vec{a}|| = \sqrt{(3^2 + 1^2 + 4^2 + 3^2 + 1^2)} = \sqrt{9 + 1 + 16 + 9 + 1} = \sqrt{36} = 6

And there you go!

So in summary, 1) the terminology is a bit confusing since as there are equivalent names, and 2) the symbols are overloaded. Finally, 3) we did a small example computing the L2 norm of a vector by hand.

If you are hungry for a code example, I wrote a small MATLAB example (computing L2 distance) here.

2 thoughts on “What does the L2 or Euclidean norm mean?”

    1. Hi nagdawi84, it’s not clear when to use different normalizations. In a machine learning scenario, an unsatisfying but practical answer is to try a few different normalizations, and choose the one that performs the best on your validation set.

      You may also want to try normalization if you’re combining different sources of data with vastly different range of scales. For example, say you are combining a person’s age with visual features, where the person’s age can range from 0-100, and the visual features range from 0-1. If the scale of these types of data vastly differs, normalizing may help with learning (e.g., training an SVM).

      Hope that helps!

Questions/comments? If you just want to say thanks, consider sharing this article or following me on Twitter!