Last updated on May 15th, 2017

Here’s a quick tutorial on the L2 or Euclidean norm.

First of all, the terminology is not clear. So let’s start with that.

### Many equivalent names

All these names mean the same thing:

`Euclidean norm == Euclidean length == L2 norm == L2 distance == norm`

Although they are often used interchangable, we will use the phrase “`L2 norm`

” here.

### Many equivalent symbols

Now also note that the symbol for the `L2 norm`

is not always the same.

Let’s say we have a vector, .

The L2 norm is sometimes represented like this,

Or sometimes this,

Other times the L2 norm is represented like this,

Or even this,

To help distinguish from the absolute value sign, we will use the symbol.

### Equation

Now that we have the names and terminology out of the way, let’s look at the typical equations.

where is the number of elements in (in this case ).

In words, the `L2 norm`

is defined as, 1) square all the elements in the vector together; 2) sum these squared values; and, 3) take the square root of this sum.

### A quick example

Let’s use our simple example from earlier, .

We compute the `L2 norm`

of the vector as,

And there you go!

So in summary, 1) the terminology is a bit confusing since as there are equivalent names, and 2) the symbols are overloaded. Finally, 3) we did a small example computing the `L2 norm`

of a vector by hand.

If you are hungry for a code example, I wrote a small MATLAB example (computing L2 distance) here.

Hi Jeremy

Can we know when we use it?

Hi nagdawi84, it’s not clear when to use different normalizations. In a machine learning scenario, an unsatisfying but practical answer is to try a few different normalizations, and choose the one that performs the best on your validation set.

You may also want to try normalization if you’re combining different sources of data with vastly different range of scales. For example, say you are combining a person’s age with visual features, where the person’s age can range from 0-100, and the visual features range from 0-1. If the scale of these types of data vastly differs, normalizing may help with learning (e.g., training an SVM).

Hope that helps!