How to normalize vectors to unit norm in Python

There are many ways to normalize a vector. A common preprocessing step in machine learning is to normalize a vector before passing it into a learning algorithm, e.g., before training a support vector machine (SVM).

One way to normalize a vector is to apply l2-normalization, which scales the vector to have a unit norm. “Unit norm” essentially means that if we squared each element in the vector and summed them, the total would equal 1.

(Note: a vector normalized this way is also referred to as having unit norm, having length 1, or being a unit vector.)

So given a matrix X, where the rows represent samples and the columns represent the features of each sample, you can apply l2-normalization to scale each row to a unit norm. This is easy to do in Python using sklearn.

Here’s how to l2-normalize vectors to unit norm in Python:

import numpy as np
from sklearn import preprocessing

# Two samples, with 3 dimensions.
# The 2 rows indicate 2 samples,
# and the 3 columns indicate 3 features for each sample.
X = np.asarray([[-1, 0, 1],
                [0, 1, 2]], dtype=float)  # A float dtype is needed.

# Before normalization.
print(X)
# Output:
# [[-1.  0.  1.]
#  [ 0.  1.  2.]]

# l2-normalize the samples (rows).
X_normalized = preprocessing.normalize(X, norm='l2')

# After normalization.
print(X_normalized)
# Output:
# [[-0.70710678  0.          0.70710678]
#  [ 0.          0.4472136   0.89442719]]

Now what did this do?
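Under the hood, l2-normalization simply divides each row by its l2 norm (the square root of the sum of its squared elements). Here’s a quick sketch in plain NumPy, with no sklearn needed, that reproduces the result above:

```python
import numpy as np

X = np.asarray([[-1, 0, 1],
                [0, 1, 2]], dtype=float)

# Each row's l2 norm: the square root of the sum of its squared elements.
norms = np.sqrt((X ** 2).sum(axis=1, keepdims=True))  # sqrt(2) and sqrt(5)

# Dividing each row by its norm gives the same result as
# preprocessing.normalize(X, norm='l2').
X_manual = X / norms

# Every normalized row now has unit norm: its squared elements sum to 1.
print((X_manual ** 2).sum(axis=1))  # [1. 1.]
```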

state-of-the-art classification methods on computer vision datasets

When choosing among the many academic papers to read, a nice heuristic I’ve found is to pick papers that perform well on publicly available standardized datasets. So I was happy to come across this webpage, which tracks the state-of-the-art classification methods for some well-known computer vision datasets.

Check it out here:
http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html

The best machine learning algorithm for classification and regression

[Please note that I’m still very much a novice in this field – and that I change my mind about things often]

I had a hard time naming this post. Here are a few other titles I could have used:

Why Random/Decision Forests are my favorite machine learning algorithm

Why I think Random/Decision Forests are the best machine learning algorithm.

I know there are exceptions to this – there exist scenarios where this title doesn’t hold. But rather than giving the vague, unhelpful answer of “it depends“, here’s why I think Random Forests should be your first and default choice of machine learning algorithm for classification and/or regression.

Here’s a working list (in no particular order) of why I really like working with Random/Decision forests:
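For reference, getting started with a random forest in Python takes only a few lines with scikit-learn. This is just a sketch using an illustrative dataset (iris) and mostly default parameters – part of the appeal is that the defaults tend to work well out of the box:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative dataset; any feature matrix X and label vector y work the same way.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A random forest with default-ish settings; n_estimators is the number of trees.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Accuracy on the held-out test set.
print(clf.score(X_test, y_test))
```

For regression, swap in RandomForestRegressor with the same fit/predict interface.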

Predicting Disability in Patients with Multiple Sclerosis using MRI – MICCAI CSI

Our work on predicting the physical disability level of patients was accepted for an oral presentation at the MICCAI 2013 workshop on Computational Methods and Clinical Applications for Spine Imaging in Nagoya, Japan.

You can read the paper here; it’s called:
Novel morphological and appearance features for predicting physical disability from MR images in multiple sclerosis patients

MATLAB – TreeBagger example

Did you know that Decision Forests (or Random Forests – I think they’re pretty much the same thing) are implemented in MATLAB? In MATLAB, Decision Forests go under the rather deceiving name of TreeBagger.

Here’s a quick tutorial on how to do classification with the TreeBagger class in MATLAB.
