How to normalize vectors to unit norm in Python

There are so many ways to normalize vectors… A common preprocessing step in machine learning is to normalize a vector before passing it to a machine learning algorithm, e.g., before training a support vector machine (SVM).

One way to normalize a vector is to apply l2-normalization, which scales the vector to have unit norm. “Unit norm” essentially means that if we squared each element in the vector and summed the squares, the result would equal 1.
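As a worked example, take the first sample from the code below, x = (-1, 0, 1). Dividing it by its l2 norm gives a vector whose squared elements sum to 1:

```latex
\|x\|_2 = \sqrt{(-1)^2 + 0^2 + 1^2} = \sqrt{2}, \qquad
\hat{x} = \frac{x}{\|x\|_2} = \left(-\tfrac{1}{\sqrt{2}},\ 0,\ \tfrac{1}{\sqrt{2}}\right) \approx (-0.7071,\ 0,\ 0.7071),
\qquad
\|\hat{x}\|_2^2 = \tfrac{1}{2} + 0 + \tfrac{1}{2} = 1.
```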

(Note: this normalization is also often referred to as unit norm, a vector of length 1, or a unit vector.)

So given a matrix X, where the rows represent samples and the columns represent features of the sample, you can apply l2-normalization to normalize each row to a unit norm. This can be done easily in Python using sklearn.

Here’s how to l2-normalize vectors to a unit vector in Python

import numpy as np
from sklearn import preprocessing

# Two samples, with 3 dimensions.
# The 2 rows indicate 2 samples,
# and the 3 columns indicate 3 features for each sample.
X = np.asarray([[-1, 0, 1],
                [0, 1, 2]], dtype=float)  # Float is needed (np.float is removed in recent NumPy).

# Before normalization.
print(X)
# Output:
# [[-1.  0.  1.]
#  [ 0.  1.  2.]]

# l2-normalize the samples (rows).
X_normalized = preprocessing.normalize(X, norm='l2')

# After normalization.
print(X_normalized)
# Output:
# [[-0.70710678  0.          0.70710678]
#  [ 0.          0.4472136   0.89442719]]
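One way to see what preprocessing.normalize computed is to reproduce it by hand with NumPy. This is a minimal sketch, not necessarily the explanation in the full post: it computes each row’s l2 norm with np.linalg.norm (axis=1, keepdims=True) and divides each row by its own norm via broadcasting.

```python
import numpy as np

X = np.asarray([[-1, 0, 1],
                [0, 1, 2]], dtype=float)

# l2 norm of each row: square root of the sum of squared entries.
# keepdims=True keeps the shape (2, 1) so the division broadcasts per row.
row_norms = np.linalg.norm(X, axis=1, keepdims=True)

# Divide every row by its own norm.
X_manual = X / row_norms
print(X_manual)

# Each row now has a sum of squares equal to 1 (up to floating-point error).
print(np.sum(X_manual ** 2, axis=1))
```

One difference worth knowing: preprocessing.normalize leaves an all-zero row as zeros, whereas this plain division would produce NaN for such a row.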

Now what did this do?
Continue reading “How to normalize vectors to unit norm in Python”

Using numpy on google app engine with the anaconda python distribution

Scenario
– you are using the Google App Engine (GAE) development server with Python
– you installed the Anaconda Python distribution
– you want to use the NumPy library with GAE

On Ubuntu and on Mac (but not Windows for some reason), you get this error when trying to deploy:
google app engine ImportError: No module named _ctypes

The TL;DR solution
Create an Anaconda environment using numpy 1.6 and python 2.7:

conda create -n np16py27 anaconda numpy=1.6 python=2.7

Load this specific environment from the command line:

source activate np16py27
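As an optional sanity check (not one of the original steps), a short script run from the activated environment can confirm which interpreter and NumPy version are actually being picked up. This is a minimal sketch; the expected versions simply mirror the conda create command above, and the function name check_environment is hypothetical.

```python
import sys

import numpy as np

# Versions the np16py27 environment above is created with.
EXPECTED_PYTHON = (2, 7)
EXPECTED_NUMPY_PREFIX = "1.6."


def check_environment():
    """Print and check the interpreter and NumPy version actually in use."""
    python_version = tuple(sys.version_info[:2])
    numpy_version = np.__version__
    print("Python executable: %s" % sys.executable)
    print("Python version: %d.%d" % python_version)
    print("NumPy version: %s" % numpy_version)
    return {
        "python_ok": python_version == EXPECTED_PYTHON,
        "numpy_ok": numpy_version.startswith(EXPECTED_NUMPY_PREFIX),
    }


print(check_environment())
```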

Run your GAE dev server:

dev_appserver.py my_gae_project

That’s it! You can read more details below if you are interested.
Continue reading “Using numpy on google app engine with the anaconda python distribution”

how to compute true/false positives and true/false negatives in python for binary classification problems

Here’s how to compute true positives, false positives, true negatives, and false negatives in Python using the NumPy library.

Note that we are assuming a binary classification problem here. That is, a value of 1 indicates a positive class, and a value of 0 indicates a negative class. For multi-class problems, these definitions do not directly apply.
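A minimal sketch of one way to do this with NumPy, assuming y_true and y_pred are sequences of 0/1 labels. The function name confusion_counts and the example labels are hypothetical, and the full post may do it differently. Each count is the number of positions where both boolean conditions hold.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels where 1 = positive, 0 = negative."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # predicted 1, actually 1
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # predicted 1, actually 0
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # predicted 0, actually 0
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # predicted 0, actually 1
    return tp, fp, tn, fn


# Hypothetical example labels.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 1, 3, 1)
```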

Continue reading “how to compute true/false positives and true/false negatives in python for binary classification problems”

How to debug a Jupyter/iPython notebook

Here’s how to debug your code when using a Jupyter/iPython notebook.

Use Tracer()(). Here’s an example using a simple function (based on this lucid explanation).

 
def test_debug(y):
    x = 10
    # One-liner to start the debugger here.
    from IPython.core.debugger import Tracer; Tracer()()
    x = x + y

    for i in range(10):
        x = x + i

    return x

test_debug(10)
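Newer IPython releases have deprecated Tracer. As an alternative sketch (assuming Python 3.7 or later), the built-in breakpoint() pauses execution and opens pdb, the standard Python debugger, under the cell; in IPython, from IPython.core.debugger import set_trace; set_trace() gives the IPython-flavoured debugger instead. The function name debug_me is hypothetical.

```python
def debug_me(y):
    x = 10
    # Built-in since Python 3.7: pauses here and opens the pdb debugger.
    breakpoint()
    x = x + y

    for i in range(10):
        x = x + i

    return x

# Call it from a notebook cell to start debugging, e.g.:
# debug_me(10)
```

Setting the environment variable PYTHONBREAKPOINT=0 turns every breakpoint() call into a no-op, which is handy for leaving the line in place while running code normally.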

When the debugger reaches the Tracer()() line, a small input box for typing commands will appear under your cell.

Simply type in variable names to check their values, or run other commands. Below I’ve listed some practical Python PDB commands. More can be found here.
Continue reading “How to debug a Jupyter/iPython notebook”

How to run an IPython/Jupyter Notebook on a remote machine

Here’s how to run an IPython Notebook (now called a Jupyter Notebook) on a remote Linux machine without using VNC.

These instructions expand on the ones here:
https://coderwall.com/p/ohk6cg/remote-access-to-ipython-notebooks-via-ssh
which are worth reading through for more details.

Let’s assume you have two machines:
– local_machine, which you are physically working on
– remote_machine, which you want to run code on

And you want to work in the browser on your local_machine, but have the code execute on the remote_machine.

Continue reading “How to run an IPython/Jupyter Notebook on a remote machine”

CAFFE – how to specify which GPU to use in PyCaffe

You are using PyCaffe (the Python interface for Caffe) and training a deep neural network directly within Python (although I think the same command holds for MATLAB).

You are on a machine with 2 GPUs and you want to specify which GPU to use for training. This is useful so that you can train two different models at the same time, one on each GPU. Note that here we refer to training two different models on two different GPUs on the same machine, not a single model on two GPUs.

(Side note: it seems to me that running two different jobs on the same GPU drastically slows down training. It’s so much slower that I only train a single model on a single GPU at a time. Running two different jobs on two different GPUs seems to be okay, though.)
Continue reading “CAFFE – how to specify which GPU to use in PyCaffe”

How to display an IPython notebook in a WordPress blog

So I wanted to add an IPython notebook within a WordPress blog post. The first thing I tried was exporting directly to HTML and copying the HTML directly into the WordPress post. This sort of worked. However, it was very slow to copy all the HTML into the post, and the formatting looked terrible.

Then I came across a way to do so, which you can read about here.

The basic idea is to export the IPython notebook to HTML, upload it somewhere on your site, and then use an “iframe” to embed the notebook HTML within the WordPress blog post. The author of that approach also wrote some JavaScript to handle sizing issues.

However, I did not want people to be able to directly access the IPython notebook HTML page (if, say, Google indexed it). Rather, I wanted to direct people to the actual WordPress post.

So in the end, I simply added some JavaScript to the IPython notebook to redirect to the WordPress blog post.
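One possible way to inject such a redirect, sketched in Python under stated assumptions: the script is inserted just before </head> of the exported HTML, and it only redirects when the page is not inside an iframe (window.self === window.top), so the copy embedded in the post keeps working. The helper add_redirect and the example URL are hypothetical, and the original post’s JavaScript may differ.

```python
import json

# Redirect only when the page is opened directly, not when it is
# embedded in the WordPress post's iframe.
REDIRECT_TEMPLATE = """<script>
if (window.self === window.top) {
  window.location.replace(%s);
}
</script>
"""


def add_redirect(html, post_url):
    """Return html with a redirect script inserted just before </head>."""
    # json.dumps produces a safely quoted JavaScript string literal.
    script = REDIRECT_TEMPLATE % json.dumps(post_url)
    index = html.find("</head>")
    if index == -1:
        # No </head> found: prepend the script instead.
        return script + html
    return html[:index] + script + html[index:]


# Hypothetical usage on an exported notebook page.
page = "<html><head><title>nb</title></head><body>...</body></html>"
print(add_redirect(page, "https://example.com/my-post/"))
```

To apply it to a real export, read the HTML file, pass its contents through add_redirect, and write the result back before uploading.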

Here are the steps to add an IPython notebook to a WordPress blog post:
Continue reading “How to display an IPython notebook in a WordPress blog”

theano – how to get the gpu to work

I have been working with Theano and it has been a bit of a journey getting the GPU to work. Here are a few notes to remind myself how to do so…

Start Python and check if Theano recognizes the GPU

$ python
Python 2.7.8 |Anaconda 2.1.0 (64-bit)| (default, Aug 21 2014, 18:22:21)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2

>>> import theano
Using gpu device 0: GeForce GTX 760 Ti OEM

You should see something like the above line showing that Theano finds your GPU.

If you do not see something like the above, then Theano is probably not configured to work with your GPU. But let’s check some more just to be sure.
Continue reading “theano – how to get the gpu to work”

Anaconda IPython Notebook – error: [Errno 99] Cannot assign requested address

So you want to run the IPython Notebook… and you’re using Anaconda 2.1.0 on some version of Linux.

You are already able to run ipython successfully…

$ ipython
IPython 2.2.0 -- An enhanced Interactive Python
[1]: exit()

But from the command line, when you try to run the IPython Notebook:

$ ipython notebook

You get a bunch of errors… something about sockets. The last error sticks out…

...
    return getattr(self._sock,name)(*args
error: [Errno 99] Cannot assign requested address

A bit of googling shows you some relevant links:

We summarize the two steps needed to get the Anaconda IPython Notebook working here.
Continue reading “Anaconda IPython Notebook – error: [Errno 99] Cannot assign requested address”