Python Pandas – How to groupby and aggregate a DataFrame

Here’s how to group your data by specific columns and apply functions to other columns in a Pandas DataFrame in Python.

Create the DataFrame with some example data

import pandas as pd

# Make up some data.
data = [
    {'unit': 'archer', 'building': 'archery_range', 'number_units': 1, 'civ': 'spanish'},
    {'unit': 'militia', 'building': 'barracks', 'number_units': 2, 'civ': 'spanish'},
    {'unit': 'pikemen', 'building': 'barracks', 'number_units': 3, 'civ': 'spanish'},
    {'unit': 'pikemen', 'building': 'barracks', 'number_units': 4, 'civ': 'huns'},
]

# Create the DataFrame.
df = pd.DataFrame(data)
# View the DataFrame.
print(df)

You should see a DataFrame that looks like this:

      unit       building  number_units      civ
0   archer  archery_range             1  spanish
1  militia       barracks             2  spanish
2  pikemen       barracks             3  spanish
3  pikemen       barracks             4     huns

Example 1: Groupby and sum specific columns

Let’s say you want to sum the number of units, with a separate total for each type of building.
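As a rough sketch of what this could look like (the variable names units_per_building and units_per_building_civ are made up for illustration, and the second grouping by civ is an extra assumption, not part of the example above), you can group on building and sum number_units:

```python
import pandas as pd

# Same example data as above, repeated so this snippet runs on its own.
data = [
    {'unit': 'archer', 'building': 'archery_range', 'number_units': 1, 'civ': 'spanish'},
    {'unit': 'militia', 'building': 'barracks', 'number_units': 2, 'civ': 'spanish'},
    {'unit': 'pikemen', 'building': 'barracks', 'number_units': 3, 'civ': 'spanish'},
    {'unit': 'pikemen', 'building': 'barracks', 'number_units': 4, 'civ': 'huns'},
]
df = pd.DataFrame(data)

# Group rows by building, then sum only the number_units column.
# archery_range totals 1 and barracks totals 9 (2 + 3 + 4).
units_per_building = df.groupby('building')['number_units'].sum()
print(units_per_building)

# Grouping by more than one column works the same way; as_index=False
# keeps the group keys as ordinary columns instead of an index.
units_per_building_civ = (
    df.groupby(['building', 'civ'], as_index=False)
      .agg({'number_units': 'sum'})
)
print(units_per_building_civ)
```

Selecting the column before calling sum() keeps the string columns (unit, civ) out of the aggregation.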


How to normalize vectors to unit norm in Python

There are many ways to normalize vectors. A common preprocessing step in machine learning is to normalize a vector before passing it to a learning algorithm, e.g., before training a support vector machine (SVM).

One way to normalize the vector is to scale it to have a length of 1, i.e., a unit norm. There are different ways to define “length”, such as the l1 or l2 norm. If you use l2-normalization, “unit norm” means that if we squared each element in the vector and summed the squares, the result would equal 1.

(Note that a vector normalized this way is often said to have unit norm or a length of 1, or is simply called a unit vector.)
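Here is a minimal sketch of both normalizations using NumPy. The function names normalize_l2 and normalize_l1 are made up for this illustration, and returning an all-zero vector unchanged is just one possible convention:

```python
import numpy as np

def normalize_l2(v):
    """Scale v so that the sum of its squared elements equals 1."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)  # l2 norm: sqrt of the sum of squares
    if norm == 0:
        return v  # a zero vector has no direction; leave it unchanged
    return v / norm

def normalize_l1(v):
    """Scale v so that the sum of its absolute values equals 1."""
    v = np.asarray(v, dtype=float)
    norm = np.abs(v).sum()  # l1 norm: sum of absolute values
    if norm == 0:
        return v
    return v / norm

x = np.array([3.0, 4.0])
x_l2 = normalize_l2(x)  # [0.6, 0.8], since sqrt(3^2 + 4^2) = 5
x_l1 = normalize_l1(x)  # [3/7, 4/7], since |3| + |4| = 7

print(np.sum(x_l2 ** 2))   # approximately 1
print(np.abs(x_l1).sum())  # approximately 1
```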


Using numpy on google app engine with the anaconda python distribution

– you are using the Google App Engine (GAE) development server with Python
– you installed the Anaconda Python distribution
– you want to use the Numpy library with GAE

On Ubuntu and on Mac (but not Windows for some reason), you get this error when trying to deploy:
google app engine ImportError: No module named _ctypes

The tl;dr solution
Create an Anaconda environment using numpy 1.6 and python 2.7:

conda create -n np16py27 anaconda numpy=1.6 python=2.7

Load this specific environment from the command line:

source activate np16py27

Run your GAE dev server: my_gae_project

That’s it! You can read more details below if you are interested.

how to compute true/false positives and true/false negatives in python for binary classification problems

Here’s how to compute true positives, false positives, true negatives, and false negatives in Python using the Numpy library.

Note that we are assuming a binary classification problem here. That is, a value of 1 indicates the positive class, and a value of 0 indicates the negative class. For multi-class problems, these definitions don’t apply directly.
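One way to do this with NumPy is sketched below. The function name confusion_counts and the example labels are made up for illustration; the only assumption is that both arrays contain just 0s and 1s:

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels where 1 is the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Each count is the number of positions where both conditions hold.
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # predicted 1, actually 1
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # predicted 1, actually 0
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # predicted 0, actually 0
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # predicted 0, actually 1
    return tp, fp, tn, fn

y_true = np.array([1, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0])

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(tp, fp, tn, fn)  # 2 1 2 1
```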


How to debug a Jupyter/iPython notebook

Here’s how to debug your code when using a Jupyter/iPython notebook.

Use Tracer()(). Here’s an example using a simple function (based on this lucid explanation).

def test_debug(y):
    x = 10
    # One-liner to start the debugger here.
    from IPython.core.debugger import Tracer; Tracer()()
    x = x + y
    for i in range(10):
        x = x+i
    return x

When the debugger reaches the Tracer()() line, a small line to type in commands will appear under your cell.

Simply type in the variable names to check their values or run other commands. Below I’ve listed some practical Python PDB commands. More can be found here.
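As far as I know, Tracer is deprecated in later IPython releases in favor of set_trace from the same module. Here is a sketch of the same function written that way. The debug flag and the name debug_example are my own additions for illustration, so the function can also run without stopping, and the commands in the comments are standard PDB commands rather than a complete list:

```python
def debug_example(y, debug=False):
    x = 10
    if debug:
        # Imported lazily so the function still works where IPython is not installed.
        from IPython.core.debugger import set_trace
        set_trace()  # execution pauses here and a debugger prompt appears
        # At the prompt you can type, for example:
        #   x      show the value of x
        #   n      run the next line
        #   c      continue until the next breakpoint or the end
        #   q      quit the debugger
    x = x + y
    for i in range(10):
        x = x + i
    return x

# With debug=False the function simply runs to completion.
result = debug_example(5)
print(result)  # 60
```

Call debug_example(5, debug=True) in a notebook cell to drop into the debugger.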

How to run an IPython/Jupyter Notebook on a remote machine

Here’s how to run an IPython/Jupyter Notebook on a remote Linux machine without using VNC. I expanded on these instructions.

Let’s assume you have two machines:
local-machine that you are physically working on
remote-machine that you want to run code on.

And you want to work in the browser on your local-machine, but execute the code on the remote-machine.

Here are the important lines you’re probably looking for:

jupyter notebook --no-browser --port=8898
ssh -N -f -L jer@remote-machine

If you want complete and detailed steps, keep reading below!
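To make the two commands easier to adapt, here is a small sketch that builds them from their parts. The helper names are made up, and the forwarding argument passed to -L (localhost:PORT:localhost:PORT, using the same port on both machines) is my assumption about a typical setup, since it is not spelled out above:

```python
def notebook_command(port):
    """Command to run on the remote machine: start Jupyter without opening a browser."""
    return ["jupyter", "notebook", "--no-browser", f"--port={port}"]

def tunnel_command(user, host, local_port, remote_port):
    """Command to run on the local machine: forward local_port to remote_port over SSH."""
    # -N: do not run a remote command, -f: go to the background,
    # -L: forward a local port to an address reachable from the remote machine.
    forward = f"localhost:{local_port}:localhost:{remote_port}"  # assumed forwarding spec
    return ["ssh", "-N", "-f", "-L", forward, f"{user}@{host}"]

print(" ".join(notebook_command(8898)))
print(" ".join(tunnel_command("jer", "remote-machine", 8898, 8898)))
```

With a tunnel like this in place, the notebook would be reachable in the local browser at localhost on the chosen local port.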


CAFFE – how to specify which GPU to use in PyCaffe

You are using PyCaffe (Python interface for Caffe) and training a deep neural network directly within Python (although I think the same command holds for MATLAB).

You are on a machine with 2 GPUs and you want to specify which GPU to use for training. This is useful so you can train two different models at the same time on each GPU. Note that here we refer to training two different models on two different GPUs on the same machine, not a single model on two GPUs.

(side note: it seems to me that running two different jobs on the same GPU drastically slows GPU training. It’s so much slower that I only train a single model on a single GPU at a time. Running two different jobs on two different GPUs seems to be okay though)
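One framework-independent way to pin a process to a single GPU is sketched below. This is an assumption about a common approach, not necessarily the command this post goes on to describe, and the helper name select_gpu is made up. The commented lines show the PyCaffe calls caffe.set_mode_gpu() and caffe.set_device(), which require Caffe to be installed:

```python
import os

def select_gpu(gpu_id):
    """Hide every GPU except gpu_id from CUDA code started by this process."""
    # This must be set before the deep learning framework initializes CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)

# Train this model on the second physical GPU (ids start at 0).
select_gpu(1)

# With PyCaffe installed, you would then switch to GPU mode. Because only
# one GPU is visible to this process, it shows up as device 0:
#
#   import caffe
#   caffe.set_mode_gpu()
#   caffe.set_device(0)
#
# Without CUDA_VISIBLE_DEVICES, pass the physical id instead, e.g. caffe.set_device(1).

print(os.environ["CUDA_VISIBLE_DEVICES"])  # 1
```

Starting a second Python process with select_gpu(0) would then let the other model train on the first GPU at the same time.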

How to display an IPython notebook in a WordPress blog

So I wanted to add an IPython notebook to a WordPress blog post. The first thing I tried was exporting the notebook to HTML and pasting that HTML straight into the WordPress post. This sort of worked. However, it was very slow to copy all the HTML into the post, and the formatting looked terrible.

Then I came across a way to do so, which you can read about here.

The basic idea from above is to export the IPython notebook to HTML, upload it somewhere on your site, and then use an “iframe” to embed the IPython notebook HTML within the WordPress blog post. The author of the above then wrote some javascript to handle sizing issues.

However, I did not want people to be able to directly access the IPython notebook HTML page (if, say, Google indexed it). Rather, I wanted to direct people to the actual WordPress post.

So in the end, I simply added some javascript to the IPython notebook to redirect to the WordPress blog post.

Here are the steps to add an IPython notebook to a WordPress blog post:

theano – how to get the gpu to work

I have been working with Theano and it has been a bit of a journey getting the GPU to work. Here are a few notes to remind myself how to do so…

Start Python and check if Theano recognizes the GPU

$ python
Python 2.7.8 |Anaconda 2.1.0 (64-bit)| (default, Aug 21 2014, 18:22:21)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2

>>> import theano
Using gpu device 0: GeForce GTX 760 Ti OEM

You should see something like the above line showing that Theano finds your GPU.

If you do not see something like the above, then Theano is probably not configured to work with your GPU. But let’s check some more just to be sure.

Anaconda IPython Notebook – error: [Errno 99] Cannot assign requested address

So you want to run the IPython Notebook… and you’re using Anaconda 2.1.0 on some version of Linux.

You are already able to run ipython successfully…

$ ipython
IPython 2.2.0 -- An enhanced Interactive Python
In [1]: exit()

But from the command line, when you try to run the IPython Notebook:

$ ipython notebook

You get a bunch of errors… something about sockets. The last error sticks out…

    return getattr(self._sock,name)(*args)
error: [Errno 99] Cannot assign requested address

A bit of googling shows you some relevant links:

We summarize the two steps needed to get the Anaconda IPython Notebook working here.