Python Pandas – How to groupby and aggregate a DataFrame

Here’s how to group your data by specific columns and apply functions to other columns in a Pandas DataFrame in Python.

Create the DataFrame with some example data

import pandas as pd

# Make up some data.
data = [
    {'unit': 'archer', 'building': 'archery_range', 'number_units': 1, 'civ': 'spanish'},
    {'unit': 'militia', 'building': 'barracks', 'number_units': 2, 'civ': 'spanish'},
    {'unit': 'pikemen', 'building': 'barracks', 'number_units': 3, 'civ': 'spanish'},
    {'unit': 'pikemen', 'building': 'barracks', 'number_units': 4, 'civ': 'huns'},
]

# Create the DataFrame.
df = pd.DataFrame(data)
# View the DataFrame.
df

You should see a DataFrame that looks like this:

      unit       building  number_units      civ
0   archer  archery_range             1  spanish
1  militia       barracks             2  spanish
2  pikemen       barracks             3  spanish
3  pikemen       barracks             4     huns

Example 1: Groupby and sum specific columns

Let’s say you want to total the number of units, with a separate total for each type of building.
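One way this could look is sketched below. It rebuilds the example DataFrame so it runs on its own, groups by the building column, and sums number_units. The variable name units_per_building is just an illustrative choice, and the full post may use a different aggregation call.

```python
import pandas as pd

# Same example data as above, repeated so this snippet runs on its own.
data = [
    {'unit': 'archer', 'building': 'archery_range', 'number_units': 1, 'civ': 'spanish'},
    {'unit': 'militia', 'building': 'barracks', 'number_units': 2, 'civ': 'spanish'},
    {'unit': 'pikemen', 'building': 'barracks', 'number_units': 3, 'civ': 'spanish'},
    {'unit': 'pikemen', 'building': 'barracks', 'number_units': 4, 'civ': 'huns'},
]
df = pd.DataFrame(data)

# Group the rows by building, then sum number_units within each group.
# The result is a Series indexed by building name.
units_per_building = df.groupby('building')['number_units'].sum()
print(units_per_building)
```

With this data, archery_range totals 1 and barracks totals 9 (2 + 3 + 4).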

Continue reading “Python Pandas – How to groupby and aggregate a DataFrame”

How to select a single GPU in Keras

Scenario: You have multiple GPUs on a single machine running Linux, but you want to use just one. By default, TensorFlow (and therefore Keras running on top of it) allocates memory on all visible GPUs unless you specify otherwise. You use a Jupyter Notebook to run Keras with the TensorFlow backend.

Here’s how to use a single GPU in Keras with TensorFlow

Run this bit of code in a cell right at the start of your notebook (before importing tensorflow or keras).
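A minimal sketch of one common approach is to restrict which GPUs CUDA exposes by setting environment variables before TensorFlow is imported. The GPU index "0" is an assumption here; pick whichever index nvidia-smi reports for the card you want. The full post may use a different method.

```python
import os

# Order GPUs by PCI bus ID so the indices match what nvidia-smi reports.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"

# Expose only one GPU to CUDA. "0" is an assumed index; change it as needed.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Import tensorflow / keras only AFTER the variables above are set.
```

Because the variables are read when CUDA initializes, setting them after TensorFlow has already started using the GPUs will likely have no effect.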

Continue reading “How to select a single GPU in Keras”

HP Stream 11 review – running Ubuntu 16

tl;dr: Not recommended for non-technical people. Not recommended as a primary machine. But if you want a small secondary laptop for travel and light work, and if you install Ubuntu on it, this laptop is a surprising treat (for the price)!


Update May 17, 2018:
With less than 32 GB of hard-drive space, you’re going to fill this up quickly. I did, and even after deleting unnecessary files, I still couldn’t free much more than 200 MB.

Then I stumbled across the command:

sudo apt-get autoremove

This cleared up over 7 GB! So if you’re running out of space on your tiny hard-drive, and you really can’t delete any more stuff, give that a try.


Update Feb 17, 2017:
After using this laptop longer, here are a few more notes:
– My main complaint is slow browser performance. If you’re doing a bunch of stuff in the browser (e.g., Google Docs, Google Slides), you should be prepared for some lag.
– When doing a Google Hangout with a bunch of users on video, the machine and the meeting lag like crazy. It’s not usable for a large group of users over Google Hangouts with the video on.


If installing a new operating system terrifies you (it’s actually not that hard), buy something else. If it does not, then this is a great little machine. I find myself using this little HP Stream more than my other powerful laptop. The utility of a physically light laptop is not to be underestimated.

Note that some reviews claimed that if you remove all the bloatware from it, this machine runs Windows fine. So you might get a decent Windows experience if you remove the bloatware at the start.

Now what is this HP Stream 11 you might ask. Well it’s …

A light travel laptop

This machine is light in every sense of the word. Physically, it’s a light machine; it’s tiny. Color-wise, it’s a light bright blue or purple. Spec-wise, it’s very light.

They should have called this HP Light 11.

But light can be good. Sometimes I want a light machine, one that I don’t care if it gets lost or stolen, or dropped and broken. With a light machine, I can fit it into a travel bag, and do some rough prototyping before pushing the code to more capable machines.

Continue reading “HP Stream 11 review – running Ubuntu 16”

Mendeley crashes on Ubuntu laptop with NVIDIA GPU

On an Ubuntu laptop with an NVIDIA GPU, when trying to open Mendeley, you get this rather unhelpful error:

The application Mendeley Desktop has closed unexpectedly.

I’m sure there are many causes for this error, but one unexpected reason you might get this error is related to your graphics card.

If you have an NVIDIA GPU on your laptop, try switching to your Intel graphics card instead.

To switch to your Intel graphics card, open your terminal and type:

sudo prime-select intel

Then restart Mendeley. Like magic and deep learning, it just seems to work.

(If you need to switch back to your NVIDIA card, just type sudo prime-select nvidia.)

TensorFlow – failed call to cuInit: CUDA_ERROR_UNKNOWN

Scenario: You’re trying to get your GPU to work in TensorFlow on an Ubuntu laptop. You’ve already installed TensorFlow, CUDA, and the NVIDIA drivers.

You run python and import TensorFlow:

import tensorflow as tf

And you see encouraging messages like: "successfully opened CUDA library libcublas.so locally"

But in Python, when you run:

tf.Session()

You get this cryptic error:

failed call to cuInit: CUDA_ERROR_UNKNOWN

Here’s how to fix this.
Continue reading “TensorFlow – failed call to cuInit: CUDA_ERROR_UNKNOWN”

How to normalize vectors to unit norm in Python

There are so many ways to normalize vectors… A common preprocessing step in machine learning is to normalize a vector before passing it into some machine learning algorithm, e.g., before training a support vector machine (SVM).

One way to normalize a vector is to scale it to have a length of 1, i.e., a unit norm. There are different ways to define “length”, such as the l1 or l2 norm. If you use l2-normalization, “unit norm” means that the squares of the elements in the vector sum to 1.

(Note: the result of this normalization is often called a unit-norm vector, a vector of length 1, or a unit vector.)
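As a small illustration, here is a sketch using NumPy’s np.linalg.norm. The vector v and the variable names are made up for the example, and it assumes the vector is not all zeros (otherwise the division is undefined).

```python
import numpy as np

# An example vector (hypothetical data).
v = np.array([3.0, 4.0])

# l2-normalization: divide by the Euclidean length of the vector.
l2_unit = v / np.linalg.norm(v, ord=2)

# l1-normalization: divide by the sum of the absolute values.
l1_unit = v / np.linalg.norm(v, ord=1)

# After l2-normalization, the squared elements sum to (approximately) 1.
print(np.sum(l2_unit ** 2))

# After l1-normalization, the absolute values sum to (approximately) 1.
print(np.sum(np.abs(l1_unit)))
```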

Continue reading “How to normalize vectors to unit norm in Python”

How to upload sitemap.xml to google app engine

Ah the world wide web… the old www. So many factors to consider when developing web applications, especially if you’re used to developing in a relatively simple research environment.

One thing you will think about is how to get Google to include your pages in its search results (i.e., get indexed). To do this, we’ll submit a sitemap to Google. Basically, a sitemap lists the links that you want Google to start indexing (so other people can find them when they search the web).

If you’re using WordPress, this is pretty simple, and you can just download a plugin and skip to the last step below (step 6).

However, if you’re using Google App Engine to run your site, then you need to do a few more steps. So here are instructions for how to do this.
Continue reading “How to upload sitemap.xml to google app engine”

ubuntu – black screen on ubuntu laptop after installing nvidia drivers

I’m running Ubuntu on my Lenovo Y50 laptop, with an NVIDIA GPU. And every time I do an update (or restart it?), I see the Ubuntu logo, hear the chime to log in, and then see a blank black screen, or a small white dot in the upper corner.

Other times, after a reboot, I get to the login screen, enter my username and password, then everything flickers violently, and it loops back and asks me to enter my info again.

Today this post is not about how to permanently fix this (although that would be nice), but rather how to get your GUI back (until you update/restart your machine again).

It seems that on some laptops, the NVIDIA drivers and Ubuntu do not always play nicely together. Why? I am not sure.

But anyway, here’s how to fix your laptop when Ubuntu has a black screen on login (assuming your problem is related to the NVIDIA drivers).
Continue reading “ubuntu – black screen on ubuntu laptop after installing nvidia drivers”

Using numpy on google app engine with the anaconda python distribution

Scenario
– you are using the Google App Engine (GAE) development server with Python
– you installed the Anaconda Python distribution
– you want to use the Numpy library with GAE

On Ubuntu and on Mac (but not Windows for some reason), you get this error when trying to deploy:
google app engine ImportError: No module named _ctypes

The tl;dr solution
Create an Anaconda environment using numpy 1.6 and python 2.7:

conda create -n np16py27 anaconda numpy=1.6 python=2.7

Load this specific environment from the command line:

source activate np16py27

Run your GAE dev server:

dev_appserver.py my_gae_project

That’s it! You can read more details below if you are interested.
Continue reading “Using numpy on google app engine with the anaconda python distribution”

how to compute true/false positives and true/false negatives in python for binary classification problems

Here’s how to compute true positives, false positives, true negatives, and false negatives in Python using the Numpy library.

Note that we are assuming a binary classification problem here. That is, a value of 1 indicates the positive class, and a value of 0 indicates the negative class. For multi-class problems, these definitions don’t apply directly.
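Under that assumption, a minimal sketch of the counting could look like the following. The arrays y_true and y_pred are made-up example data, and the full post may structure this differently.

```python
import numpy as np

# Hypothetical ground-truth labels and predictions (1 = positive, 0 = negative).
y_true = np.array([1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0])

# True positive: predicted 1 and actually 1.
tp = int(np.sum((y_pred == 1) & (y_true == 1)))
# False positive: predicted 1 but actually 0.
fp = int(np.sum((y_pred == 1) & (y_true == 0)))
# True negative: predicted 0 and actually 0.
tn = int(np.sum((y_pred == 0) & (y_true == 0)))
# False negative: predicted 0 but actually 1.
fn = int(np.sum((y_pred == 0) & (y_true == 1)))

print(tp, fp, tn, fn)  # prints: 2 1 2 1
```

The four counts always add up to the number of samples, which is a handy sanity check.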

Continue reading “how to compute true/false positives and true/false negatives in python for binary classification problems”