Python Pandas – How to groupby and aggregate a DataFrame

Here’s how to group your data by specific columns and apply functions to other columns in a Pandas DataFrame in Python.

Create the DataFrame with some example data

import pandas as pd

# Make up some data.
data = [
    {'unit': 'archer', 'building': 'archery_range', 'number_units': 1, 'civ': 'spanish'},
    {'unit': 'militia', 'building': 'barracks', 'number_units': 2, 'civ': 'spanish'},
    {'unit': 'pikemen', 'building': 'barracks', 'number_units': 3, 'civ': 'spanish'},
    {'unit': 'pikemen', 'building': 'barracks', 'number_units': 4, 'civ': 'huns'},
]

# Create the DataFrame.
df = pd.DataFrame(data)
# View the DataFrame.
print(df)

You should see a DataFrame that looks like this:

      unit       building  number_units      civ
0   archer  archery_range             1  spanish
1  militia       barracks             2  spanish
2  pikemen       barracks             3  spanish
3  pikemen       barracks             4     huns

Example 1: Groupby and sum specific columns

Let’s say you want to count the number of units, but separate the unit count based on the type of building.
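The full walkthrough is behind the link below. As a minimal sketch (one way to do it, not necessarily the post's exact code), you can group on `building` and sum `number_units`. For several aggregations at once, pandas' named aggregation in `.agg` (pandas 0.25 or later) works too; the output column names `total_units` and `num_civs` are made up for illustration. The example data is recreated so the block runs on its own.

```python
import pandas as pd

# Same example data as above.
data = [
    {'unit': 'archer', 'building': 'archery_range', 'number_units': 1, 'civ': 'spanish'},
    {'unit': 'militia', 'building': 'barracks', 'number_units': 2, 'civ': 'spanish'},
    {'unit': 'pikemen', 'building': 'barracks', 'number_units': 3, 'civ': 'spanish'},
    {'unit': 'pikemen', 'building': 'barracks', 'number_units': 4, 'civ': 'huns'},
]
df = pd.DataFrame(data)

# Sum number_units separately for each building type.
units_per_building = df.groupby('building')['number_units'].sum()
print(units_per_building)  # archery_range 1, barracks 9

# Several aggregations at once (named aggregation: new_name=(column, function)).
summary = df.groupby('building').agg(
    total_units=('number_units', 'sum'),
    num_civs=('civ', 'nunique'),
)
print(summary)
```

Selecting `['number_units']` before `.sum()` keeps the result to the one column you care about instead of trying to sum the text columns as well.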

Continue reading “Python Pandas – How to groupby and aggregate a DataFrame”

How to select a single GPU in Keras

Scenario: You have multiple GPUs on a single machine running Linux, but you want to use just one. By default, Keras (through TensorFlow) allocates memory on all visible GPUs unless you specify otherwise. You use a Jupyter Notebook to run Keras with the TensorFlow backend.

Here’s how to use a single GPU in Keras with TensorFlow

Run this bit of code in a cell right at the start of your notebook (before importing tensorflow or keras).
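The post's exact snippet is behind the link. A common approach (an assumption here, not necessarily the post's code) is to restrict which GPUs CUDA exposes by setting environment variables before TensorFlow is imported. The GPU index `"0"` is only an example; pick the ID of the card you want.

```python
import os

# Number GPUs in PCI bus order so the IDs match what nvidia-smi shows.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# Expose only GPU 0 to this process (example ID; change as needed).
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Only now import TensorFlow / Keras; they will see a single GPU.
```

Because the variables are read when CUDA initializes, setting them after TensorFlow has already been imported in the notebook generally has no effect; restart the kernel and run this cell first.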

Continue reading “How to select a single GPU in Keras”

How to fix “Firefox is already running, but is not responding”

Sometimes, after I manually reboot my Ubuntu machine, I get the following error when I try to run Firefox:

Firefox is already running, but is not responding. To open a new window, you must first close the existing Firefox process, or restart your system.

The solution is to delete a mysterious hidden file called .parentlock. This file is typically located in your Firefox profile folder, under your home directory:

cd ~/.mozilla/firefox/

You should see a folder with some random name (yours will differ), followed by .default, for example: 06agjsjz.default

Go into that folder:

cd 06agjsjz.default

There should be a file called .parentlock (it’s hidden so type ls -a to see it). Delete it:

rm .parentlock

Start Firefox again and it should work!
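If you hit this often, the same steps can be scripted. Below is a minimal sketch using a hypothetical helper, `remove_parentlocks`. It assumes profiles live in `~/.mozilla/firefox/` as described above and that profile folder names contain `.default` (newer Firefox releases often use names like `xxxx.default-release`, which the pattern also matches). Only run it when Firefox is really not running.

```python
from pathlib import Path


def remove_parentlocks(profiles_dir=None):
    """Hypothetical helper: delete stale .parentlock files in Firefox profile folders.

    Only use this when Firefox is NOT running; the lock exists to stop two
    Firefox processes from using the same profile at once.
    """
    if profiles_dir is None:
        profiles_dir = Path.home() / ".mozilla" / "firefox"
    removed = []
    # Matches folders such as 06agjsjz.default (and e.g. *.default-release).
    for profile in Path(profiles_dir).glob("*.default*"):
        lock = profile / ".parentlock"
        if lock.is_file():
            lock.unlink()
            removed.append(lock)
    return removed
```

Calling `remove_parentlocks()` returns the list of lock files it deleted, so you can print them to confirm something was actually removed.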

Weighted Precision and Recall Equation

The “weighted” precision or recall score in scikit-learn is defined as

\[
\frac{1}{\sum_{l\in \color{cyan}{L}} |\color{green}{\hat{y}}_l|}
\sum_{l \in \color{cyan}{L}} |\color{green}{\hat{y}}_l| \,
\phi(\color{magenta}{y}_l, \color{green}{\hat{y}}_l)
\]

  • \(\color{cyan}{L}\) is the set of labels
  • \(\color{green}{\hat{y}}\) is the true label
  • \(\color{magenta}{y}\) is the predicted label
  • \(\color{green}{\hat{y}}_l\) is all the true labels that have the label \(l\)
  • \(|\color{green}{\hat{y}}_l|\) is the number of true labels that have the label \(l\)
  • \(\phi(\color{magenta}{y}_l, \color{green}{\hat{y}}_l)\) computes the precision or recall for the true and predicted labels that have the label \(l\). To compute precision, let \(\phi(A,B) = \frac{|A \cap B|}{|A|}\). To compute recall, let \(\phi(A,B) = \frac{|A \cap B|}{|B|}\).

How Are Weighted Precision and Recall Calculated?

Let’s break this apart a bit more.
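To make the equation concrete, here is a small pure-Python sketch that follows it term by term. The predicted labels \(\color{magenta}{y}\) and true labels \(\color{green}{\hat{y}}\) are stored as sets of (sample index, label) pairs. Each label's \(\phi\) is weighted by its support \(|\color{green}{\hat{y}}_l|\), and the total is divided by the sum of the supports. The function names and the toy labels are made up for illustration. As with scikit-learn's default behavior, an empty denominator in \(\phi\) counts as 0.

```python
def phi(a, b, kind):
    """Precision: |A ∩ B| / |A|.  Recall: |A ∩ B| / |B|.  Empty denominator -> 0.0."""
    denom = len(a) if kind == "precision" else len(b)
    return len(a & b) / denom if denom else 0.0


def weighted_score(y_true, y_pred, kind="precision"):
    y = set(enumerate(y_pred))       # predicted (sample, label) pairs
    y_hat = set(enumerate(y_true))   # true (sample, label) pairs
    labels = set(y_true) | set(y_pred)  # the label set L
    numerator, total_support = 0.0, 0
    for l in labels:
        y_l = {pair for pair in y if pair[1] == l}
        y_hat_l = {pair for pair in y_hat if pair[1] == l}
        weight = len(y_hat_l)        # |y_hat_l|: number of true samples with label l
        numerator += weight * phi(y_l, y_hat_l, kind)
        total_support += weight
    return numerator / total_support


y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]
print(weighted_score(y_true, y_pred, "precision"))  # (2*1 + 3*2/3 + 1*1/2) / 6 = 0.75
print(weighted_score(y_true, y_pred, "recall"))     # (2*1/2 + 3*2/3 + 1*1) / 6 ≈ 0.667
```

You can cross-check these numbers with `sklearn.metrics.precision_score(y_true, y_pred, average="weighted")` and `recall_score(..., average="weighted")`, which should agree. Note that weighted recall works out to plain accuracy (4 of 6 correct here), since each label's recall is multiplied by that label's own support.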
Continue reading “Weighted Precision and Recall Equation”

Visualizing Data using t-SNE – slides

My goal is to publish all the slides (well, maybe not my first and worst ones) I’ve made over the years for our lab’s reading group. To this end, I’ve posted some old slides (from 2015) that explain in detail the t-SNE algorithm from this paper:

Maaten, L. van der, & Hinton, G. (2008). Visualizing Data using t-SNE. Journal of Machine Learning Research (JMLR), 9, 2579–2605. [pdf]

The authors proposed t-SNE back in 2008. But it seems to have had a sort of revival these last few years, likely due to the number of deep learning papers using it to visualize learned features.

Continue reading “Visualizing Data using t-SNE – slides”

Conditional Image Generation with PixelCNN Decoders – slides

A while ago I presented, and attempted to explain, this work to our reading group:

van den Oord, A., Kalchbrenner, N., Vinyals, O., Espeholt, L., Graves, A., & Kavukcuoglu, K. (2016). Conditional Image Generation with PixelCNN Decoders. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), NIPS (pp. 4790–4798).

I also dove a bit into their previous work:
van den Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. (2016). Pixel Recurrent Neural Networks. arXiv, 48.

While I usually post slides to the web shortly after presenting, this time I’ve been scared to do so. There are a few critical points in this paper that I still don’t understand. I told myself I would spend some time figuring them out, but it is now months later and I’ve taken no action. So, since now is always the time to carry on in spite of the fear, I’ll let you, dear Internet, have these slides in all their erroneous ways.

Continue reading “Conditional Image Generation with PixelCNN Decoders – slides”

Convolutional Neural Networks for Adjacency Matrices

Our work, BrainNetCNN, was published in NeuroImage a while ago,

Kawahara, J., Brown, C. J., Miller, S. P., Booth, B. G., Chau, V., Grunau, R. E., Zwicker, J. G., & Hamarneh, G. (2017). BrainNetCNN: Convolutional neural networks for brain networks; towards predicting neurodevelopment. NeuroImage, 146(Feb), 1038–1049.

and I’ve meant to do a blog writeup about this. We recently released our code for BrainNetCNN on GitHub (based on Caffe), which implements the proposed filters designed for adjacency matrices.

We called this library Ann4Brains. In hindsight, we could have called it something more general and cumbersome like Ann4AdjacencyMatrices, but I still like the zombie feel that Ann4Brains has.

We designed BrainNetCNN specifically with brain connectome data in mind. Thus the tag line of,

“Convolutional Neural Networks for Brain Networks”

seemed appropriate. However, after receiving some emails about using BrainNetCNN for other types of (non-connectome) data, I’ll emphasize that this approach can be applied to any sort of adjacency matrix, and not just brain connectomes.

The core contribution of this work is the filters designed for adjacency matrices themselves. So we’ll go through each of them. But first, let’s make sure we are clear on what the brain connectome (or adjacency matrix) is.
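For intuition only, here is a minimal NumPy sketch with a hypothetical 4-node graph and made-up edge weights. It first builds a weighted adjacency matrix from an edge list, where entry \((i, j)\) holds the strength of the connection between nodes \(i\) and \(j\). It then applies a cross-shaped filter that gives each node one response computed from its row and its column. This is loosely in the spirit of the paper's edge-to-node filter; the names `edge_to_node`, `row_w`, and `col_w` are illustrative, and the exact filter definitions (including the edge-to-edge and node-to-graph filters) are in the full post and paper.

```python
import numpy as np

# Hypothetical undirected, weighted network: (node_i, node_j, connection strength).
edges = [(0, 1, 0.5), (0, 2, 0.2), (1, 3, 0.9), (2, 3, 0.4)]
n_nodes = 4

A = np.zeros((n_nodes, n_nodes))
for i, j, w in edges:
    A[i, j] = w
    A[j, i] = w  # undirected graph -> symmetric adjacency matrix


def edge_to_node(A, row_w, col_w):
    """Cross-shaped filter sketch: response_i = sum_k row_w[k]*A[i,k] + col_w[k]*A[k,i]."""
    return A @ row_w + A.T @ col_w


# With row weights of 1 and column weights of 0, each node's response is
# just the sum of its connection strengths (its "node strength").
strengths = edge_to_node(A, np.ones(n_nodes), np.zeros(n_nodes))
print(strengths)
```

In a network, `row_w` and `col_w` would be learned rather than fixed. Unlike a small square image kernel, each response looks at a node's whole row and column, which is the part of the matrix that actually describes that node's connections.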

Continue reading “Convolutional Neural Networks for Adjacency Matrices”

HP Stream 11 review – running Ubuntu 16

tl;dr: Not recommended for non-technical people. Not recommended as a primary machine. But if you want a small secondary laptop for travel and light work, and you install Ubuntu on it, this laptop is a surprising treat (for the price)!

Update May 17, 2018:
With less than 32 GB of hard-drive space, you’re going to fill this up quickly. I did, and even after deleting unnecessary files, I couldn’t free much more than 200 MB.

Then I stumbled across the command:

sudo apt-get autoremove

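This cleared up over 7 GB! The command uninstalls packages that were pulled in as dependencies and are no longer needed (often including old kernel versions). So if you’re running out of space on your tiny hard drive and you really can’t delete any more stuff, give it a try.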

Update Feb 17, 2017:
After using this laptop longer, here are a few more notes:
– My main complaint is slow browser performance. If you’re doing a lot in the browser (e.g., Google Docs, Google Slides), be prepared for some lag.
– During a Google Hangout with several users on video, both the machine and the meeting lag badly. It’s not usable for a large group on Google Hangouts with video on.

If installing a new operating system terrifies you (it’s actually not that hard), buy something else. If it does not, then this is a great little machine. I find myself using this little HP Stream more than my other powerful laptop. The utility of a physically light laptop is not to be underestimated.

Note that some reviews claim this machine runs Windows fine once you remove all the bloatware. So if you’d rather stay on Windows, stripping the bloatware right at the start may give you a decent experience.

Now, what is this HP Stream 11, you might ask? Well, it’s …

A light travel laptop

This machine is light in every sense of the word. Physically, it’s a light machine; it’s tiny. Color-wise, it’s a light, bright blue or purple. Spec-wise, it’s very light.

They should have called this HP Light 11.

But light can be good. Sometimes I want a light machine, one that I don’t care if it gets lost or stolen, or dropped and broken. With a light machine, I can fit it into a travel bag, and do some rough prototyping before pushing the code to more capable machines.

Continue reading “HP Stream 11 review – running Ubuntu 16”

Mastering the Game of Go – slides [paper explained]

This week I presented this work to our weekly reading group:

Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., … Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489.

To quickly summarize this work…

Basically, they create a policy network, a convolutional neural network that predicts the next move a human player would make from a given board state. They also create a value network, another convolutional neural network, that predicts the outcome (win or loss) of the game from the current board state.
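To make the inputs and outputs of the two networks concrete, here is a toy NumPy sketch. It uses random, untrained weights, a hypothetical 3-plane board encoding, and tiny layer sizes, so it is nothing like the paper's actual architecture or training. The point is the interface. The policy network maps a board to a probability for each of the 19 × 19 = 361 points. The value network maps a board to a single number in (−1, 1) that stands for the predicted outcome. The two networks have separate weights.

```python
import numpy as np

BOARD = 19  # 19 x 19 = 361 points
rng = np.random.default_rng(0)


def encode(board):
    """Hypothetical input planes: player-to-move stones (1), opponent stones (-1), empty (0)."""
    return np.stack([board == 1, board == -1, board == 0]).astype(float)


def conv3x3(planes, kernels):
    """Zero-padded 3x3 convolution: (C_in, 19, 19) x (C_out, C_in, 3, 3) -> (C_out, 19, 19)."""
    _, h, w = planes.shape
    padded = np.pad(planes, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((kernels.shape[0], h, w))
    for o in range(kernels.shape[0]):
        for dy in range(3):
            for dx in range(3):
                # Weight every input plane by this kernel tap and sum over input channels.
                out[o] += np.tensordot(kernels[o, :, dy, dx],
                                       padded[:, dy:dy + h, dx:dx + w], axes=1)
    return out


# Random, untrained weights; the policy and value networks do not share parameters.
policy_trunk = rng.normal(scale=0.1, size=(8, 3, 3, 3))
policy_head = rng.normal(scale=0.1, size=(1, 8, 3, 3))
value_trunk = rng.normal(scale=0.1, size=(8, 3, 3, 3))
value_head = rng.normal(scale=0.1, size=8)


def policy_net(board):
    """Board -> probability of the next move at each of the 361 points."""
    h = np.maximum(conv3x3(encode(board), policy_trunk), 0.0)  # ReLU
    logits = conv3x3(h, policy_head).reshape(-1)                # one logit per point
    logits = np.where(board.reshape(-1) == 0, logits, -np.inf)  # occupied points get 0
    p = np.exp(logits - logits.max())                           # softmax
    return p / p.sum()


def value_net(board):
    """Board -> single score in (-1, 1): predicted outcome for the player to move."""
    h = np.maximum(conv3x3(encode(board), value_trunk), 0.0)
    return float(np.tanh(value_head @ h.mean(axis=(1, 2))))


board = np.zeros((BOARD, BOARD))
board[3, 3], board[15, 15] = 1, -1
probs = policy_net(board)
print(probs.shape, int(probs.argmax()), value_net(board))
```

In the paper, these two outputs are what guide the tree search: the policy suggests which moves are worth exploring, and the value estimates how good a position is without playing the game out.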
Continue reading “Mastering the Game of Go – slides [paper explained]”

Dermofit 10-class – differences in ISBI and MLMI accuracy explained

I just got a great question asking why there is a discrepancy in the accuracy reported in our two works:

[ISBI paper, we report 81.8% accuracy over 10 classes]
Kawahara, J., BenTaieb, A., & Hamarneh, G. (2016). Deep features to classify skin lesions. In IEEE ISBI (pp. 1397–1400). Summary and slides here.

[MICCAI MLMI paper, we report 74.1% accuracy over 10 classes]
Kawahara, J., & Hamarneh, G. (2016). Multi-Resolution-Tract CNN with Hybrid Pretrained and Skin-Lesion Trained Layers. In MLMI. Summary and slides here.

We use the same Dermofit dataset in both papers, so it may seem surprising that the accuracies we report differ. So I thought I would explain why here.
Continue reading “Dermofit 10-class – differences in ISBI and MLMI accuracy explained”