Python Pandas – How to groupby and aggregate a DataFrame

Here’s how to group your data by specific columns and apply functions to other columns in a Pandas DataFrame in Python.

Create the DataFrame with some example data

import pandas as pd

# Make up some data.
data = [
    {'unit': 'archer', 'building': 'archery_range', 'number_units': 1, 'civ': 'spanish'},
    {'unit': 'militia', 'building': 'barracks', 'number_units': 2, 'civ': 'spanish'},
    {'unit': 'pikemen', 'building': 'barracks', 'number_units': 3, 'civ': 'spanish'},
    {'unit': 'pikemen', 'building': 'barracks', 'number_units': 4, 'civ': 'huns'},
]

# Create the DataFrame.
df = pd.DataFrame(data)
# View the DataFrame.
print(df)

You should see a DataFrame that looks like this:

      unit       building  number_units      civ
0   archer  archery_range             1  spanish
1  militia       barracks             2  spanish
2  pikemen       barracks             3  spanish
3  pikemen       barracks             4     huns

Example 1: Groupby and sum specific columns

Let’s say you want to add up the number of units, with a separate total for each type of building.
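
Here is a minimal sketch of one way to do that, assuming the example data from the top of this post. It groups the rows by building and sums number_units within each group. The variable name units_per_building and the choice of as_index=False (which keeps building as a regular column instead of the index) are illustrative assumptions, not the only way to write it.

```python
import pandas as pd

# Same example data as above, repeated so this snippet runs on its own.
data = [
    {'unit': 'archer', 'building': 'archery_range', 'number_units': 1, 'civ': 'spanish'},
    {'unit': 'militia', 'building': 'barracks', 'number_units': 2, 'civ': 'spanish'},
    {'unit': 'pikemen', 'building': 'barracks', 'number_units': 3, 'civ': 'spanish'},
    {'unit': 'pikemen', 'building': 'barracks', 'number_units': 4, 'civ': 'huns'},
]
df = pd.DataFrame(data)

# Group rows by building, then sum number_units within each group.
# as_index=False keeps 'building' as a column rather than the index.
units_per_building = df.groupby('building', as_index=False)['number_units'].sum()

# archery_range has 1 unit; barracks has 2 + 3 + 4 = 9 units.
print(units_per_building)
```

Selecting ['number_units'] before calling sum() limits the aggregation to that one column, so the text columns (unit and civ) are left out of the result.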
