Python Pandas – How to groupby and aggregate a DataFrame

Here’s how to group your data by specific columns and apply functions to other columns in a Pandas DataFrame in Python.

Create the DataFrame with some example data

import pandas as pd

# Make up some data.
data = [
    {'unit': 'archer', 'building': 'archery_range', 'number_units': 1, 'civ': 'spanish'},
    {'unit': 'militia', 'building': 'barracks', 'number_units': 2, 'civ': 'spanish'},
    {'unit': 'pikemen', 'building': 'barracks', 'number_units': 3, 'civ': 'spanish'},
    {'unit': 'pikemen', 'building': 'barracks', 'number_units': 4, 'civ': 'huns'},
]

# Create the DataFrame.
df = pd.DataFrame(data)
# View the DataFrame.
df

You should see a DataFrame that looks like this:

      unit       building  number_units      civ
0   archer  archery_range             1  spanish
1  militia       barracks             2  spanish
2  pikemen       barracks             3  spanish
3  pikemen       barracks             4     huns

Example 1: Groupby and sum specific columns

Let’s say you want to count the number of units, but separate the unit count based on the type of building.

# Sum the number of units for each building type.
df.groupby(
    ['building'], as_index=False).agg(
    {'number_units':sum}
)

You should see this, where there is 1 unit from the archery range, and 9 units from the barracks.

        building  number_units
0  archery_range             1
1       barracks             9

Example 2: Groupby multiple columns

Or maybe you want to count the number of units separated by building type and civilization type.

# Sum the number of units based on the building
# and civilization type.
df.groupby(
    ['building', 'civ'], as_index=False).agg(
    {'number_units':sum}
)

This groups the rows and the unit count based on the type of building and the type of civilization.

        building      civ  number_units
0  archery_range  spanish             1
1       barracks     huns             4
2       barracks  spanish             5

Example 3: Groupby, sum and aggregate into a list

Nice nice. Okay for fun, let’s do one more example. Here’s how to aggregate the values into a list. Specifically, we’ll return all the unit types as a list.

# Sum the number of units based on 
# the building and civilization type,
# and get the unit types in a list.
df.groupby(
    ['building', 'civ'], as_index=False).agg(
    {'unit':list, 'number_units': sum}
)

You can see we now have a list of the units under the unit column. Note you can apply other operations to the agg function if needed.

        building      civ                unit  number_units
0  archery_range  spanish            [archer]             1
1       barracks     huns           [pikemen]             4
2       barracks  spanish  [militia, pikemen]             5

There you go! Hopefully these examples help you use the groupby and agg functions in a Pandas DataFrame in Python!

You can checkout the Jupyter notebook with these examples here.

3 thoughts on “Python Pandas – How to groupby and aggregate a DataFrame”

  1. Nice!
    I usually want the groupby object converted to data frame so I do something like:

    q = df.groupby(
            ['cellline', 'Treatment','Baf', 'channel', 'celln', 'serie']
        ).agg(
            {
                'area' : ['mean', 'std', 'count', 'sum'],
            }
            ).reset_index()
    q.columns = [' '.join(col).strip() for col in q.columns.values]
    

    A bit hackish, but does the job (the last bit results in ‘area sum’, ‘area mean’ etc.
    Would be interested to know if there’s a cleaner way

    Ben

    1. Nice question Ben! I just found a new way to specify a new column header right in the function:

      df.groupby(
          ['building', 'civ']).agg(
          unit_sum=('number_units', sum),
          # new_column_header = ('existing_column_header', 'operation')
          unit_mean=('number_units', 'mean'),
      ).reset_index()

      Think that would do the trick?

Questions/comments? If you just want to say thanks, consider sharing this article or following me on Twitter!