Here’s how to group your data by specific columns and apply functions to other columns in a Pandas DataFrame in Python.
Create the DataFrame with some example data
import pandas as pd # Make up some data. data = [ {'unit': 'archer', 'building': 'archery_range', 'number_units': 1, 'civ': 'spanish'}, {'unit': 'militia', 'building': 'barracks', 'number_units': 2, 'civ': 'spanish'}, {'unit': 'pikemen', 'building': 'barracks', 'number_units': 3, 'civ': 'spanish'}, {'unit': 'pikemen', 'building': 'barracks', 'number_units': 4, 'civ': 'huns'}, ] # Create the DataFrame. df = pd.DataFrame(data) # View the DataFrame. df
You should see a DataFrame that looks like this:
unit building number_units civ 0 archer archery_range 1 spanish 1 militia barracks 2 spanish 2 pikemen barracks 3 spanish 3 pikemen barracks 4 huns
Example 1: Groupby and sum specific columns
Let’s say you want to count the number of units, but separate the unit count based on the type of building.
# Sum the number of units for each building type. df.groupby( ['building'], as_index=False).agg( {'number_units':sum} )
You should see this, where there is 1 unit from the archery range, and 9 units from the barracks.
building number_units 0 archery_range 1 1 barracks 9
Example 2: Groupby multiple columns
Or maybe you want to count the number of units separated by building type and civilization type.
# Sum the number of units based on the building # and civilization type. df.groupby( ['building', 'civ'], as_index=False).agg( {'number_units':sum} )
This groups the rows and the unit count based on the type of building and the type of civilization.
building civ number_units 0 archery_range spanish 1 1 barracks huns 4 2 barracks spanish 5
Example 3: Groupby, sum and aggregate into a list
Nice nice. Okay for fun, let’s do one more example. Here’s how to aggregate the values into a list. Specifically, we’ll return all the unit types as a list.
# Sum the number of units based on # the building and civilization type, # and get the unit types in a list. df.groupby( ['building', 'civ'], as_index=False).agg( {'unit':list, 'number_units': sum} )
You can see we now have a list of the units under the unit
column. Note you can apply other operations to the agg
function if needed.
building civ unit number_units 0 archery_range spanish [archer] 1 1 barracks huns [pikemen] 4 2 barracks spanish [militia, pikemen] 5
There you go! Hopefully these examples help you use the groupby
and agg
functions in a Pandas DataFrame in Python!
You can checkout the Jupyter notebook with these examples here.
Nice!
I usually want the groupby object converted to data frame so I do something like:
A bit hackish, but does the job (the last bit results in ‘area sum’, ‘area mean’ etc.
Would be interested to know if there’s a cleaner way
Ben
Nice question Ben! I just found a new way to specify a new column header right in the function:
Think that would do the trick?
Oh that’s really cool, I didn’t know you could do that, thanks!