WikiGalaxy

Using Pandas groupby for Aggregation

Introduction:

The groupby() function in Pandas implements the split-apply-combine pattern. It splits a DataFrame into groups based on the values of one or more keys (usually column names), applies a function such as an aggregation to each group independently, and then combines the per-group results into a new Series or DataFrame.
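
To make the three steps concrete, here is a minimal sketch that performs the split, apply, and combine stages by hand on the same small dataset used in the examples below, then checks the result against a single groupby() call. The variable names (results, manual, built_in) are illustrative only.

```python
import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B'],
        'Values': [10, 20, 30, 40]}
df = pd.DataFrame(data)

# Split: iterating over a GroupBy object yields (key, sub-DataFrame) pairs
results = {}
for key, group in df.groupby('Category'):
    # Apply: compute a summary for this group only
    results[key] = group['Values'].sum()

# Combine: gather the per-group results into a single Series
manual = pd.Series(results, name='Values')
manual.index.name = 'Category'

# groupby() performs all three steps in one call
built_in = df.groupby('Category')['Values'].sum()
print(manual.equals(built_in))  # True
```

In practice the single groupby() call is preferred; the loop is only there to show what happens behind the scenes.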

Example 1: Grouping by a Single Column

Description:

In this example, we will group a DataFrame by a single column and calculate the sum of another column for each group.


import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B'],
        'Values': [10, 20, 30, 40]}

df = pd.DataFrame(data)
grouped = df.groupby('Category').sum()
print(grouped)
        

Explanation:

The DataFrame is grouped by the 'Category' column, and sum() aggregates the remaining 'Values' column within each category: A gives 10 + 30 = 40 and B gives 20 + 40 = 60. The group keys become the index of the result.

Console Output:

          Values
Category
A             40
B             60
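
In Example 1, sum() aggregates every non-grouping column. A common variant, sketched below, selects the column explicitly so that only 'Values' is aggregated and the result is a Series, and uses as_index=False when the group keys should stay as an ordinary column instead of becoming the index.

```python
import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B'],
        'Values': [10, 20, 30, 40]}
df = pd.DataFrame(data)

# Selecting the column before aggregating returns a Series indexed by Category
totals = df.groupby('Category')['Values'].sum()
print(totals)

# as_index=False keeps the group keys as a regular column instead of the index
totals_df = df.groupby('Category', as_index=False)['Values'].sum()
print(totals_df)
```

The as_index=False form is handy when the result will be merged with another DataFrame or written out as a flat table.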

Example 2: Grouping by Multiple Columns

Description:

This example demonstrates how to group a DataFrame by multiple columns and calculate the mean of another column for each group.


import pandas as pd

data = {'Category': ['A', 'A', 'B', 'B'],
        'Subcategory': ['X', 'Y', 'X', 'Y'],
        'Values': [10, 20, 30, 40]}

df = pd.DataFrame(data)
grouped = df.groupby(['Category', 'Subcategory']).mean()
print(grouped)
        

Explanation:

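The DataFrame is grouped by both 'Category' and 'Subcategory', and mean() computes the average of 'Values' for each combination of the two keys. The result has a two-level index (a MultiIndex). Because each combination appears only once in this small dataset, every mean equals that row's single value.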

Console Output:

                      Values
Category Subcategory
A        X              10.0
         Y              20.0
B        X              30.0
         Y              40.0
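
A two-level index like the one in Example 2 is not always convenient to work with. As a sketch, reset_index() turns the group keys back into ordinary columns, while unstack() pivots the inner level (Subcategory) into columns to give a Category-by-Subcategory grid.

```python
import pandas as pd

data = {'Category': ['A', 'A', 'B', 'B'],
        'Subcategory': ['X', 'Y', 'X', 'Y'],
        'Values': [10, 20, 30, 40]}
df = pd.DataFrame(data)

# Mean of 'Values' per (Category, Subcategory) pair, as a MultiIndexed Series
means = df.groupby(['Category', 'Subcategory'])['Values'].mean()

# Flatten the MultiIndex back into ordinary columns
flat = means.reset_index()
print(flat)

# Or pivot the inner level (Subcategory) into columns for a grid view
grid = means.unstack()
print(grid)
```

Which form is better depends on the next step: the flat table suits further filtering or merging, while the grid is easier to read at a glance.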

Example 3: Applying Multiple Aggregations

Description:

In this example, we will apply multiple aggregation functions to a grouped DataFrame.


import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B'],
        'Values': [10, 20, 30, 40]}

df = pd.DataFrame(data)
grouped = df.groupby('Category').agg(['sum', 'mean'])
print(grouped)
        

Explanation:

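The DataFrame is grouped by the 'Category' column, and passing a list of function names to agg() computes both the sum and the mean of 'Values' for each group. The result has two levels of column labels: the original column name ('Values') on top and the aggregation name ('sum', 'mean') below it.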

Console Output:

          Values
          sum  mean
Category
A          40  20.0
B          60  30.0
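
When two-level column labels like those in Example 3 are awkward, named aggregation (available in pandas 0.25 and later) lets you choose the output column names directly. Each keyword argument maps a new column name to a (column, function) pair; the names total, average and largest below are arbitrary choices for illustration.

```python
import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B'],
        'Values': [10, 20, 30, 40]}
df = pd.DataFrame(data)

# Named aggregation: each keyword becomes an output column,
# given as (source column, aggregation function)
summary = df.groupby('Category').agg(
    total=('Values', 'sum'),
    average=('Values', 'mean'),
    largest=('Values', 'max'),
)
print(summary)
```

The result has a single level of column labels, which makes it easier to select columns afterwards (for example summary['total']).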

Example 4: Filtering Groups

Description:

This example shows how to filter groups based on a condition after grouping.


import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B'],
        'Values': [10, 20, 30, 40]}

df = pd.DataFrame(data)
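filtered = df.groupby('Category').filter(lambda group: group['Values'].sum() > 50)
print(filtered)


Explanation:

The DataFrame is grouped by 'Category', and filter() keeps only the groups for which the function returns True, here those whose 'Values' total exceeds 50. Group B (20 + 40 = 60) is kept and group A (10 + 30 = 40) is dropped. Unlike an aggregation, filter() returns the original rows of the surviving groups with their original index labels.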

Console Output:

  Category  Values
1        B      20
3        B      40
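
filter() is not the only way to drop whole groups. The sketch below uses a threshold of 50 (so only group B survives) and compares filter() with an equivalent boolean mask: transform('sum') repeats each group's total on every row of that group, so an ordinary row-wise comparison selects the same rows.

```python
import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B'],
        'Values': [10, 20, 30, 40]}
df = pd.DataFrame(data)

# filter(): keep whole groups whose total exceeds 50
via_filter = df.groupby('Category').filter(lambda g: g['Values'].sum() > 50)

# Equivalent boolean mask: transform('sum') broadcasts each group's total
# back onto its rows, so the comparison is done row by row
group_totals = df.groupby('Category')['Values'].transform('sum')
via_mask = df[group_totals > 50]

print(via_mask)
print(via_filter.equals(via_mask))  # True
```

The mask version can be convenient when the group condition needs to be combined with other row-level conditions using & or |.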

Example 5: Transforming Groups

Description:

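In this example, we will express each value as a fraction of its group's total. transform() passes each group's 'Values' to the function as a Series and expects a result of the same length, so the output lines up row for row with the original DataFrame.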


import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B'],
        'Values': [10, 20, 30, 40]}

df = pd.DataFrame(data)
df['Transformed'] = df.groupby('Category')['Values'].transform(lambda x: x / x.sum())
print(df)
        

Explanation:

The DataFrame is grouped by 'Category', and the 'Values' column is transformed by dividing each value by the sum of its group.

Console Output:

  Category  Values  Transformed
0        A      10     0.250000
1        B      20     0.333333
2        A      30     0.750000
3        B      40     0.666667
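
transform() also accepts the name of a built-in aggregation as a string, in which case the group statistic is repeated on every row of the group. A common use, sketched here, is centering each value on its group mean; the column name Centered is an arbitrary choice for illustration.

```python
import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B'],
        'Values': [10, 20, 30, 40]}
df = pd.DataFrame(data)

# A string name selects a built-in aggregation; transform() broadcasts
# each group's mean back to every row of that group
group_mean = df.groupby('Category')['Values'].transform('mean')

# Subtract the group mean to see how far each value sits from its group's average
df['Centered'] = df['Values'] - group_mean
print(df)
```

Here group A has mean 20 and group B has mean 30, so each value ends up 10 below or 10 above its group's average.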

Copyright © WikiGalaxy 2025