WikiGalaxy

Personalize

Joining DataFrames in Pandas

Introduction to Joining DataFrames:

Joining DataFrames is a fundamental operation in data manipulation using Pandas. It allows you to combine two or more DataFrames based on a common column or index. This operation is crucial when dealing with large datasets spread across multiple tables, enabling you to perform comprehensive analyses.

Types of Joins:

Pandas supports several types of joins: inner, outer, left, and right. Each type determines how the rows from the DataFrames are combined and which keys are included in the result.

Inner Join:

An inner join returns only the rows with keys present in both DataFrames. It's useful when you need to find commonalities between datasets.

Outer Join:

An outer join returns all rows from both DataFrames, filling in missing values with NaN. This join is helpful when you need a complete dataset, regardless of missing keys.

Left Join:

A left join returns all rows from the left DataFrame and the matching rows from the right DataFrame. It's useful when you want to retain all information from the left DataFrame.

Right Join:

A right join returns all rows from the right DataFrame and the matching rows from the left DataFrame. It's used when the focus is on retaining all information from the right DataFrame.

Example 1: Inner Join

Inner Join Explanation:

In this example, we perform an inner join between two DataFrames on a common column 'ID'. Only rows with matching 'ID' values in both DataFrames are returned.


import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [3, 4, 5], 'Age': [25, 30, 35]})

result = pd.merge(df1, df2, on='ID', how='inner')
print(result)
        

Result Analysis:

The resulting DataFrame contains only the row with 'ID' 3, as it is the only common 'ID' in both DataFrames.

Console Output:

ID Name Age 0 3 Charlie 25

Example 2: Outer Join

Outer Join Explanation:

This example demonstrates an outer join, which combines all rows from both DataFrames. Missing values are filled with NaN.


import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [3, 4, 5], 'Age': [25, 30, 35]})

result = pd.merge(df1, df2, on='ID', how='outer')
print(result)
        

Result Analysis:

The resulting DataFrame includes all IDs from both DataFrames, with NaN for unmatched columns.

Console Output:

ID Name Age 0 1 Alice NaN 1 2 Bob NaN 2 3 Charlie 25.0 3 4 NaN 30.0 4 5 NaN 35.0

Example 3: Left Join

Left Join Explanation:

In a left join, all rows from the left DataFrame are returned, along with matched rows from the right DataFrame.


import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [3, 4, 5], 'Age': [25, 30, 35]})

result = pd.merge(df1, df2, on='ID', how='left')
print(result)
        

Result Analysis:

The resulting DataFrame retains all rows from df1, with NaN where df2 has no matching ID.

Console Output:

ID Name Age 0 1 Alice NaN 1 2 Bob NaN 2 3 Charlie 25.0

Example 4: Right Join

Right Join Explanation:

A right join returns all rows from the right DataFrame, with matched rows from the left DataFrame.


import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [3, 4, 5], 'Age': [25, 30, 35]})

result = pd.merge(df1, df2, on='ID', how='right')
print(result)
        

Result Analysis:

The resulting DataFrame includes all rows from df2, with NaN where df1 has no matching ID.

Console Output:

ID Name Age 0 3 Charlie 25.0 1 4 NaN 30.0 2 5 NaN 35.0

Example 5: Joining on Index

Joining on Index Explanation:

Instead of joining on a column, you can join DataFrames on their indices. This is useful when the index represents a meaningful key.


import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie']}, index=[1, 2, 3])
df2 = pd.DataFrame({'Age': [25, 30, 35]}, index=[3, 4, 5])

result = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')
print(result)
        

Result Analysis:

The resulting DataFrame shows a comprehensive view by merging on indices, with NaN for unmatched indices.

Console Output:

Name Age 1 Alice NaN 2 Bob NaN 3 Charlie 25.0 4 NaN 30.0 5 NaN 35.0

logo of wikigalaxy

Newsletter

Subscribe to our newsletter for weekly updates and promotions.

Privacy Policy

 • 

Terms of Service

Copyright © WikiGalaxy 2025