WikiGalaxy

Concatenating DataFrames in Pandas

Pandas is a powerful data manipulation library in Python, and one of its key features is the ability to concatenate DataFrames. Concatenating DataFrames allows you to combine data from multiple sources into a single DataFrame, which is essential for data analysis and preprocessing. The concat() function can concatenate along either axis (rows or columns) and lets you control how index and column labels that do not match across the inputs are handled.

Example 1: Concatenating Along Rows

In this example, we concatenate two DataFrames along the rows. This is useful when you have data with the same columns and want to stack them vertically.


import pandas as pd

# Creating two DataFrames
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Concatenating DataFrames along rows
result = pd.concat([df1, df2], axis=0)
print(result)

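The resulting DataFrame stacks the rows of df2 below those of df1 and keeps the columns A and B. Each DataFrame keeps its original index labels, so the labels 0 and 1 each appear twice in the result.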

Console Output:

   A  B
0  1  3
1  2  4
0  5  7
1  6  8
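The repeated index labels in this output can matter for later lookups. The sketch below reuses df1 and df2 from Example 1 to show that a label-based lookup now returns two rows, and that the verify_integrity parameter of concat() can be used to reject duplicate labels. The variable names rows_at_zero and raised are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

result = pd.concat([df1, df2], axis=0)

# Label 0 now refers to two rows, one from each source DataFrame.
rows_at_zero = result.loc[0]
print(rows_at_zero)

# verify_integrity=True makes concat raise instead of producing duplicate labels.
try:
    pd.concat([df1, df2], axis=0, verify_integrity=True)
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")
```

If the original labels carry no meaning, ignore_index=True (covered in Example 5) avoids the duplicates altogether.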

Example 2: Concatenating Along Columns

Here, we concatenate two DataFrames along the columns. This is useful when you want to combine different attributes of the same set of entities.


import pandas as pd

# Creating two DataFrames
df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'B': [3, 4]})

# Concatenating DataFrames along columns
result = pd.concat([df1, df2], axis=1)
print(result)

The resulting DataFrame has columns from both df1 and df2, aligned by their index.

Console Output:

   A  B
0  1  3
1  2  4
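Alignment along columns is done by index label, not by row position. As a minimal sketch of that point, the following example uses two hypothetical DataFrames, left and right, whose indices hold the same labels in a different order.

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2]}, index=['x', 'y'])
# Same labels as left, but listed in reverse order.
right = pd.DataFrame({'B': [3, 4]}, index=['y', 'x'])

result = pd.concat([left, right], axis=1)
print(result)

# Values are paired by label: row 'x' gets B=4, row 'y' gets B=3.
print(result.loc['x', 'B'], result.loc['y', 'B'])
```

Row x pairs A=1 with B=4 because both values carry the label x, even though they sit at different positions in their source DataFrames.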

Example 3: Concatenating with Different Indices

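When DataFrames with different indices are concatenated along columns (axis=1), pandas aligns the rows by index label. By default it keeps the union of all labels, and any position where a DataFrame has no row for a label is filled with NaN.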


import pandas as pd

# Creating two DataFrames with different indices
df1 = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
df2 = pd.DataFrame({'B': [3, 4]}, index=[2, 3])

# Concatenating DataFrames
result = pd.concat([df1, df2], axis=1)
print(result)

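The resulting DataFrame contains NaN wherever an index label is present in only one of the DataFrames. The values are shown as floats (1.0, 2.0) because the default integer dtype cannot represent NaN.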

Console Output:

     A    B
0  1.0  NaN
1  2.0  NaN
2  NaN  3.0
3  NaN  4.0
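If only the labels shared by every input are wanted, the join parameter of concat() can be set to 'inner'. The sketch below compares the default outer join with an inner join, using indices that overlap on a single label. The indices here are chosen for illustration and differ from those in Example 3.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
df2 = pd.DataFrame({'B': [3, 4]}, index=[1, 2])

# Default join='outer': union of index labels, gaps filled with NaN.
outer = pd.concat([df1, df2], axis=1)
print(outer)

# join='inner': only labels present in both DataFrames are kept.
inner = pd.concat([df1, df2], axis=1, join='inner')
print(inner)
```

Because the inner join introduces no NaN values, the columns in inner keep their integer dtype, while the columns in outer are converted to floats.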

Example 4: Concatenating with Keys

Using keys while concatenating allows you to distinguish between the source DataFrames in the resulting multi-indexed DataFrame.


import pandas as pd

# Creating two DataFrames
df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})

# Concatenating DataFrames with keys
result = pd.concat([df1, df2], keys=['df1', 'df2'])
print(result)

The resulting DataFrame has a multi-level index, with the keys indicating the source DataFrame.

Console Output:

       A
df1 0  1
    1  2
df2 0  3
    1  4
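Once the keys are in the index, they can be used to retrieve the rows that came from one source. The following sketch also passes the names parameter of concat() to label the two index levels. The level names 'source' and 'row' are arbitrary choices made for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})

# names labels the two levels of the resulting MultiIndex.
result = pd.concat([df1, df2], keys=['df1', 'df2'], names=['source', 'row'])

# Select every row that came from df2 using the outer key.
from_df2 = result.loc['df2']
print(from_df2)

# Select a single value using both index levels.
single = result.loc[('df1', 1), 'A']
print(single)

# Turn the index levels into ordinary columns.
flat = result.reset_index()
print(flat)
```

Flattening with reset_index() can be convenient when the source label is needed as a regular column for grouping or filtering.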

Example 5: Concatenating with Ignore Index

Passing ignore_index=True discards the original index labels and assigns a new default integer index to the resulting DataFrame, which is useful when the original indices are not needed.


import pandas as pd

# Creating two DataFrames
df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})

# Concatenating DataFrames with ignore_index=True
result = pd.concat([df1, df2], ignore_index=True)
print(result)

The resulting DataFrame has a new integer index, starting from 0.

Console Output:

   A
0  1
1  2
2  3
3  4
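For row-wise concatenation like this, ignore_index=True should give the same result as concatenating first and then calling reset_index(drop=True). The sketch below checks this on two DataFrames with string labels, which are chosen here only to make the discarded index visible.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]}, index=['a', 'b'])
df2 = pd.DataFrame({'A': [3, 4]}, index=['c', 'd'])

# One step: drop the original labels during concatenation.
direct = pd.concat([df1, df2], ignore_index=True)

# Two steps: concatenate, then replace the index afterwards.
two_step = pd.concat([df1, df2]).reset_index(drop=True)

same = direct.equals(two_step)
print(direct)
print(same)
```

The one-step form is shorter, while the two-step form leaves the labelled intermediate result available if it is needed first.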