Pandas Overview

Introduction to Pandas

Data Manipulation and Analysis:

Pandas is a Python library for data manipulation and analysis. It provides the data structures and functions needed to load, clean, transform, and summarize structured (tabular, labeled) data.

DataFrame and Series:

The two primary data structures in Pandas are DataFrame and Series. A DataFrame is a 2-dimensional labeled data structure, similar to a table in a database, with a row index and named columns. A Series is a 1-dimensional labeled array; each column of a DataFrame is a Series.
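As a minimal sketch of how the two structures relate (the names `ages` and `people` and the lowercase row labels are illustrative assumptions, not part of the original examples):

```python
import pandas as pd

# A Series: one-dimensional values paired with an index of labels.
ages = pd.Series([25, 30, 35], index=['alice', 'bob', 'charlie'], name='Age')

# A DataFrame: several columns that share one row index.
people = pd.DataFrame(
    {'Age': [25, 30, 35], 'City': ['New York', 'Los Angeles', 'Chicago']},
    index=['alice', 'bob', 'charlie'],
)

# Selecting a single column of a DataFrame returns a Series.
age_column = people['Age']

print(type(age_column).__name__)  # Series
print(ages.shape, people.shape)   # (3,) (3, 2)
```

The shapes make the difference concrete: the Series has one axis, the DataFrame has two.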

Handling Missing Data:

Pandas provides methods to detect, drop, and fill missing values (for example isnull(), dropna(), and fillna()), making it easier to clean and prepare datasets for analysis.
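A short sketch of the three operations on a small, made-up table (the data and the choice to fill ages with the column mean are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Age': [25, np.nan, 35],
    'City': ['New York', 'Los Angeles', None],
})

# Detect: count missing values in each column.
missing_per_column = df.isna().sum()

# Drop: keep only rows with no missing values.
complete_rows = df.dropna()

# Fill: choose a replacement per column so numeric columns stay numeric.
filled = df.fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})

print(missing_per_column)
print(complete_rows)
print(filled)
```

Passing a dict to `fillna` lets each column get a replacement of a fitting type, which avoids mixing strings into a numeric column.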

Data Alignment:

Pandas automatically aligns data by index label in computations, so you can operate on objects whose indexes differ in order or content. Labels present in only one operand produce NaN in the result.
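The alignment behavior can be sketched with two Series whose indexes only partly overlap (the names `q1` and `q2` and their values are hypothetical):

```python
import pandas as pd

q1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
q2 = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

# Values are matched by label, not by position.
# 'a' and 'd' appear in only one Series, so their sums are NaN.
total = q1 + q2

# add() with fill_value treats a missing label as 0 instead.
filled = q1.add(q2, fill_value=0)

print(total)
print(filled)
```

Here `total` holds NaN for `a` and `d`, 21.0 for `b`, and 32.0 for `c`, while `filled` holds 10.0, 21.0, 32.0, and 3.0.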

Data Wrangling:

With Pandas, you can perform complex data wrangling tasks, including reshaping, merging, and grouping data, to transform raw data into a more usable format.
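Merging and reshaping might look like the following sketch. The tables, the `DeptId` key, and the quarterly score columns are invented for illustration.

```python
import pandas as pd

employees = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'DeptId': [1, 2, 1],
})
departments = pd.DataFrame({
    'DeptId': [1, 2],
    'Department': ['HR', 'Finance'],
})

# Merge: attach the department name to each employee via the shared key.
merged = employees.merge(departments, on='DeptId', how='left')

# Reshape: turn one column per quarter (wide) into one row per quarter (long).
scores = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Q1': [90, 80], 'Q2': [85, 95]})
long_scores = scores.melt(id_vars='Name', var_name='Quarter', value_name='Score')

print(merged)
print(long_scores)
```

A left merge keeps every row of `employees` even if a department were missing, and `melt` produces four rows from the two-by-two block of scores.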

Example 1: Creating a DataFrame


import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)
print(df)

Console Output:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago

Example 2: Selecting Data


import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)
print(df['Name'])  # Selecting a single column
print(df[['Name', 'Age']])  # Selecting multiple columns

Console Output:

0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35

Example 3: Handling Missing Data


import pandas as pd
import numpy as np

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, np.nan, 35],
    'City': ['New York', 'Los Angeles', np.nan]
}

df = pd.DataFrame(data)
print(df.isnull())  # Check for missing values
df.fillna('Unknown', inplace=True)  # Fill missing values
print(df)

Console Output:

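    Name    Age   City
0  False  False  False
1  False   True  False
2  False  False   True
      Name      Age         City
0    Alice     25.0     New York
1      Bob  Unknown  Los Angeles
2  Charlie     35.0      Unknown

Note: the NaN makes Age a float column, so the remaining ages print as 25.0 and 35.0. Filling it with the string 'Unknown' turns the column into object dtype.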

Example 4: DataFrame Operations


import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
}

df = pd.DataFrame(data)
df['Age'] += 1  # Increment age by 1
df['Salary'] *= 1.1  # Increase salary by 10%
print(df)

Console Output:

      Name  Age   Salary
0    Alice   26  55000.0
1      Bob   31  66000.0
2  Charlie   36  77000.0

Example 5: Grouping Data


import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Department': ['HR', 'Finance', 'HR', 'Finance'],
    'Salary': [50000, 60000, 70000, 80000]
}

df = pd.DataFrame(data)
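# Average only the numeric columns; in pandas 2.0+ mean() raises a TypeError on the text column 'Name'
grouped = df.groupby('Department').mean(numeric_only=True)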
print(grouped)

Console Output:

            Salary
Department
Finance    70000.0
HR         60000.0