WikiGalaxy

Creating and Manipulating Pandas DataFrames

Example 1: Creating a DataFrame from a Dictionary

Pandas allows you to create a DataFrame using a dictionary where keys are column names and values are lists of column entries.


import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}

df = pd.DataFrame(data)
print(df)

This example creates a simple DataFrame from a dictionary. Each key becomes a column label and each list supplies that column's values, so all lists must have the same length. Because no index is specified, pandas labels the rows with a default integer index (0, 1, 2).
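After building a DataFrame it is often worth checking that it has the shape and column types you expect. The following is a minimal sketch using the same data; the variable names n_rows, n_cols, column_names and age_is_integer are chosen here for illustration only.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)

# shape is a (rows, columns) tuple
n_rows, n_cols = df.shape

# Column labels follow the order of the dictionary keys
column_names = list(df.columns)

# 'Age' is inferred as an integer column because every value is an int
age_is_integer = pd.api.types.is_integer_dtype(df['Age'])

print(n_rows, n_cols)
print(column_names)
print(age_is_integer)
```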

Example 2: Loading Data from a CSV File

Pandas provides a convenient function to load data from CSV files into a DataFrame.


import pandas as pd

df = pd.read_csv('data.csv')
print(df.head())

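This example reads a CSV file into a DataFrame. It assumes a file named data.csv exists in the current working directory. The head() method returns the first five rows by default; pass a number, as in head(10), to change how many rows are returned.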
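To try read_csv() without creating a file first, you can pass it a file-like object instead of a path. The sketch below assumes some made-up CSV content (the csv_text string is purely illustrative and stands in for data.csv) and wraps it in io.StringIO.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows; with no argument it returns five
first_two = df.head(2)
print(first_two)
```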

Example 3: Selecting Specific Columns

You can select specific columns from a DataFrame using the column names.


import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}

df = pd.DataFrame(data)
print(df[['Name', 'City']])

This example selects specific columns from a DataFrame. Passing a list of column names inside the indexing brackets (hence the double brackets) returns a new DataFrame that contains only the 'Name' and 'City' columns.
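The number of brackets matters: a single label returns a Series, while a list of labels returns a DataFrame, even when the list has one entry. A small sketch of the difference, with variable names picked only for this illustration:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'Los Angeles', 'Chicago']})

# A single label returns a one-dimensional Series
name_series = df['Name']

# A list of labels returns a DataFrame, even with only one label
name_frame = df[['Name']]

# .loc with "all rows" and a list of labels selects the same columns
subset = df.loc[:, ['Name', 'City']]

print(type(name_series).__name__)
print(type(name_frame).__name__)
print(subset)
```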

Example 4: Filtering Rows Based on a Condition

You can filter rows in a DataFrame based on a condition using boolean indexing.


import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}

df = pd.DataFrame(data)
filtered_df = df[df['Age'] > 28]
print(filtered_df)

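This example filters rows based on a condition. The expression df['Age'] > 28 produces a boolean Series with one True or False value per row, and indexing the DataFrame with it keeps only the rows marked True, here Bob and Charlie.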
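Conditions can also be combined. As a rough sketch: use & for "and" and | for "or", and wrap each condition in parentheses, because these operators bind more tightly than comparisons. The isin() method is a convenient way to test membership in a set of values. The result names below are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'Los Angeles', 'Chicago']})

# Both conditions must hold; each one needs its own parentheses
older_not_chicago = df[(df['Age'] > 28) & (df['City'] != 'Chicago')]

# Keep rows whose 'City' is one of the listed values
coastal = df[df['City'].isin(['New York', 'Los Angeles'])]

print(older_not_chicago)
print(coastal)
```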

Example 5: Adding a New Column

You can add a new column to a DataFrame by assigning a list of values to a new column name.


import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}

df = pd.DataFrame(data)
df['Salary'] = [70000, 80000, 90000]
print(df)

This example adds a new column to a DataFrame. Assigning a list to df['Salary'] creates the 'Salary' column; the list must contain exactly one value per row, otherwise pandas raises a ValueError.
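New columns are often computed from existing ones rather than typed in by hand. The sketch below shows two ways to do that; the column names 'MonthlySalary' and 'Senior' are invented for this illustration and are not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'Los Angeles', 'Chicago']})
df['Salary'] = [70000, 80000, 90000]

# Arithmetic on a column is applied to every row at once
df['MonthlySalary'] = df['Salary'] / 12

# assign() returns a new DataFrame and leaves df itself unchanged
df_with_flag = df.assign(Senior=df['Age'] >= 30)

print(df)
print(df_with_flag)
```

Direct assignment modifies the DataFrame in place, whereas assign() is handy when you want to keep the original intact or chain several steps together.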

Example 6: Dropping Columns

You can remove columns from a DataFrame using the drop() method.


import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}

df = pd.DataFrame(data)
df = df.drop('City', axis=1)
print(df)

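This example drops a column from a DataFrame. The axis=1 argument tells pandas that 'City' is a column label rather than a row label. drop() returns a new DataFrame instead of modifying the original, which is why the result is assigned back to df.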
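The drop() method also accepts a columns keyword, which many people find easier to read than axis=1, and an errors='ignore' option for labels that may not exist. A short sketch follows; 'Country' is a deliberately nonexistent column used only to show that option.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'Los Angeles', 'Chicago']})

# Drop several columns at once by name
trimmed = df.drop(columns=['Age', 'City'])

# 'Country' does not exist; errors='ignore' skips it instead of raising KeyError
safe = df.drop(columns=['Country'], errors='ignore')

print(trimmed)
print(list(df.columns))  # the original DataFrame still has all three columns
```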

Example 7: Sorting Data

You can sort a DataFrame based on the values in one or more columns using the sort_values() method.


import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}

df = pd.DataFrame(data)
df_sorted = df.sort_values(by='Age')
print(df_sorted)

This example sorts a DataFrame by a specific column. sort_values() sorts in ascending order by default and returns a new DataFrame, so df itself keeps its original row order.
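Sorting in descending order, or by more than one column, uses the ascending parameter and a list of column names. In this sketch a fourth row ('David', aged 30) is added purely for illustration so that there is a tie on 'Age' for the second sort key to break.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 30],
                   'City': ['New York', 'Los Angeles', 'Chicago', 'New York']})

# Largest 'Age' first
oldest_first = df.sort_values(by='Age', ascending=False)

# Sort by 'Age' descending, then break ties by 'Name' ascending
by_age_then_name = df.sort_values(by=['Age', 'Name'], ascending=[False, True])

print(oldest_first)
print(by_age_then_name)
```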

Example 8: Grouping Data

You can group data in a DataFrame by one or more columns using the groupby() method.


import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 30],
        'City': ['New York', 'Los Angeles', 'Chicago', 'New York']}

df = pd.DataFrame(data)
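# Select the numeric 'Age' column before averaging; in pandas 2.0 and later,
# calling mean() on the text column 'Name' raises a TypeError
grouped = df.groupby('City')['Age'].mean()
print(grouped)

This example groups the rows by 'City' and calculates the mean 'Age' for each group. The result is a Series indexed by city. The 'Age' column is selected explicitly because a mean cannot be computed for the text column 'Name'.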
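When you need more than one statistic per group, agg() with named aggregation lets you choose the column and function for each output column. This is a sketch on the same four-row data; the output column names mean_age and people are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 30],
                   'City': ['New York', 'Los Angeles', 'Chicago', 'New York']})

# Each keyword is an output column: (input column, aggregation function)
summary = df.groupby('City').agg(
    mean_age=('Age', 'mean'),
    people=('Name', 'count'),
)

print(summary)
```

By default groupby() sorts the group keys, so the cities appear in alphabetical order in the result.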

Example 9: Merging DataFrames

You can merge two DataFrames using the merge() function, similar to SQL joins.


import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'],
                    'Age': [25, 30]})

df2 = pd.DataFrame({'Name': ['Alice', 'Bob'],
                    'City': ['New York', 'Los Angeles']})

merged_df = pd.merge(df1, df2, on='Name')
print(merged_df)

This example merges two DataFrames on a common column, 'Name'. By default merge() performs an inner join, so the result contains only the names that appear in both DataFrames.
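The how parameter selects the join type, much like the JOIN variants in SQL. The sketch below uses slightly different data from the example above, chosen so that the keys only partly overlap ('Charlie' and 'David' are added just for this illustration).

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                    'Age': [25, 30, 35]})
df2 = pd.DataFrame({'Name': ['Alice', 'Bob', 'David'],
                    'City': ['New York', 'Los Angeles', 'Chicago']})

# Inner join (the default): only names found in both DataFrames
inner = pd.merge(df1, df2, on='Name')

# Left join: every row of df1; missing 'City' values become NaN
left = pd.merge(df1, df2, on='Name', how='left')

# Outer join: every name from either DataFrame
outer = pd.merge(df1, df2, on='Name', how='outer')

print(inner)
print(left)
print(outer)
```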

Example 10: Pivoting a DataFrame

Pivoting a DataFrame involves reshaping it by turning unique values from one column into multiple columns.


import pandas as pd

data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02'],
        'City': ['New York', 'Los Angeles', 'New York'],
        'Temperature': [30, 40, 35]}

df = pd.DataFrame(data)
pivot_df = df.pivot(index='Date', columns='City', values='Temperature')
print(pivot_df)

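This example pivots a DataFrame, turning the unique 'City' values into columns, using 'Date' as the index, and filling the cells with 'Temperature'. Combinations that have no data, such as Los Angeles on 2023-01-02, appear as NaN.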
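One caveat: pivot() raises a ValueError if the same index and column pair occurs more than once, because it has no rule for combining the values. In that situation pivot_table() can be used instead, since it aggregates duplicates. The sketch below adds a second, made-up New York reading for 2023-01-01 to show the difference.

```python
import pandas as pd

data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-01', '2023-01-02'],
        'City': ['New York', 'New York', 'Los Angeles', 'New York'],
        'Temperature': [30, 34, 40, 35]}
df = pd.DataFrame(data)

# pivot() cannot reshape when a Date/City pair is duplicated
try:
    df.pivot(index='Date', columns='City', values='Temperature')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() combines the duplicates, here by taking their mean
table = df.pivot_table(index='Date', columns='City',
                       values='Temperature', aggfunc='mean')

print(pivot_failed)
print(table)
```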