The Basics of Python: Pandas

An Introduction to Python's Pandas Library for Data Manipulation and Analysis

Feb 23, 2022

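Pandas is a Python library that is widely used for data manipulation and analysis. Its two core data structures, the one-dimensional Series and the two-dimensional DataFrame, come with functions for loading, cleaning, reshaping, and summarizing structured (tabular) data.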
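
As a quick illustration of the kind of structured data Pandas works with, here is a minimal sketch of a Series and a DataFrame. The names and values are made up for the example and are not part of any real dataset.

```python
import pandas as pd

# A Series is a one-dimensional labeled array (made-up values)
ages = pd.Series([25, 30, 35], index=['Alice', 'Bob', 'Charlie'], name='Age')

# A DataFrame is a two-dimensional table whose columns are Series
people = pd.DataFrame({'Age': ages, 'City': ['New York', 'Paris', 'London']})

print(ages['Bob'])           # label-based lookup on a Series
print(people.shape)          # (number of rows, number of columns)
print(type(people['Age']))   # selecting one column gives back a Series
```

Selecting a single column from a DataFrame returns a Series, which is why many operations work the same way on both.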

Some of the most important functions and methods in Pandas include:

pd.DataFrame(): creates a Pandas DataFrame from a Python dictionary or array
df.head(): returns the first rows of a DataFrame (five by default)
df.tail(): returns the last rows of a DataFrame (five by default)
df.info(): prints a summary of the column data types and non-null counts in a DataFrame
df.describe(): returns descriptive statistics for the columns of a DataFrame (numeric columns only, by default)
df.groupby(): groups data in a DataFrame by one or more columns
df.merge(): merges two DataFrames based on a common column
df.sort_values(): sorts a DataFrame by one or more columns
df.drop(): drops rows or columns from a DataFrame
df.fillna(): fills null values in a DataFrame with a specified value (df.ffill() and df.bfill() propagate neighboring values instead)
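
A few of these accept arguments that change their default behavior. The sketch below uses a small, hypothetical scores table (the column names and values are invented for illustration) to show df.tail() with an explicit row count, df.drop() applied to a row rather than a column, and df.sort_values() in descending order.

```python
import pandas as pd

# hypothetical sample data, for illustration only
scores = pd.DataFrame({'Player': ['Ann', 'Ben', 'Cal', 'Dee'],
                       'Score': [70, 85, 60, 95]})

# head() and tail() return five rows by default; pass n to change that
last_two = scores.tail(2)

# drop() removes rows when given index labels and columns when given columns=
without_first = scores.drop(index=0)

# sort_values() sorts ascending by default; ascending=False reverses the order
ranked = scores.sort_values(by='Score', ascending=False)

print(last_two)
print(without_first)
print(ranked)
```

Each of these calls returns a new DataFrame and leaves scores itself unchanged.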

Here's an example of how to create a Pandas DataFrame and use some of these functions:

import pandas as pd

# create a DataFrame from a Python dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']}
df = pd.DataFrame(data)

# print the first few rows of the DataFrame
print(df.head())

# print a summary of the data types and non-null counts in the DataFrame
# (info() prints directly and returns None, so it is not wrapped in print())
df.info()

# compute descriptive statistics about the data in the DataFrame
print(df.describe())

# group the data by the City column and compute the mean of the Age column
# (selecting 'Age' first keeps the non-numeric Name column out of the mean)
print(df.groupby('City')['Age'].mean())

# create a second DataFrame with additional data
data2 = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
         'Salary': [50000, 60000, 70000, 80000, 90000]}
df2 = pd.DataFrame(data2)

# merge the two DataFrames based on the Name column
merged_df = df.merge(df2, on='Name')

# sort the merged DataFrame by the Age column
sorted_df = merged_df.sort_values(by='Age')

# drop the City column from the sorted DataFrame
final_df = sorted_df.drop(columns='City')

# fill null values in the Salary column with the mean of the column
# (this sample has no nulls, so the values are unchanged here)
final_df['Salary'] = final_df['Salary'].fillna(final_df['Salary'].mean())

# print the final DataFrame
print(final_df)
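
To see fillna() actually change something, the data needs a gap. One common way gaps appear is a merge in which some rows have no match. The sketch below is a hedged variation on the example above, using made-up data in which Charlie has no salary record. It passes how='left' to merge() so that unmatched rows are kept (the default, how='inner', would drop them).

```python
import pandas as pd

# hypothetical data: Charlie has no salary record
people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                       'Age': [25, 30, 35]})
salaries = pd.DataFrame({'Name': ['Alice', 'Bob'],
                         'Salary': [50000, 60000]})

# how='left' keeps every row of people; unmatched rows get NaN in Salary
merged = people.merge(salaries, on='Name', how='left')
missing_before = merged['Salary'].isna().sum()
print(missing_before)  # number of missing salaries after the merge

# fill the gap with the mean of the known salaries
merged['Salary'] = merged['Salary'].fillna(merged['Salary'].mean())
print(merged)
```

Filling with the column mean is only one option. Whether it is appropriate depends on the data, and a constant or a forward fill may suit other cases better.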

Go Far AI
