The Basics of Python: Pandas
An Introduction to Python's Pandas Library for Data Manipulation and Analysis
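Pandas is a Python library that is widely used for data manipulation and analysis. It provides two core data structures, the one-dimensional Series and the two-dimensional, tabular DataFrame, along with functions for loading, cleaning, reshaping, and summarizing structured data.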
Some of the most commonly used functions and methods in Pandas include (df stands for any DataFrame):
pd.DataFrame(): creates a Pandas DataFrame from a Python dictionary or array
df.head(): returns the first few rows of a DataFrame (five by default)
df.tail(): returns the last few rows of a DataFrame (five by default)
df.info(): prints a summary of the column data types and non-null counts in a DataFrame
df.describe(): returns descriptive statistics about the numeric data in a DataFrame
df.groupby(): groups data in a DataFrame by one or more columns
df.merge(): merges two DataFrames based on one or more common columns
df.sort_values(): sorts a DataFrame by one or more columns
df.drop(): drops rows or columns from a DataFrame
df.fillna(): fills null values in a DataFrame with a specified value (in recent Pandas versions, filling by method, such as forward fill, is done with df.ffill() and df.bfill())
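The list mentions building a DataFrame from an array as well as from a dictionary. A minimal sketch of the array case follows; the scores array and its column names are made up for illustration, and it also shows how head() and tail() accept an optional row count:

```python
import numpy as np
import pandas as pd

# Hypothetical 2-D array: each row is one record, each column one field.
scores = np.array([[1, 90], [2, 85], [3, 70], [4, 95]])

# An array carries no labels, so the column names are supplied separately.
df_scores = pd.DataFrame(scores, columns=['StudentID', 'Score'])

# head() and tail() return 5 rows by default; pass n to ask for fewer.
first_two = df_scores.head(2)
last_two = df_scores.tail(2)
print(first_two)
print(last_two)
```

Both calls return new DataFrames and leave df_scores unchanged.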
Here's an example of how to create a Pandas DataFrame and use some of these functions:
import pandas as pd
# create a DataFrame from a Python dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']}
df = pd.DataFrame(data)
# print the first few rows of the DataFrame
print(df.head())
# print information about the data types and null values in the DataFrame
# (info() prints its summary itself and returns None, so it is not wrapped in print())
df.info()
# compute descriptive statistics about the data in the DataFrame
print(df.describe())
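# group the data by the City column and compute the mean of the Age column
# (Age is selected explicitly because the text column Name cannot be averaged,
# and recent Pandas versions raise an error instead of skipping it)
print(df.groupby('City')['Age'].mean())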
# create a second DataFrame with additional data
data2 = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
         'Salary': [50000, 60000, 70000, 80000, 90000]}
df2 = pd.DataFrame(data2)
# merge the two DataFrames based on the Name column
merged_df = df.merge(df2, on='Name')
# sort the merged DataFrame by the Age column
sorted_df = merged_df.sort_values(by='Age')
# drop the City column from the sorted DataFrame
final_df = sorted_df.drop(columns='City')
# fill null values in the Salary column with the mean of the column
# (this data has no null values, so the column is left unchanged here)
final_df['Salary'] = final_df['Salary'].fillna(final_df['Salary'].mean())
# print the final DataFrame
print(final_df)
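Because every name in the example above has a salary, the fillna() step has nothing to fill. The sketch below shows the same pattern with a gap in the data. The people and salaries tables are hypothetical, and how='left' is used so that rows without a match are kept rather than dropped, as they would be under the default inner join:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                       'Age': [25, 30, 35]})

# Hypothetical salary table with no entry for Charlie.
salaries = pd.DataFrame({'Name': ['Alice', 'Bob'],
                         'Salary': [50000, 60000]})

# how='left' keeps every row of people; unmatched names get NaN in Salary.
merged = people.merge(salaries, on='Name', how='left')
missing_before = int(merged['Salary'].isna().sum())

# mean() skips NaN by default, so the fill value is the mean of the known salaries.
merged['Salary'] = merged['Salary'].fillna(merged['Salary'].mean())
print(merged)
```

Filling with the column mean is only one possible choice; whether it is appropriate depends on the data and on why the values are missing.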