Recap the Fundamentals of Data Analysis with Pandas: A Simple Code-Driven Approach
Let's review the basics of Pandas through a handful of core functions.
As a data scientist, it can be easy to get lost in the overwhelming number of advanced Python libraries available. However, mastering the basics is essential for building a strong foundation in data science. In this blog post, we will explore the powerful yet elegant Pandas library and provide code examples for some of its fundamental functions. Whether you are a beginner or an experienced practitioner, these examples will help you reinforce your understanding of Pandas and improve your data analysis skills.
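Every output in this post comes from the same five-row table. The code that built it is not shown here, so the following is a minimal sketch that reconstructs the table from the printed values; the construction style (a dict of lists passed to `pd.DataFrame`) is an assumption.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction: values are copied from the outputs printed in this post.
df = pd.DataFrame({
    "name": ["Sara", "Mohammed", "Ali", "Khaled", "Asmaa"],
    "age": [25, 30, 35, 40, 45],
    "gender": ["female", "male", "male", "male", "female"],
    "income": [50000.0, 60000.0, np.nan, 80000.0, 90000.0],  # Ali's income is missing
    "married": [False, True, True, False, True],
})

print(df.shape)
```

The single `NaN` in `income` is deliberate: it is what makes the missing-value functions later in the post worth demonstrating.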
df_head:
       name  age  gender   income  married
0      Sara   25  female  50000.0    False
1  Mohammed   30    male  60000.0     True
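The two rows above are what `head` returns when asked for the first two rows. A minimal sketch, assuming the table is rebuilt from the printed values and that the call was `df.head(2)`:

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the post's table from its printed values.
df = pd.DataFrame({
    "name": ["Sara", "Mohammed", "Ali", "Khaled", "Asmaa"],
    "age": [25, 30, 35, 40, 45],
    "gender": ["female", "male", "male", "male", "female"],
    "income": [50000.0, 60000.0, np.nan, 80000.0, 90000.0],
    "married": [False, True, True, False, True],
})

# head(n) returns the first n rows; without an argument it returns 5.
df_head = df.head(2)
print("df_head:\n", df_head)
```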
df_tail:
     name  age  gender   income  married
2     Ali   35    male      NaN     True
3  Khaled   40    male  80000.0    False
4   Asmaa   45  female  90000.0     True
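`tail` is the mirror image of `head`. The three rows above are consistent with a call like `df.tail(3)`; the sketch below assumes that call and the same reconstructed table.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the post's table from its printed values.
df = pd.DataFrame({
    "name": ["Sara", "Mohammed", "Ali", "Khaled", "Asmaa"],
    "age": [25, 30, 35, 40, 45],
    "gender": ["female", "male", "male", "male", "female"],
    "income": [50000.0, 60000.0, np.nan, 80000.0, 90000.0],
    "married": [False, True, True, False, True],
})

# tail(n) returns the last n rows and keeps their original index labels.
df_tail = df.tail(3)
print("df_tail:\n", df_tail)
```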
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 5 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   name     5 non-null      object
 1   age      5 non-null      int64
 2   gender   5 non-null      object
 3   income   4 non-null      float64
 4   married  5 non-null      bool
dtypes: bool(1), float64(1), int64(1), object(2)
memory usage: 293.0+ bytes
df_info:
None
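Two things about this output are worth explaining: the summary appears before the `df_info:` label, and the label is followed by `None`. Both follow from the fact that `df.info()` prints its summary directly and returns `None`. The sketch below assumes the original call was wrapped in `print`, which would produce exactly this ordering, and also shows one way to capture the summary as a string through the `buf` parameter.

```python
import io

import numpy as np
import pandas as pd

# Assumed reconstruction of the post's table from its printed values.
df = pd.DataFrame({
    "name": ["Sara", "Mohammed", "Ali", "Khaled", "Asmaa"],
    "age": [25, 30, 35, 40, 45],
    "gender": ["female", "male", "male", "male", "female"],
    "income": [50000.0, 60000.0, np.nan, 80000.0, 90000.0],
    "married": [False, True, True, False, True],
})

# info() writes the summary to stdout itself and returns None,
# so printing its return value adds a trailing "None".
result = df.info()
print("df_info:\n", result)

# To keep the summary as text instead, write it into a buffer.
buf = io.StringIO()
df.info(buf=buf)
summary = buf.getvalue()
```

In practice, calling `df.info()` on its own line, without `print`, gives the cleaner result.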
df_desc:
             age        income
count   5.000000      4.000000
mean   35.000000  70000.000000
std     7.905694  18257.418584
min    25.000000  50000.000000
25%    30.000000  57500.000000
50%    35.000000  70000.000000
75%    40.000000  82500.000000
max    45.000000  90000.000000
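The table above matches the default behaviour of `describe`, which summarizes only the numeric columns (`age` and `income`) and skips missing values, hence the count of 4 for `income`. A minimal sketch, assuming the reconstructed table and a plain `df.describe()` call:

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the post's table from its printed values.
df = pd.DataFrame({
    "name": ["Sara", "Mohammed", "Ali", "Khaled", "Asmaa"],
    "age": [25, 30, 35, 40, 45],
    "gender": ["female", "male", "male", "male", "female"],
    "income": [50000.0, 60000.0, np.nan, 80000.0, 90000.0],
    "married": [False, True, True, False, True],
})

# By default describe() covers numeric columns only; NaN values are
# excluded from every statistic, including count.
df_desc = df.describe()
print("df_desc:\n", df_desc)
```

Passing `include="all"` would add the text and boolean columns to the summary as well.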
df_sort:
       name  age  gender   income  married
0      Sara   25  female  50000.0    False
1  Mohammed   30    male  60000.0     True
2       Ali   35    male      NaN     True
3    Khaled   40    male  80000.0    False
4     Asmaa   45  female  90000.0     True
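The sorted output is identical to the original row order, which is what we would expect if the table was sorted by `age` in ascending order, since the ages were already increasing. The sort key is not stated in the post, so `by="age"` in this sketch is an assumption.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the post's table from its printed values.
df = pd.DataFrame({
    "name": ["Sara", "Mohammed", "Ali", "Khaled", "Asmaa"],
    "age": [25, 30, 35, 40, 45],
    "gender": ["female", "male", "male", "male", "female"],
    "income": [50000.0, 60000.0, np.nan, 80000.0, 90000.0],
    "married": [False, True, True, False, True],
})

# sort_values returns a new, sorted DataFrame; the original is unchanged.
# Sorting by "age" is an assumption that matches the printed output.
df_sort = df.sort_values(by="age")
print("df_sort:\n", df_sort)

# For comparison: descending order reverses the rows.
df_sort_desc = df.sort_values(by="age", ascending=False)
```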
df_group:
gender
female    70000.0
male      70000.0
Name: income, dtype: float64
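This result, a Series named `income` indexed by `gender`, is consistent with grouping by `gender` and taking the mean of `income`. A minimal sketch under that assumption:

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the post's table from its printed values.
df = pd.DataFrame({
    "name": ["Sara", "Mohammed", "Ali", "Khaled", "Asmaa"],
    "age": [25, 30, 35, 40, 45],
    "gender": ["female", "male", "male", "male", "female"],
    "income": [50000.0, 60000.0, np.nan, 80000.0, 90000.0],
    "married": [False, True, True, False, True],
})

# Split rows by gender, then average income within each group.
# mean() skips NaN, so the male average uses only 60000 and 80000.
df_group = df.groupby("gender")["income"].mean()
print("df_group:\n", df_group)
```

Both groups happen to average 70000.0: (50000 + 90000) / 2 for female, and (60000 + 80000) / 2 for male once Ali's missing income is skipped.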
df_pivot:
married    False     True
gender
female   50000.0  90000.0
male     80000.0  60000.0
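The pivot table above cross-tabulates `gender` against `married`, with the mean `income` in each cell. The exact arguments are not shown in the post; the sketch below assumes `pivot_table` with `aggfunc="mean"` (which is also its default).

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the post's table from its printed values.
df = pd.DataFrame({
    "name": ["Sara", "Mohammed", "Ali", "Khaled", "Asmaa"],
    "age": [25, 30, 35, 40, 45],
    "gender": ["female", "male", "male", "male", "female"],
    "income": [50000.0, 60000.0, np.nan, 80000.0, 90000.0],
    "married": [False, True, True, False, True],
})

# Rows come from gender, columns from married, cells hold mean income.
df_pivot = df.pivot_table(
    values="income", index="gender", columns="married", aggfunc="mean"
)
print("df_pivot:\n", df_pivot)
```

The married-male cell is 60000.0 rather than an average of two values because Ali's missing income is ignored, leaving only Mohammed's.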
df_apply:
   age    income
0   50  100000.0
1   60  120000.0
2   70       NaN
3   80  160000.0
4   90  180000.0
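Every value above is double the original, which suggests a doubling function applied to the `age` and `income` columns. The function itself is not shown in the post, so the lambda in this sketch is an assumption.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the post's table from its printed values.
df = pd.DataFrame({
    "name": ["Sara", "Mohammed", "Ali", "Khaled", "Asmaa"],
    "age": [25, 30, 35, 40, 45],
    "gender": ["female", "male", "male", "male", "female"],
    "income": [50000.0, 60000.0, np.nan, 80000.0, 90000.0],
    "married": [False, True, True, False, True],
})

# apply() calls the function once per column (axis=0 by default);
# each call receives the whole column as a Series. NaN stays NaN.
df_apply = df[["age", "income"]].apply(lambda col: col * 2)
print("df_apply:\n", df_apply)
```

For simple arithmetic like this, `df[["age", "income"]] * 2` gives the same result; `apply` earns its keep when the per-column logic is more involved.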
df_fillna:
       name  age  gender   income  married
0      Sara   25  female  50000.0    False
1  Mohammed   30    male  60000.0     True
2       Ali   35    male      0.0     True
3    Khaled   40    male  80000.0    False
4     Asmaa   45  female  90000.0     True
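Ali's missing income now reads 0.0, which is what `fillna(0)` produces. A minimal sketch, assuming that call and the reconstructed table:

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the post's table from its printed values.
df = pd.DataFrame({
    "name": ["Sara", "Mohammed", "Ali", "Khaled", "Asmaa"],
    "age": [25, 30, 35, 40, 45],
    "gender": ["female", "male", "male", "male", "female"],
    "income": [50000.0, 60000.0, np.nan, 80000.0, 90000.0],
    "married": [False, True, True, False, True],
})

# fillna returns a new DataFrame with every NaN replaced; df is unchanged.
df_fillna = df.fillna(0)
print("df_fillna:\n", df_fillna)
```

Filling with zero is convenient for a demo, but it changes statistics such as the mean; filling with the column mean or median is often a more defensible choice for real data.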
df_isnull:
    name    age  gender  income  married
0  False  False   False   False    False
1  False  False   False   False    False
2  False  False   False    True    False
3  False  False   False   False    False
4  False  False   False   False    False
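`isnull` returns a boolean mask with the same shape as the table, where `True` marks a missing value. The sketch below reproduces the mask and adds a per-column count, which is usually the more readable view on larger tables; the count is an addition for illustration, not part of the original output.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the post's table from its printed values.
df = pd.DataFrame({
    "name": ["Sara", "Mohammed", "Ali", "Khaled", "Asmaa"],
    "age": [25, 30, 35, 40, 45],
    "gender": ["female", "male", "male", "male", "female"],
    "income": [50000.0, 60000.0, np.nan, 80000.0, 90000.0],
    "married": [False, True, True, False, True],
})

# True wherever a cell is missing (NaN or None).
df_isnull = df.isnull()
print("df_isnull:\n", df_isnull)

# Summing the mask counts missing values per column (True counts as 1).
missing_per_column = df_isnull.sum()
print(missing_per_column)
```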
df_drop_duplicates:
       name  age  gender   income  married
0      Sara   25  female  50000.0    False
1  Mohammed   30    male  60000.0     True
2       Ali   35    male      NaN     True
3    Khaled   40    male  80000.0    False
4     Asmaa   45  female  90000.0     True
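The output is unchanged because the table contains no duplicate rows to begin with. To see `drop_duplicates` actually do something, the sketch below also appends a copy of the first row; that extra row is hypothetical and not part of the original data.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the post's table from its printed values.
df = pd.DataFrame({
    "name": ["Sara", "Mohammed", "Ali", "Khaled", "Asmaa"],
    "age": [25, 30, 35, 40, 45],
    "gender": ["female", "male", "male", "male", "female"],
    "income": [50000.0, 60000.0, np.nan, 80000.0, 90000.0],
    "married": [False, True, True, False, True],
})

# No row repeats, so all five rows survive.
df_drop_duplicates = df.drop_duplicates()
print("df_drop_duplicates:\n", df_drop_duplicates)

# Hypothetical: append a copy of the first row, then remove it again.
with_dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
deduped = with_dup.drop_duplicates()
print(len(with_dup), "->", len(deduped))
```

By default the first occurrence of each duplicated row is kept; the `subset` parameter restricts the comparison to selected columns.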
While there are many advanced functions available in Pandas, it's important to master the basics in order to build a strong foundation in data science.