Introduction to pandas

March 5, 2024 Amrit Panta 31

pandas is a powerful Python data analysis toolkit.

=> It is open source and provides a fast, efficient DataFrame object for data manipulation.

=> It reads and writes data in different formats: CSV, TSV, XML, JSON, zip, etc.

=> It is used in data pre-processing, for example to handle missing values.
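
As a rough illustration of the pre-processing point, here is a minimal sketch of handling missing values. The sample data and the column names are made up for this example; isna(), dropna() and fillna() are standard pandas methods.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with two missing values
raw = pd.DataFrame({
    "name": ["Asha", "Bikash", "Chandra"],
    "age": [25.0, np.nan, 31.0],
    "city": ["Kathmandu", "Pokhara", None],
})

# Count the missing values in each column
missing_counts = raw.isna().sum()
print(missing_counts)

# Option 1: drop every row that contains a missing value
dropped = raw.dropna()
print(dropped)

# Option 2: fill the gaps with a per-column default
filled = raw.fillna({"age": raw["age"].mean(), "city": "unknown"})
print(filled)
```

Which option is better depends on the data: dropping rows loses information, while filling can hide the fact that a value was never recorded.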

Pandas Data Structure

.Series

It is a one-dimensional, labeled, homogeneous array.

series
apples
3
2
0
1

series
oranges
0
3
7
2

.Data frame

It is a two-dimensional, labeled, heterogeneous tabular structure.

Series + Series = DataFrame

data frame
apples	oranges
3	0
2	3
0	7
1	2
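
The picture above can be sketched in code. This is one possible way to combine the two Series, using pd.concat; the variable names are chosen for this example only.

```python
import pandas as pd

# The two Series from the figure; name becomes the column label
apples = pd.Series([3, 2, 0, 1], name="apples")
oranges = pd.Series([0, 3, 7, 2], name="oranges")

# axis=1 places the Series side by side as columns,
# aligning the rows on the shared index 0..3
fruit = pd.concat([apples, oranges], axis=1)
print(fruit)
```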

.Panel

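It was a 3D labeled array. Panel was deprecated in pandas 0.20 and is no longer available in current pandas releases; three-dimensional data is now usually stored in a DataFrame with a MultiIndex.

So pandas originally offered three data structures, of which Series and DataFrame remain in use.

Note: NumPy arrays are used to implement most pandas data objects.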
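
A small sketch of the relationship with NumPy, assuming plain numeric data: to_numpy() returns the values of a Series or DataFrame as a NumPy array.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

# to_numpy() exposes the values as a NumPy ndarray
arr = s.to_numpy()
print(type(arr))
print(arr * 2)  # ordinary NumPy arithmetic works on the result

# A DataFrame converts to a two-dimensional array
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
matrix = df.to_numpy()
print(matrix.shape)
```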

Installing and importing pandas

pip install pandas

import pandas as pd

You can check the installed version of pandas through the pd.__version__ attribute.
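
A minimal sketch of the version check (the printed value depends on your installation):

```python
import pandas as pd

# __version__ is a string such as "2.x.y"
version = pd.__version__
print(version)
```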

Let's learn each data structure of pandas:

1.Series

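import pandas as pd

# a Series built from mixed Python values gets the dtype object
data=[1,'one',-10,3.2,"Nepal"]
s1=pd.Series(data)

print(s1)
print(type(s1))


# for an empty Series, pass dtype explicitly so the result
# does not depend on the pandas version
empty_series=pd.Series([],dtype=object)
print(empty_series)

country_series=pd.Series(["Nepal","Australia","india","England"])
print(country_series)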



output:
0        1
1      one
2      -10
3      3.2
4    Nepal
dtype: object
<class 'pandas.core.series.Series'>
Series([], dtype: object)
0        Nepal
1    Australia
2        india
3      England
dtype: object

import pandas as pd
s3=pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
# note: the index must have as many labels as the Series has elements
print(s3)

# dtype converts the values to the given type
s4=pd.Series([1,2,3,4],index=['a','b','c','d'],dtype=float)
print(s4,"\n")

# a single scalar gives a Series with one element
s5=pd.Series(0.5)
print(s5 ,"\n")

# a scalar with an index is repeated for every label
s6=pd.Series(0.5,index=[1,2,3])
print(s6,"\n")

s7=pd.Series({"a":1,"b":2,"c":3})
# you can also create a Series from a dictionary; the keys become the index
print(s7,"\n")


output:
a    1
b    2
c    3
d    4
e    5
dtype: int64
a    1.0
b    2.0
c    3.0
d    4.0
dtype: float64 

0    0.5
dtype: float64 

1    0.5
2    0.5
3    0.5
dtype: float64 

a    1
b    2
c    3
dtype: int64 
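
The note about the index length can be checked with a short sketch. It assumes only that pandas raises a ValueError when the lengths differ, and it also shows label-based and position-based access on a labeled Series.

```python
import pandas as pd

error_message = None
try:
    # three values but only two labels: pandas rejects this
    pd.Series([1, 2, 3], index=["a", "b"])
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)

s = pd.Series([1, 2, 3], index=["a", "b", "c"])
by_label = s["b"]        # access by index label
by_position = s.iloc[0]  # access by integer position
print(by_label, by_position)
```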

2.DataFrame

A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).

import pandas as pd 
empty_df=pd.DataFrame()
print(empty_df,'\n')

lst=['a','b','c']
df1=pd.DataFrame(lst)
print(df1,'\n')


lst2=[[1,2,3],[4,5,6],[7,8,9]]
df2=pd.DataFrame(lst2)
print(df2,'\n')


output:
Empty DataFrame
Columns: []
Index: [] 

   0
0  a
1  b
2  c 

   0  1  2
0  1  2  3
1  4  5  6
2  7  8  9
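
The DataFrames above get the default column labels 0, 1, 2. As a sketch of the "heterogeneous" and "size-mutable" properties, the example below names the columns with the columns parameter, mixes value types, and adds a column afterwards. The data is invented for illustration.

```python
import pandas as pd

rows = [[1, "Ram", 4.5], [2, "Sita", 3.8]]

# columns gives each column a label instead of 0, 1, 2
df = pd.DataFrame(rows, columns=["id", "name", "score"])

# every column keeps its own dtype
print(df.dtypes)

# size-mutable: a new column can be added after creation
df["passed"] = df["score"] >= 4.0
print(df)
```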

Now we are going to create a DataFrame from a dictionary

import pandas as pd 

dict1={"ID":[1,2,3,4,5]}
df3=pd.DataFrame(dict1)
print(df3,'\n')

dict2={"ID":[1,2,3,4,5],"SN":[6,7,8,9,10]}
# note: all the lists must have the same length
df4=pd.DataFrame(dict2)
print(df4,'\n')

# now create a DataFrame from a list of dictionaries
li_dict=[{"a":1,"b":2},{"a":5,"b":5,"c":10}]
df5=pd.DataFrame(li_dict)
print(df5,'\n')

# note: if the dictionaries do not have the same keys, pandas fills the missing values with NaN


# now make a DataFrame from a dictionary of Series
dic_series={
    "ID":pd.Series([1,2,3,4,5]),
    "SN":pd.Series([6,7,8,9,10])
}
df6=pd.DataFrame(dic_series)
print(df6,'\n')

#output:
   ID
0   1
1   2
2   3
3   4
4   5 

   ID  SN
0   1   6
1   2   7
2   3   8
3   4   9
4   5  10 

   a  b     c
0  1  2   NaN
1  5  5  10.0 

   ID  SN
0   1   6
1   2   7
2   3   8
3   4   9
4   5  10
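
The two notes about sizes behave differently in practice, which the following sketch tries to show: lists of unequal length are rejected, while Series of unequal length are aligned on their index and padded with NaN.

```python
import pandas as pd

list_error = None
try:
    # lists of different lengths cannot be combined
    pd.DataFrame({"ID": [1, 2, 3], "SN": [6, 7]})
except ValueError as err:
    list_error = str(err)
    print("ValueError:", list_error)

# Series are aligned on the index, so the shorter one is padded with NaN
uneven = pd.DataFrame({
    "ID": pd.Series([1, 2, 3]),
    "SN": pd.Series([6, 7]),
})
print(uneven)
```

Because NaN is a floating-point value, the SN column becomes float64 here even though the input values were integers.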

We can also create a DataFrame from zip(), a list of tuples, etc.
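
A short sketch of both approaches, with made-up values. Each tuple becomes one row, and the columns parameter supplies the column names.

```python
import pandas as pd

ids = [1, 2, 3]
names = ["Nepal", "India", "England"]

# zip() pairs the lists element by element, one tuple per row
df_zip = pd.DataFrame(list(zip(ids, names)), columns=["id", "name"])
print(df_zip)

# the same rows written out as a list of tuples
records = [(1, "Nepal"), (2, "India"), (3, "England")]
df_tuples = pd.DataFrame(records, columns=["id", "name"])
print(df_tuples)
```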

CSV File:

CSV is a file extension (.csv). Its full form is "comma-separated values". The CSV format stores tabular data as plain text, one row per line.

Advantages of CSV file:

1.universal

2.easy to understand

3.quick to create

How to read a CSV file in pandas

import pandas as pd
import os
# help(pd.read_csv)
# run the statement above to learn more about read_csv
cwd=os.getcwd()
# df=pd.read_csv('test.csv')
# build the full path to test.csv in the current working directory
df=pd.read_csv(os.path.join(cwd,'test.csv'))
print(type(df))
print(df.columns)


#output:
<class 'pandas.core.frame.DataFrame'>
Index(['id', 'name', 'address'], dtype='object')

df=pd.read_csv('locations',nrows=1)
# read only the first row of data

df1=pd.read_csv('locations',usecols=[0])
# read only the column at index 0

df1=pd.read_csv('locations',usecols=[0,1])
# read the columns at index 0 and 1
df1=pd.read_csv('locations',usecols=[1,3,5])
# read the columns at index 1, 3 and 5


# if you want to skip rows:
df1=pd.read_csv('locations',skiprows=1)
# an integer skips that many lines from the start of the file (here, the first line)


# to skip specific rows, pass a list of 0-based line numbers
df1=pd.read_csv('locations',skiprows=[0,5])


# by default the rows are labeled 0,1,2,...; to use one of the columns
# as the row index instead, pass it as index_col
df1=pd.read_csv('locations',index_col='ID')
# you can also give the column position directly, e.g. index_col=2
# header, prefix and names


df1=pd.read_csv('test.csv',header=1)
# header is the 0-based row number to use for the column names;
# use header=None if the file has no header row

# add a prefix to every column name
df2=pd.read_csv('test.csv')
df2.columns = ['Columns' + str(col) for col in df2.columns]


# if you want a specific name for each column, use names;
# header=0 replaces the existing header row with these names
df3=pd.read_csv('test.csv',header=0,names=['sn','name','address'])
print(df3)
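
The calls above need files on disk. To try the same options without any file, here is a self-contained sketch that reads from an in-memory string through io.StringIO. The CSV content and the helper function read() are made up for this example.

```python
import io
import pandas as pd

# Hypothetical CSV content: line 0 is the header, lines 1-3 are data
csv_text = "id,name,address\n1,Ram,Kathmandu\n2,Sita,Pokhara\n3,Hari,Lalitpur\n"

def read(**options):
    """Parse the sample text with the given read_csv options."""
    return pd.read_csv(io.StringIO(csv_text), **options)

first_row = read(nrows=1)            # only the first data row
two_cols = read(usecols=[0, 1])      # only the columns at index 0 and 1
by_id = read(index_col="id")         # use the id column as the row index
skipped = read(skiprows=[1])         # skip file line 1 (the first data row)
no_header = read(header=None)        # treat the header line as ordinary data
renamed = read(header=0, names=["sn", "name", "address"])  # replace the header

print(skipped)
print(renamed)
```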

How to write a CSV file in pandas

Writing a CSV file with pandas is mainly used in data processing: after cleaning the raw data and selecting the useful records, you save the result back to a file.
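
A minimal sketch of writing CSV with DataFrame.to_csv. The data and the file name cleaned.csv are assumptions for this example. Called without a path, to_csv returns the CSV text as a string, which makes the result easy to inspect.

```python
import io
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["Ram", "Sita"]})

# index=False leaves out the 0,1,2,... row labels
csv_text = df.to_csv(index=False)
print(csv_text)

# To write to disk instead, pass a path (hypothetical file name):
# df.to_csv("cleaned.csv", index=False)

# Reading the text back gives the same rows
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip)
```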

About author

Amrit Panta

Python developer, content writer
