Pandas Data Operations
import pandas as pd
import numpy as np
What is a DataFrame?
A DataFrame is a data type provided by the pandas library.
It is the main data type for working with tables and tabular data in Python.
Think of a DataFrame as a table made of rows and columns, where each row and each column
is a pandas.Series object (similar to a vector or list). Each element of a Series has a label.
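The idea that rows and columns are Series can be checked directly. The sketch below uses a small hypothetical table (the names `df`, `name`, `value`, `r1` and `r2` are made up for illustration) and assumes a column is selected with `[]` and a row with `.loc`.

```python
import pandas as pd

# Hypothetical two-row table, used only for illustration
df = pd.DataFrame(
    {"name": ["a", "b"], "value": [1, 2]},
    index=["r1", "r2"],
)

col = df["value"]   # a column is a Series labelled by the row index
row = df.loc["r1"]  # a row is a Series labelled by the column names

print(type(col).__name__, list(col.index))
print(type(row).__name__, list(row.index))
```

The labels of a column Series are the row labels of the table, and the labels of a row Series are the column names.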
Create a DataFrame
Adding data manually
Lists of lists
Nested dictionaries
Reading the information from a .csv file
Use the function pd.read_csv() with the path to the file.
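The two options not demonstrated in the cells that follow (nested dictionaries and pd.read_csv()) might look like the sketch below. The data and the names `nested`, `csv_text`, `df_from_dict` and `df_from_csv` are hypothetical, and an in-memory `io.StringIO` buffer stands in for a real file path so the example runs without a file on disk.

```python
import io
import pandas as pd

# Nested dictionary: outer keys become columns, inner keys become row labels
nested = {
    "A": {"row1": "A3", "row2": "B1"},
    "B": {"row1": 0, "row2": 1},
}
df_from_dict = pd.DataFrame(nested)

# read_csv accepts a file path or a file-like object;
# StringIO is used here in place of a path to a real .csv file
csv_text = "A,B\nA3,0\nB1,1\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_dict)
print(df_from_csv)
```

With a real file, the call would take the path instead, for example `pd.read_csv("data.csv")`, where `data.csv` is a placeholder name.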
# create a list of lists (2D data), one inner list per row
data_lst = [
    ['A3', 0, -1, 0, 'si'],
    ['B1', 1, None, 0, 'no'],
    ['B3', 4, None, 0, 'no'],
    ['B3', 5, 1, 0, 'si'],
    ['A1', 4, 0, None, None],
    ['A3', 1, 2, 1, 'si'],
    ['C2', 4, 1, 1, 'no']
]
data_lst
[['A3', 0, -1, 0, 'si'],
 ['B1', 1, None, 0, 'no'],
 ['B3', 4, None, 0, 'no'],
 ['B3', 5, 1, 0, 'si'],
 ['A1', 4, 0, None, None],
 ['A3', 1, 2, 1, 'si'],
 ['C2', 4, 1, 1, 'no']]
# collect the first element of every row (the first column)
col0 = []
for row in data_lst:
    col0.append(row[0])
col0
['A3', 'B1', 'B3', 'B3', 'A1', 'A3', 'C2']
# create a test DataFrame
test_df = pd.DataFrame(
    data_lst
)
test_df
    0  1    2    3     4
0  A3  0 -1.0  0.0    si
1  B1  1  NaN  0.0    no
2  B3  4  NaN  0.0    no
3  B3  5  1.0  0.0    si
4  A1  4  0.0  NaN  None
5  A3  1  2.0  1.0    si
6  C2  4  1.0  1.0    no
# update index of rows and columns
test_df = pd.DataFrame(
    data_lst,
    columns=['A', 'B', 'C', 'D', 'E'],
    index=[f'row{i}' for i in range(1, 8)]
)
test_df
       A  B    C    D     E
row1  A3  0 -1.0  0.0    si
row2  B1  1  NaN  0.0    no
row3  B3  4  NaN  0.0    no
row4  B3  5  1.0  0.0    si
row5  A1  4  0.0  NaN  None
row6  A3  1  2.0  1.0    si
row7  C2  4  1.0  1.0    no
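In the output above, the integers in columns C and D are displayed as floats (-1.0, 0.0) and their None entries as NaN. A likely explanation is that pandas represents a missing value in a numeric column as NaN, which is a float, so the whole column is stored as float64. The sketch below reproduces this on a smaller, hypothetical table (`demo`, `num` and `txt` are illustrative names). How the text column is stored can vary between pandas versions, so only the numeric behaviour is commented.

```python
import pandas as pd

# Hypothetical two-column example: one numeric column, one text column
demo = pd.DataFrame(
    [[1, 'si'], [None, 'no'], [3, None]],
    columns=['num', 'txt'],
)

print(demo.dtypes)         # 'num' is float64 because NaN is a float value
print(demo['num'].isna())  # True only for the row that held None
print(demo['txt'].isna())  # the missing text value is also detected by isna()
```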
DataFrame structure