Python Data Operations 4: Merge and Concat
(Using the numpy and pandas packages imported in section one.)
This fourth section contains:
Concatenation
Merge and Join
Merge on columns
Merge on index
Convert data types
Duplicates
Find duplicates
Remove duplicates
Concat
The pd.concat function joins two or more DataFrames by stacking them on top of each other
(vertical concatenation) or placing them side by side (horizontal concatenation).
The first parameter is a list of the DataFrames to be concatenated.
The axis parameter sets the direction: 0 (the default) for vertical or 1 for horizontal
concatenation.
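A minimal sketch of the axis parameter, using two small hypothetical DataFrames (`top` and `bottom`, not part of this section's data). With axis=0 the rows are stacked and aligned on column names; with axis=1 the frames sit side by side and are aligned on index labels.

```python
import pandas as pd

# Two small hypothetical DataFrames with the same columns and index labels
top = pd.DataFrame({'x': [1, 2], 'y': [3, 4]}, index=['a', 'b'])
bottom = pd.DataFrame({'x': [5, 6], 'y': [7, 8]}, index=['a', 'b'])

# axis=0 (the default): the rows of `bottom` go under the rows of `top`
vertical = pd.concat([top, bottom], axis=0)
print(vertical.shape)            # (4, 2)
print(list(vertical.index))      # ['a', 'b', 'a', 'b']

# axis=1: the columns of `bottom` go to the right of `top`, matched by index
horizontal = pd.concat([top, bottom], axis=1)
print(horizontal.shape)          # (2, 4)
print(list(horizontal.columns))  # ['x', 'y', 'x', 'y']
```

Note that pd.concat keeps the original labels in both directions, so repeated index labels (axis=0) or repeated column names (axis=1) are not deduplicated.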
# create the first test DataFrame
test_df = pd.DataFrame(
    [['A3', 0, -1, 0, 'si'],
     ['B1', 1, None, 0, 'no'],
     ['B3', 4, None, 0, 'no'],
     ['B3', 5, 1, 0, 'si'],
     ['A1', 4, 0, None, None],
     ['A3', 1, 2, 1, 'si'],
     ['C2', 4, 1, 1, 'no']],
    columns=['A', 'B', 'C', 'D', 'E'],
    index=[f'R{i}' for i in range(7)]
)
test_df
A B C D E
R0 A3 0 -1.0 0.0 si
R1 B1 1 NaN 0.0 no
R2 B3 4 NaN 0.0 no
R3 B3 5 1.0 0.0 si
R4 A1 4 0.0 NaN None
R5 A3 1 2.0 1.0 si
R6 C2 4 1.0 1.0 no
# create the second test DataFrame
test_df2 = pd.DataFrame(
[['A1', 100],
['B1', -50],
['C1', -25]],
columns=['A', 'Z'],
index=['R0', 'R7', 'R14'])
test_df2
A Z
R0 A1 100
R7 B1 -50
R14 C1 -25
# concatenate three DataFrames vertically (axis=0 is the default)
pd.concat([test_df, test_df2, test_df2])
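The call above prints a 13-row result. The sketch below rebuilds both DataFrames and checks a few properties of that result instead of reproducing the full printout. It relies on pandas' default outer join on columns; `join='inner'` and `ignore_index=True` are standard pd.concat keyword arguments, shown here only for comparison.

```python
import pandas as pd

# Rebuild the two example DataFrames from this section
test_df = pd.DataFrame(
    [['A3', 0, -1, 0, 'si'],
     ['B1', 1, None, 0, 'no'],
     ['B3', 4, None, 0, 'no'],
     ['B3', 5, 1, 0, 'si'],
     ['A1', 4, 0, None, None],
     ['A3', 1, 2, 1, 'si'],
     ['C2', 4, 1, 1, 'no']],
    columns=['A', 'B', 'C', 'D', 'E'],
    index=[f'R{i}' for i in range(7)],
)
test_df2 = pd.DataFrame(
    [['A1', 100], ['B1', -50], ['C1', -25]],
    columns=['A', 'Z'],
    index=['R0', 'R7', 'R14'],
)

stacked = pd.concat([test_df, test_df2, test_df2])
# 7 + 3 + 3 rows; the columns are the union of both column sets (A-E plus Z)
print(stacked.shape)               # (13, 6)
# Cells with no source value (column Z for the rows from test_df) become NaN
print(stacked['Z'].isna().sum())   # 7
# Index labels are kept as-is, so 'R0' now appears three times
print((stacked.index == 'R0').sum())  # 3

# join='inner' keeps only the columns shared by every input
print(list(pd.concat([test_df, test_df2], join='inner').columns))  # ['A']

# ignore_index=True discards the original labels and numbers the rows 0..n-1
print(pd.concat([test_df, test_df2], ignore_index=True).index.tolist())
```

Passing ignore_index=True is one way to avoid the duplicate index labels that plain vertical concatenation produces.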