Consider the dataframes left
and right
left = pd.DataFrame([['a', 1], ['b', 2]], list('XY'), list('AB'))
left
A B
X a 1
Y b 2
right = pd.DataFrame([['a', 3], ['b', 4]], list('XY'), list('AC'))
right
A C
X a 3
Y b 4
join
Think of join
as wanting to combine to dataframes based on their respective indexes. If there are overlapping columns, join
will want you to add a suffix to the overlapping column name from left dataframe. Our two dataframes do have an overlapping column name A
.
left.join(right, lsuffix='_')
A_ B A C
X a 1 a 3
Y b 2 b 4
Notice the index is preserved and we have 4 columns. 2 columns from left
and 2 from right
.
If the indexes did not align
left.join(right.reset_index(), lsuffix='_', how='outer')
A_ B index A C
0 NaN NaN X a 3.0
1 NaN NaN Y b 4.0
X a 1.0 NaN NaN NaN
Y b 2.0 NaN NaN NaN
I used an outer join to better illustrate the point. If the indexes do not align, the result will be the union of the indexes.
We can tell join
to use a specific column in the left dataframe to use as the join key, but it will still use the index from the right.
left.reset_index().join(right, on='index', lsuffix='_')
index A_ B A C
0 X a 1 a 3
1 Y b 2 b 4
merge
Think of merge
as aligning on columns. By default merge
will look for overlapping columns in which to merge on. merge
gives better control over merge keys by allowing the user to specify a subset of the overlapping columns to use with parameter on
, or to separately allow the specification of which columns on the left and which columns on the right to merge by.
merge
will return a combined dataframe in which the index will be destroyed.
This simple example finds the overlapping column to be 'A'
and combines based on it.
left.merge(right)
A B C
0 a 1 3
1 b 2 4
Note the index is [0, 1]
and no longer ['X', 'Y']
You can explicitly specify that you are merging on the index with the left_index
or right_index
paramter
left.merge(right, left_index=True, right_index=True, suffixes=['_', ''])
A_ B A C
X a 1 a 3
Y b 2 b 4
And this looks exactly like the join
example above.