
Pandas Join vs Merge

The “Pandas” library is an open-source tool for analyzing data in Python. It provides two main methods for combining DataFrames: “DataFrame.join()” and “DataFrame.merge()” (also available as the “pandas.merge()” function). The “join()” method combines DataFrames on their indexes by default, while the “merge()” method can combine them on indexes, on selected columns, or on a mix of both.

This article covers the “DataFrame.join()” method, the “DataFrame.merge()” method, and the differences between them.

What is the “DataFrame.join()” Method in Python?

In Python, the “DataFrame.join()” method of the “pandas” library is used to join the columns of another DataFrame, either on the index or on a key column.

Syntax

DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False, validate=None)

Parameters
In the above syntax:

  • The “other” parameter specifies the other DataFrame to join with.
  • The “on” parameter indicates the column or columns in the calling DataFrame to match against the index of “other”. If it is not specified, the DataFrames are joined index-on-index.
  • The “how” parameter indicates the join type to execute: “left” (the default), “right”, “outer”, “inner”, or “cross”.
  • The “lsuffix” and “rsuffix” parameters indicate the suffixes to add to overlapping column names in the left and right DataFrames.
  • The other parameters can also perform certain operations. You can review this Pandas Join Two Dataframes guide for a thorough understanding.
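As a minimal sketch of the “on” parameter described above, the following example matches a key column of the calling DataFrame against the index of the other DataFrame. The names “people” and “heights” and their sample values are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical sample data for illustration
people = pd.DataFrame({'Name': ["Joseph", "Anna", "Henry", "Tim"],
                       'Age': [12, 25, 22, 13]})

# The other DataFrame is indexed by the key we want to match against
heights = pd.DataFrame({'Name': ["Joseph", "Henry"],
                        'Height': [5.2, 7.1]}).set_index('Name')

# Match the 'Name' column of people against the index of heights.
# how='left' (the default) keeps every row of people; rows without
# a match get NaN in the 'Height' column.
joined = people.join(heights, on='Name', how='left')
print(joined)
```

Because this is a left join, all four rows of “people” are kept, and the names that do not appear in “heights” have a missing “Height” value.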

Return Value
The “DataFrame.join()” method returns a new DataFrame containing the columns of both DataFrames.

What is the “DataFrame.merge()” Method in Python?

The “DataFrame.merge()” method is used to combine two DataFrame objects with a database-style join operation. It joins the DataFrames on columns or on indexes.

Syntax

DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=None, indicator=False, validate=None)

Parameters
In this syntax:

  • The “right” parameter specifies the DataFrame object to merge with.
  • The “how” parameter indicates the type of merge to perform. The default is “inner”.
  • The “on” parameter indicates the column or index level names to join on. These must be present in both DataFrames.
  • The “left_on” and “right_on” parameters indicate the column or index level names to join on in the left and right DataFrames, respectively.

For more details, please review our guide named Pandas Merge.
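When the key column has a different name in each DataFrame, the “left_on” and “right_on” parameters can be used together. The following is a small sketch of that case. The DataFrame names and the “Person” column are hypothetical and chosen only for illustration:

```python
import pandas as pd

# Hypothetical sample data: the key is called 'Name' on the left
# and 'Person' on the right
ages = pd.DataFrame({'Name': ["Joseph", "Anna", "Henry", "Tim"],
                     'Age': [12, 25, 22, 13]})
heights = pd.DataFrame({'Person': ["Joseph", "Zendaya", "Henry", "Jon"],
                        'Height': [5.2, 6.2, 7.1, 4.2]})

# Inner merge: keep only rows whose 'Name' matches a 'Person'
merged = pd.merge(ages, heights, left_on='Name', right_on='Person', how='inner')
print(merged)
```

Both key columns are kept in the result, so the merged DataFrame has the “Name”, “Age”, “Person”, and “Height” columns.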

Return Value
The “DataFrame.merge()” method returns a new DataFrame that contains the merged data. The original DataFrame objects are not modified.

Difference Between “DataFrame.join()” and “pandas.merge()” Methods

Both the “DataFrame.join()” method and the “pandas.merge()” method can combine two DataFrames. The “join()” method matches against the index of the other DataFrame and performs a left join by default. The “merge()” method is more flexible: it performs an inner join by default and lets us choose the columns or indexes to join on for each DataFrame.

Let’s understand the difference between these methods via multiple examples:

Combine Two DataFrames on their Indexes Using “DataFrame.join()” Method
In the code below, the “join()” method is used to combine two DataFrames on their row indexes. It performs a left join because that is the default join type. The “lsuffix” and “rsuffix” arguments add suffixes to the column names that overlap between the two DataFrames:

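import pandas as pd

# Two DataFrames with the same column names and a default integer index
df1 = pd.DataFrame({'Name': ["Joseph", "Anna", "Henry", "Tim"], 'Age': [12, 25, 22, 13]})
print(df1, '\n')
df2 = pd.DataFrame({'Name': ["Lily", "Zendaya", "Peter", "Jon"], 'Age': [20, 23, 12, 22]})
print(df2, '\n')
# Left join on the row index; the suffixes disambiguate the overlapping columns
print(df1.join(df2, lsuffix="_left", rsuffix="_right"))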

The above code joins the two DataFrames on their row indexes, so the result has four rows and the columns “Name_left”, “Age_left”, “Name_right”, and “Age_right”.

Combine Two DataFrames on their Indexes Using “pandas.merge()” Method
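To perform the same task with “pandas.merge()”, we set “left_index=True” and “right_index=True” to merge on the indexes, and we specify the left join explicitly with the “how” parameter because “merge()” performs an inner join by default: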

import pandas as pd

df1 = pd.DataFrame({'Name': ["Joseph", "Anna", "Henry", "Tim"], 'Age': [12, 25, 22, 13]})
print(df1, '\n')
df2 = pd.DataFrame({'Name': ["Lily", "Zendaya", "Peter", "Jon"], 'Age': [20, 23, 12, 22]})
print(df2, '\n')
# Left merge on the row indexes of both DataFrames
print(pd.merge(df1, df2, left_index=True, right_index=True, how='left'))

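The result contains the same rows as the “join()” example, but the overlapping columns receive the default “_x” and “_y” suffixes: “Name_x”, “Age_x”, “Name_y”, and “Age_y”.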
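As a quick check that the two index-based approaches produce the same data, the sketch below passes the same suffixes to both calls and compares the results. The variable names “joined” and “merged” are chosen here for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ["Joseph", "Anna", "Henry", "Tim"],
                    'Age': [12, 25, 22, 13]})
df2 = pd.DataFrame({'Name': ["Lily", "Zendaya", "Peter", "Jon"],
                    'Age': [20, 23, 12, 22]})

# Index-on-index left join with explicit suffixes
joined = df1.join(df2, lsuffix="_left", rsuffix="_right")

# The same operation written with merge(), using matching suffixes
merged = pd.merge(df1, df2, left_index=True, right_index=True,
                  how='left', suffixes=("_left", "_right"))

# Compare the two results, including column names and values
print(joined.equals(merged))
```

With matching suffixes, the only remaining difference between the two calls is how the index join and the join type are spelled out.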

Combine Two DataFrames on a Column Using “pandas.merge()” Method
The “pandas.merge()” method can also combine DataFrames on columns, and it performs an “inner” join by default. When no key is specified, it joins on the columns that the two DataFrames have in common. In this example, “pandas.merge()” joins the DataFrames on the “Name” column because it’s the only column they share:

import pandas

df1 = pandas.DataFrame({'Name': ["Joseph", "Anna", "Henry", "Tim"], 'Age': [12, 25, 22, 13]})
print(df1, '\n')
df2 = pandas.DataFrame({'Name': ["Joseph", "Zendaya", "Henry", "Jon"], 'Height': [5.2, 6.2, 7.1, 4.2]})
print(df2, '\n')
# Inner merge on the shared 'Name' column (no key specified)
df3 = pandas.merge(df1, df2)
print(df3)

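The “pandas.merge()” method merges the two DataFrames on the “Name” column. Only “Joseph” and “Henry” appear in both DataFrames, so the result contains those two rows with the “Name”, “Age”, and “Height” columns.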

Note: You can choose which column to join on with the “on” parameter. For example, to join explicitly on the common “Name” column, we can use the following code:

df3 = pandas.merge(df1, df2, on='Name')
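To see which rows an inner join on “Name” drops, one option is an outer merge with “indicator=True”, which adds a “_merge” column recording where each row came from. The following is a small sketch that reuses the sample data from the example above:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ["Joseph", "Anna", "Henry", "Tim"],
                    'Age': [12, 25, 22, 13]})
df2 = pd.DataFrame({'Name': ["Joseph", "Zendaya", "Henry", "Jon"],
                    'Height': [5.2, 6.2, 7.1, 4.2]})

# Outer merge keeps every name from both DataFrames.
# indicator=True adds a '_merge' column with the values
# 'both', 'left_only', or 'right_only'.
outer = pd.merge(df1, df2, on='Name', how='outer', indicator=True)
print(outer)

# Count how many rows fall into each category
print(outer['_merge'].value_counts())
```

The rows marked “both” are the ones an inner join would keep, while the “left_only” and “right_only” rows are the ones it would discard.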

Combine Two DataFrames on a Column Using “DataFrame.join()” Method
To do the same thing with the “join()” method, we must first move the key column into the index. The “set_index()” method sets the “Name” column as the index of both DataFrames. Next, the “join()” method is called with “how='inner'” to inner join the DataFrames. This means that only rows with matching values in “Name” are included in the result:

import pandas

df1 = pandas.DataFrame({'Name': ["Joseph", "Anna", "Henry", "Tim"], 'Age': [12, 25, 22, 13]})
print(df1, '\n')
df2 = pandas.DataFrame({'Name': ["Joseph", "Zendaya", "Henry", "Jon"], 'Height': [5.2, 6.2, 7.1, 4.2]})
print(df2, '\n')
# Use 'Name' as the index of both DataFrames, then inner join on that index
df3 = df1.set_index('Name').join(df2.set_index('Name'), how='inner')
print(df3)

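This code returns a DataFrame indexed by “Name” that contains the “Age” and “Height” columns for “Joseph” and “Henry”, the only two names present in both DataFrames.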
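The column-based “merge()” and the “set_index()” plus “join()” approach can be compared directly. The sketch below is one way to do that, assuming the same sample data as above. It moves “Name” back into a regular column with “reset_index()” and sorts both results so that row order does not affect the comparison:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ["Joseph", "Anna", "Henry", "Tim"],
                    'Age': [12, 25, 22, 13]})
df2 = pd.DataFrame({'Name': ["Joseph", "Zendaya", "Henry", "Jon"],
                    'Height': [5.2, 6.2, 7.1, 4.2]})

# Approach 1: inner join on the 'Name' index, then restore 'Name' as a column
joined = (df1.set_index('Name')
             .join(df2.set_index('Name'), how='inner')
             .reset_index())

# Approach 2: inner merge on the 'Name' column
merged = pd.merge(df1, df2, on='Name')

# Normalize row order and index before comparing
joined_sorted = joined.sort_values('Name').reset_index(drop=True)
merged_sorted = merged.sort_values('Name').reset_index(drop=True)
print(joined_sorted.equals(merged_sorted))
```

The main practical difference is that “join()” leaves “Name” in the index unless “reset_index()” is called, while “merge()” keeps it as a regular column.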

Conclusion

The “DataFrame.join()” method is mainly used to join DataFrames on the index, while the “pandas.merge()” method can join DataFrames on indexes, on columns, or on both. The “join()” method is convenient when we want a simple combination of DataFrames that share an index. On the other hand, the “merge()” method is the better choice when we want more control over which columns or indexes are used as keys.


Source: linuxhint.com
