by Arround The Web

R – Extract Columns From Data Frame

One day, Person X asked Person Y, “How do you get the values in a data frame column in R?” Person Y answered, “There are many ways to extract columns from a data frame,” and pointed Person X to this tutorial.

In this article, we will discuss two scenarios, extracting columns by name and extracting columns by index, with their corresponding methods.

Now, we will see how to extract columns from a data frame. First, let’s create a data frame.

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4),market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'),market_type=c('grocery','bar','grocery',
'restaurent'),market_squarefeet=c(120,342,220,110))

#display the market dataframe

print(market)

Result:

You can see the market data frame here:
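Besides printing, the shape and column names can be checked directly before extracting anything. The following is a minimal sketch using base R's dim(), names(), and str(). The stringsAsFactors = FALSE argument is an addition here so that the text columns stay character vectors on older R versions too.

```r
# Recreate the tutorial's data frame
market <- data.frame(
  market_id = c(1, 2, 3, 4),
  market_name = c('M1', 'M2', 'M3', 'M4'),
  market_place = c('India', 'USA', 'India', 'Australia'),
  market_type = c('grocery', 'bar', 'grocery', 'restaurent'),
  market_squarefeet = c(120, 342, 220, 110),
  stringsAsFactors = FALSE
)

# Number of rows and columns (4 rows, 5 columns)
print(dim(market))

# Column names, useful before extracting by name
print(names(market))

# Compact summary of each column's type
str(market)
```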

Let’s discuss the two scenarios one by one.

Scenario 1: Extract Columns From the Data Frame by Column Name

In this scenario, we will see different methods to extract one or more columns from a data frame using column names. The $ operator returns the values in a single column as a vector, while the methods that select several columns return a data frame.

Method 1: $ Operator

The $ operator accesses the data in a single data frame column.

Syntax:

dataframe_object$column

Where,

  1. The dataframe_object is the data frame.
  2. The column is the name of the column to be retrieved.

Example

In this example, we will extract market_name and market_type columns separately.

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4),market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'),market_type=c('grocery','bar','grocery',
'restaurent'),market_squarefeet=c(120,342,220,110))

#extract market_name column

print(market$market_name)

#extract market_type column

print(market$market_type)

Result:

We can see that the values present in market_name and market_type were returned.
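As a quick check, the sketch below confirms that $ returns a plain vector and shows the related [[ ]] operator, which is not covered above but is useful when the column name is stored in a variable. The variable col_name is only an illustrative name.

```r
# Recreate the tutorial's data frame
market <- data.frame(
  market_id = c(1, 2, 3, 4),
  market_name = c('M1', 'M2', 'M3', 'M4'),
  market_place = c('India', 'USA', 'India', 'Australia'),
  market_type = c('grocery', 'bar', 'grocery', 'restaurent'),
  market_squarefeet = c(120, 342, 220, 110),
  stringsAsFactors = FALSE
)

# $ returns the column as a vector, not a data frame
name_col <- market$market_name
print(is.vector(name_col))

# [[ ]] with a quoted name returns the same vector
print(identical(market[["market_name"]], name_col))

# [[ ]] also works when the column name is held in a variable
col_name <- "market_type"
type_col <- market[[col_name]]
print(type_col)

# $ looks for a column literally called "col_name", so this is NULL
print(market$col_name)
```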

Method 2: Specifying Column Names in a Vector

Here, we are specifying column names to be extracted inside a vector.

Syntax:

dataframe_object[,c(column,....)]

Where,

  1. The dataframe_object is the data frame.
  2. The column is the name of the column/s to be retrieved.

Example

In this example, we will extract “market_id”, “market_squarefeet”, and “market_place” columns at a time.

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4),market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'),market_type=c('grocery','bar','grocery',
'restaurent'),market_squarefeet=c(120,342,220,110))

#extract columns - "market_id","market_squarefeet" and "market_place"

print(market[ , c("market_id", "market_squarefeet","market_place")])

Result:

We can see that the columns: “market_id”, “market_squarefeet”, and “market_place” were returned.
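One detail worth knowing with this syntax: when only one column name is given, R simplifies the result to a vector by default. The sketch below, which assumes we want to keep the data frame shape, uses the drop = FALSE argument of the [ operator to prevent that.

```r
# Recreate the tutorial's data frame
market <- data.frame(
  market_id = c(1, 2, 3, 4),
  market_name = c('M1', 'M2', 'M3', 'M4'),
  market_place = c('India', 'USA', 'India', 'Australia'),
  market_type = c('grocery', 'bar', 'grocery', 'restaurent'),
  market_squarefeet = c(120, 342, 220, 110),
  stringsAsFactors = FALSE
)

# A single column name is simplified to a vector by default
one_col_vec <- market[, "market_name"]
print(class(one_col_vec))

# drop = FALSE keeps the result as a one-column data frame
one_col_df <- market[, "market_name", drop = FALSE]
print(class(one_col_df))

# Several column names always give a data frame
three_cols <- market[, c("market_id", "market_squarefeet", "market_place")]
print(class(three_cols))
```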

Method 3: subset() With the select Parameter

In this case, we use subset() with its select parameter to extract columns from the data frame. The first argument is the data frame object, and the second is select, which is an argument of subset() rather than a separate function. The column names are passed to it as a vector.

Syntax:

subset(dataframe_object,select=c(column,....))

Parameters:

  1. The dataframe_object is the data frame.
  2. The column is the name of the column/s to be retrieved, passed through the select parameter.

Example

In this example, we will extract “market_id”, “market_squarefeet”, and “market_place” columns at a time using subset() with the select parameter.

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4),market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'),market_type=c('grocery','bar','grocery',
'restaurent'),market_squarefeet=c(120,342,220,110))

#extract columns -"market_id","market_squarefeet" and "market_place"

print(subset(market,select= c("market_id", "market_squarefeet","market_place")) )

Result:

We can see that the columns: “market_id”, “market_squarefeet”, and “market_place” were returned.
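The select parameter of subset() also accepts unquoted column names, and a minus sign excludes columns instead of keeping them. The following is a short sketch of both forms. The variable names kept and dropped are only illustrative.

```r
# Recreate the tutorial's data frame
market <- data.frame(
  market_id = c(1, 2, 3, 4),
  market_name = c('M1', 'M2', 'M3', 'M4'),
  market_place = c('India', 'USA', 'India', 'Australia'),
  market_type = c('grocery', 'bar', 'grocery', 'restaurent'),
  market_squarefeet = c(120, 342, 220, 110),
  stringsAsFactors = FALSE
)

# Unquoted column names work inside select
kept <- subset(market, select = c(market_id, market_place))
print(kept)

# A leading minus excludes the listed columns and keeps the rest
dropped <- subset(market, select = -c(market_name, market_type))
print(dropped)
```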

Method 4: select()

The select() function takes the names of the columns to extract. The data frame is passed to it using the “%>%” operator. The select() function is available in the dplyr library, so we need to load this library first.

Syntax:

dataframe_object %>% select(column,....)

Parameters:

  1. The dataframe_object is the data frame.
  2. The column is the name of the column/s to be retrieved.

Example

In this example, we will extract “market_id”, “market_squarefeet”, and “market_place” columns at a time using the select() method.

library("dplyr")

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4),market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'),market_type=c('grocery','bar','grocery',
'restaurent'),market_squarefeet=c(120,342,220,110))

#extract columns - "market_id","market_squarefeet", and "market_place"

print(market %>% select("market_id", "market_squarefeet","market_place"))

Result:

We can see that the columns: “market_id”, “market_squarefeet”, and “market_place” were returned.
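When the column names are already stored in a character vector, or when they share a common prefix, dplyr's selection helpers can be used inside select(). The sketch below assumes the dplyr package is installed and uses all_of() and starts_with(). The variable wanted is only an illustrative name.

```r
library(dplyr)

# Recreate the tutorial's data frame
market <- data.frame(
  market_id = c(1, 2, 3, 4),
  market_name = c('M1', 'M2', 'M3', 'M4'),
  market_place = c('India', 'USA', 'India', 'Australia'),
  market_type = c('grocery', 'bar', 'grocery', 'restaurent'),
  market_squarefeet = c(120, 342, 220, 110),
  stringsAsFactors = FALSE
)

# all_of() selects the columns named in a character vector
wanted <- c("market_id", "market_squarefeet", "market_place")
by_vector <- market %>% select(all_of(wanted))
print(by_vector)

# starts_with() selects columns whose names begin with a prefix
by_prefix <- market %>% select(starts_with("market_s"))
print(by_prefix)
```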

Scenario 2: Extract Columns From Data Frame by Column Indices

In this scenario, we will see different methods to extract one or more columns from a data frame using column indices. Selecting several indices returns a data frame. Indexing starts at 1.

Method 1: Specifying Column Indices in a Vector

Here, we are specifying column indices to be extracted inside a vector.

Syntax:

dataframe_object[,c(index,....)]

Where,

  1. The dataframe_object is the data frame.
  2. The index represents the column/s position to be retrieved.

Example

In this example, we will extract “market_id”, “market_squarefeet”, and “market_place” columns at a time.

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4),market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'),market_type=c('grocery','bar','grocery',
'restaurent'),market_squarefeet=c(120,342,220,110))

#extract columns - "market_id","market_squarefeet" and "market_place" using column indices

print(market[ , c(1,5,3)])

Result:

We can see that the columns: “market_id”, “market_squarefeet”, and “market_place” were returned.
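Hard-coded positions such as 1, 5, and 3 break if the column order changes. As a hedged sketch, the positions can instead be looked up from the names with base R's match(), and negative indices can be used to drop columns. The variable idx is only an illustrative name.

```r
# Recreate the tutorial's data frame
market <- data.frame(
  market_id = c(1, 2, 3, 4),
  market_name = c('M1', 'M2', 'M3', 'M4'),
  market_place = c('India', 'USA', 'India', 'Australia'),
  market_type = c('grocery', 'bar', 'grocery', 'restaurent'),
  market_squarefeet = c(120, 342, 220, 110),
  stringsAsFactors = FALSE
)

# Look up the positions of the wanted columns from their names
idx <- match(c("market_id", "market_squarefeet", "market_place"), names(market))
print(idx)

# Use the looked-up positions exactly like hard-coded indices
print(market[, idx])

# Negative indices drop columns 2 and 4 and keep the rest
without_two <- market[, -c(2, 4)]
print(without_two)
```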

Method 2: subset() With the select Parameter

In this case, we use subset() with its select parameter to extract columns from the data frame by column index. The first argument is the data frame object, and the second is select, which is an argument of subset() rather than a separate function. The column indices are passed to it as a vector.

Syntax:

subset(dataframe_object,select=c(index,....))

Parameters:

  1. The dataframe_object is the data frame.
  2. The index represents the column/s position to be retrieved.

Example

In this example, we will extract “market_id”, “market_squarefeet”, and “market_place” columns at a time using the subset() method with select parameter.

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4),market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'),market_type=c('grocery','bar','grocery',
'restaurent'),market_squarefeet=c(120,342,220,110))

#extract columns - "market_id","market_squarefeet" and "market_place" using column indices

print(subset(market,select= c(1,5,3)) )

Result:

We can see that the columns: “market_id”, “market_squarefeet”, and “market_place” were returned.
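The order of the indices decides the order of the returned columns, and a range such as 2:4 can stand in for consecutive positions. The following is a small sketch of both with subset(). The variable names are only illustrative.

```r
# Recreate the tutorial's data frame
market <- data.frame(
  market_id = c(1, 2, 3, 4),
  market_name = c('M1', 'M2', 'M3', 'M4'),
  market_place = c('India', 'USA', 'India', 'Australia'),
  market_type = c('grocery', 'bar', 'grocery', 'restaurent'),
  market_squarefeet = c(120, 342, 220, 110),
  stringsAsFactors = FALSE
)

# Columns come back in the order the indices are listed
reordered <- subset(market, select = c(5, 1))
print(reordered)

# A range selects consecutive columns 2, 3, and 4
middle <- subset(market, select = 2:4)
print(middle)
```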

Method 3: select()

The select() function takes the indices of the columns to extract. The data frame is passed to it using the “%>%” operator. The select() function is available in the dplyr library, so we need to load this library first.

Syntax:

dataframe_object %>% select(index,....)

Parameters:

  1. The dataframe_object is the data frame.
  2. The index represents the column/s position to be retrieved.

Example

In this example, we will extract “market_id”, “market_squarefeet”, and “market_place” columns at a time using the select() method.

library("dplyr")

#create a dataframe-market that has 4 rows and 5 columns.

market=data.frame(market_id=c(1,2,3,4),market_name=c('M1','M2','M3','M4'),
market_place=c('India','USA','India','Australia'),market_type=c('grocery','bar','grocery',
'restaurent'),market_squarefeet=c(120,342,220,110))

#extract columns - "market_id","market_squarefeet" and "market_place" using column indices

print(market %>% select(1,5,3))

Result:

We can see that the columns: “market_id”, “market_squarefeet”, and “market_place” were returned.
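Positions in select() can also be given as a range or as a negative number, and the last_col() helper picks the final column without counting. The sketch below assumes the dplyr package is installed. The variable names are only illustrative.

```r
library(dplyr)

# Recreate the tutorial's data frame
market <- data.frame(
  market_id = c(1, 2, 3, 4),
  market_name = c('M1', 'M2', 'M3', 'M4'),
  market_place = c('India', 'USA', 'India', 'Australia'),
  market_type = c('grocery', 'bar', 'grocery', 'restaurent'),
  market_squarefeet = c(120, 342, 220, 110),
  stringsAsFactors = FALSE
)

# A range selects the first three columns
first_three <- market %>% select(1:3)
print(first_three)

# A negative position drops that column and keeps the rest
without_second <- market %>% select(-2)
print(without_second)

# last_col() selects the final column
last_one <- market %>% select(last_col())
print(last_one)
```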

Conclusion

This article discussed how to extract columns by column name and by column index using bracket indexing with a vector, subset() with its select parameter, and select() from dplyr. If we want to extract a single column as a vector, we can simply use the “$” operator.


Source: linuxhint.com
