by Arround The Web

Remove Columns in R

A data frame will frequently contain columns that aren’t relevant to your analysis. Removing them makes it easier to focus on the columns that remain. In R, columns can be dropped by name, by index position, or by a pattern in the column name.

In R, you may occasionally need to remove one or more specific columns from a data frame. Fortunately, base R and the dplyr package provide functions that make this simple. In this post, we will look at several methods for removing columns from a data frame in R, by name and by position.

How to Remove Columns From Data Frames in R in Ubuntu 20.04?

Dropping columns removes unneeded variables from a data frame. In R, you can drop a column by its name or its position in a variety of ways. This article walks through several of these methods, each with its own example.

Example # 1: Using the subset Method to Remove Columns in R in Ubuntu 20.04

Using the subset() function with the “-” symbol, which signifies dropping variables, is one of the simplest ways to drop columns. In R, subset() builds subsets of a data frame and can also remove columns from it. To drop columns, the call takes the form subset(df, select = -c(col1, col2)), where df is the data frame and the select argument names the columns to keep or, with the minus sign, the columns to drop.

In this example, we have created the data frame “data1” that contains four columns: x1, x2, x3, and x4, each populated with values. Running data1 prints the data frame, showing the column names and their entries. After this, we have created a variable data2 and assigned it the result of the subset() call. The subset() function takes data1 as input, along with a select argument that drops the columns x1 and x3.

When data2 is printed, it shows the new data frame, which has all the columns except x1 and x3, as these columns were removed by subset().
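The original script is not reproduced here, so the following is a minimal sketch of the steps described above. The column names data1, x1 to x4, and data2 come from the text; the values stored in the columns are assumptions chosen only for illustration.

```r
# Sample data frame; the values are assumed for illustration
data1 <- data.frame(
  x1 = c(10, 20, 30, 40),
  x2 = c("a", "b", "c", "d"),
  x3 = c(1.5, 2.5, 3.5, 4.5),
  x4 = c(TRUE, FALSE, TRUE, FALSE)
)
print(data1)

# The minus sign inside select means "every column except these"
data2 <- subset(data1, select = -c(x1, x3))
print(names(data2))  # "x2" "x4"
```

Note that subset() returns a new data frame and leaves data1 unchanged.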

Example # 2: Using the names Method to Remove Columns in R in Ubuntu 20.04

This technique creates a character vector, drop, that stores the names of the columns to remove. We then instruct R to pick all columns except those listed in drop. Negation is denoted by the “!” symbol. The names() function in R gets or sets the names of an object. It accepts an object such as a vector, list, or data frame and returns its names; for a data frame, these are the column names. When names() is used to assign new names, the replacement vector should match the object’s length.

In this example, we have created a data frame and stored it in the variable “df.” When printed, the data frame shows four columns with distinct entries. Next, we define the drop vector containing the names of the columns to remove. The new variable “new_df” is then created by indexing df with a logical condition: names(df) is placed on the left of the %in% operator and the drop vector on its right, and the result is negated with “!” so that only the columns not listed in drop are kept.

Executing the code produces a new data frame in which the columns y2 and y3 have been removed.
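A minimal sketch of this approach follows. The names df, drop, new_df, y2, and y3 come from the text; the remaining column names (y1, y4) and all values are assumptions for illustration.

```r
# Sample data frame; y1, y4, and all values are assumed for illustration
df <- data.frame(
  y1 = c(1, 2, 3),
  y2 = c("p", "q", "r"),
  y3 = c(0.1, 0.2, 0.3),
  y4 = c(7, 8, 9)
)
print(df)

# Names of the columns to remove
drop <- c("y2", "y3")

# names(df) %in% drop is TRUE for columns to remove; "!" flips it
new_df <- df[, !(names(df) %in% drop)]
print(names(new_df))  # "y1" "y4"
```

If only a single column would remain, df[, ...] returns a plain vector rather than a data frame; adding drop = FALSE inside the brackets keeps the data frame structure in that case.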

Example # 3: Using the select Method to Remove Columns in R in Ubuntu 20.04

In this method, we use select() from the dplyr package, prefixing a column name with a minus sign to omit it from the data set. In essence, select() keeps only the variables you specify, so a negated column is left out of the result.

We have loaded the dplyr package to access the select() function and constructed the data frame from which columns will be removed. Because the data frame is stored in the variable Mydata, running Mydata prints it in tabular form. After that, we call select() with the data frame as the first argument and the column name a1 preceded by a minus sign.

R evaluates this select() call and outputs the data frame with column a1 removed.
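The steps above can be sketched as follows, assuming the dplyr package is installed. Mydata and a1 come from the text; the other column names (a2, a3), the result variable, and all values are assumptions for illustration.

```r
library(dplyr)

# Sample data frame; a2, a3, and all values are assumed for illustration
Mydata <- data.frame(
  a1 = c(1, 2, 3),
  a2 = c("x", "y", "z"),
  a3 = c(4.5, 5.5, 6.5)
)
print(Mydata)

# A minus sign before the column name tells select() to omit it
result <- select(Mydata, -a1)
print(names(result))  # "a2" "a3"
```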

Example # 4: Using the select Method to Remove Columns by the Column Positions in R in Ubuntu 20.04

To remove columns by position, we pass the column index positions as a vector to select(), preceded by a negative sign.

Here, we have loaded the dplyr package first. Then we have used R’s built-in data frame “iris,” which has five columns. We can remove any of the columns in the iris data frame by specifying their index positions. For this, we call select() with the data frame and a vector of index values preceded by a minus sign. Columns 3, 4, and 5 are removed from the iris data frame in the result.
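A short sketch of dropping columns by position, assuming dplyr is installed. The variable name iris_small is an assumption introduced for illustration.

```r
library(dplyr)

# iris columns: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species
print(names(iris))

# Drop the columns at positions 3, 4, and 5
iris_small <- select(iris, -c(3, 4, 5))
print(names(iris_small))  # "Sepal.Length" "Sepal.Width"
```

Positions are brittle if the column order ever changes, so dropping by name is usually the safer choice in longer scripts.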

Example # 5: Using the select Method to Remove Columns by the Start and End Characters of the Column Name in R in Ubuntu 20.04

We can also choose columns by the characters their names begin or end with. The starts_with() helper matches columns whose names begin with the given prefix. To drop such columns, the syntax is select(dataframe, -starts_with("substring")), where dataframe is the input data frame and substring is the prefix to match.

We have selected the built-in data frame ToothGrowth for this example; it has three columns: len, supp, and dose. In the next step, we call starts_with() inside select(). With a minus sign, starts_with() matches the column whose name starts with “dose” in the ToothGrowth data frame, so executing this select() call removes the column “dose” from the data frame.
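The starts_with() call described above can be sketched like this, assuming dplyr is installed. The variable name no_dose is an assumption introduced for illustration.

```r
library(dplyr)

# ToothGrowth is a built-in data frame with columns len, supp, and dose
print(names(ToothGrowth))

# Drop every column whose name begins with "dose"
no_dose <- select(ToothGrowth, -starts_with("dose"))
print(names(no_dose))  # "len" "supp"
```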

The ends_with() helper matches columns whose names end with the given suffix. The syntax for dropping such columns is select(dataframe, -ends_with("substring")), where dataframe is the input data frame and substring is the suffix to match.

As with starts_with() above, we have passed “supp” to ends_with() with a minus sign, using the ToothGrowth data frame. This call removes the column whose name ends with “supp.”
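The matching ends_with() call can be sketched in the same way, again assuming dplyr is installed. The variable name no_supp is an assumption introduced for illustration.

```r
library(dplyr)

# Drop every column whose name ends with "supp"
no_supp <- select(ToothGrowth, -ends_with("supp"))
print(names(no_supp))  # "len" "dose"
```

Both helpers match on the column name, not on the values stored in the column.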

Conclusion

This article demonstrated several ways to remove columns from a data frame in R. We first looked at base R’s subset() and names() functions. We then used select() from the dplyr package, where a minus sign placed before a column inside the call drops that column. With select(), columns can be removed by name, by position, or by a pattern in the name using starts_with() and ends_with(), each of which was presented with an example.

Source: linuxhint.com