by Arround The Web

Remove NA in R

“When a production sensor fails, you may only be able to collect accurate measurements at four of the assembly line’s six measurement points. One of the marks on a quality sheet may be illegible. You may be without samples for a whole shift. Any of these gaps can influence your statistical computations, and many functions do not handle missing data gracefully. In this article, we’ll look at a few different techniques to get rid of NA values in R. This lets you restrict your computations to the rows of an R data frame that meet a specific level of completeness.

When no data is available for one or more variables, or for an entire unit, it is known as missing data. In real-world settings, missing values are a major issue. R represents missing entries with NA (Not Available). Many datasets arrive as data frames with missing values, either because the values exist but were not collected or because they never existed.”

How to Get Rid of the NA Values in the R Programming Language in Ubuntu 20.04?

R uses the symbol NA (not available) to signify missing values. NA can mark missing entries in vectors and in data frame columns. In this article, we will look at how to remove NA values from vectors and how to drop data frame rows that contain NA.
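As a quick illustration of why NA values matter, the sketch below uses a hypothetical vector named readings (the name and values are assumptions, not taken from the article) to show how a single NA is detected and how it propagates through a computation:

```r
# Hypothetical sensor readings; the values are assumed for illustration.
readings <- c(4.2, NA, 5.1, 3.8)

# is.na() marks each missing entry with TRUE.
missing_flags <- is.na(readings)
print(missing_flags)

# anyNA() reports whether the vector contains at least one NA.
has_missing <- anyNA(readings)
print(has_missing)

# Without special handling, the NA propagates and the mean is NA.
plain_mean <- mean(readings)
print(plain_mean)
```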

Example # 1: Using the is.na Method to Remove NA in R in Ubuntu 20.04

We can use is.na to eliminate NA values from a vector. is.na() returns a logical vector that is TRUE wherever an element is NA. Negating that result and using it as the vector index, as in V1[!is.na(V1)], returns all values except the NA entries.

In this example, we have a vector V1 that contains some random numbers along with NA values. Printing the vector shows the NA values as well, so we want to remove them. To do this, we pass V1 to the function is.na, negate the result, and use it to index V1, which drops every NA value from the vector. The output of this operation displays the numbers only.
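The original listing is not reproduced here, so the following is a minimal sketch of the approach with assumed sample values. Only the name V1 comes from the text; V1_clean is a hypothetical name for the result.

```r
# Assumed sample values; the original listing is not available.
V1 <- c(12, NA, 7, NA, 30, 5)
print(V1)

# is.na(V1) is TRUE at the NA positions, so !is.na(V1) selects the rest.
V1_clean <- V1[!is.na(V1)]
print(V1_clean)
```

Note that V1 itself is unchanged; the filtered values live in V1_clean unless you assign them back to V1.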

Example # 2: Using the na.rm Method to Remove NA in R in Ubuntu 20.04

We can also skip NA values when evaluating the sum, mean, and variance. na.rm is an argument of these functions that controls how NA is handled. If na.rm=TRUE, NA values are ignored; if na.rm=FALSE (the default), they are included, and the result is NA.

We start by creating a vector that contains some numbers and NA values and store it in the variable Vec. We first evaluate the variance with var, and then the sum and the mean of Vec. Note that na.rm is set to TRUE in each call, so the NA values are skipped during the computation. Vec itself is not modified.
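A minimal sketch of these calls follows, with assumed values for Vec since the original listing is not available. The result variable names are hypothetical.

```r
# Assumed sample values; only the name Vec comes from the text.
Vec <- c(2, 4, NA, 6, NA, 8)

# With na.rm = TRUE, each function drops the NA values before computing.
var_result  <- var(Vec, na.rm = TRUE)
sum_result  <- sum(Vec, na.rm = TRUE)
mean_result <- mean(Vec, na.rm = TRUE)

print(var_result)
print(sum_result)
print(mean_result)

# For comparison: with the default na.rm = FALSE, the result is NA.
print(sum(Vec))
```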

Example # 3: Using the na.omit Method to Remove NA in R in Ubuntu 20.04

The na.omit() method eliminates NA values directly. It returns the non-NA values and records the indexes of the discarded NA values in the na.action attribute of the result. This is the simplest choice. Applied to a data frame, na.omit() returns only the rows that contain no NA values, which makes it one of the quickest ways to eliminate NA rows in R.

Here, we initialize the variable integers with a vector. Then, with the print command, we display the vector, and the output shows some NA values. To remove these NA values, we call the na.omit function, which takes the integers variable as input. After this, we use another print statement to check whether the NA values have been removed. The resulting output shows no NA values in integers.
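The steps described above might look like the sketch below. The values are assumed, and cleaned is a hypothetical name for the result; only integers comes from the text.

```r
# Assumed sample values; the original listing is not available.
integers <- c(3L, NA, 9L, 15L, NA, 21L)
print(integers)

# na.omit() drops the NA entries and remembers where they were.
cleaned <- na.omit(integers)
print(cleaned)

# Positions of the removed NA values, stored as an attribute.
removed_positions <- attr(cleaned, "na.action")
print(removed_positions)

# as.vector() strips the extra attributes if a plain vector is needed.
plain <- as.vector(cleaned)
print(plain)
```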

Example # 4: Using the complete.cases Method to Remove NA in R in Ubuntu 20.04

Many kinds of data analysis in R require a complete data frame without any missing values. The complete.cases method helps with this. This R function examines a data frame and returns a logical vector with one entry per row: TRUE for rows that have no missing values and FALSE for rows that contain at least one NA.

The preceding examples worked with vectors. Now, we eliminate the NA values from a data frame. For this, we create a data frame in which each column contains some NA values. Then, we call the complete.cases function, which takes the data frame as input, and use its result to select rows. The result of this operation is stored in data2, and printing it shows that the rows with NA values have been removed.
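A minimal sketch of this approach follows. The column names and values are assumptions made for illustration; only the names data2 and complete.cases come from the text.

```r
# Assumed sample data; each column contains at least one NA.
data <- data.frame(
  x1 = c(1, NA, 3, 4, 5),
  x2 = c("a", "b", NA, "d", "e"),
  x3 = c(10, 20, 30, 40, NA)
)

# TRUE for rows without any NA, FALSE otherwise.
row_is_complete <- complete.cases(data)
print(row_is_complete)

# Keep only the complete rows.
data2 <- data[row_is_complete, ]
print(data2)
```

For a data frame like this one, na.omit(data) selects the same rows; complete.cases is handy when you also want the logical vector itself, for example to count incomplete rows.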

Example # 5: Using the rowSums Method to Remove NA in R in Ubuntu 20.04

R has the built-in method rowSums, which computes the sum of every row of a data frame or matrix and is called as rowSums(x). Additional parameters can be specified, the most significant of which is the Boolean argument na.rm, which tells the function whether to skip NA values.

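After creating the data frame in the variable data, we apply the rowSums method. Inside rowSums, we use the is.na method, and we compare the result with the ncol method. rowSums(is.na(data)) counts the NA values in each row, and comparing that count with ncol(data) identifies the rows in which every value is NA. Note that this removes only the third row, where all values are NA. The other rows, which contain only some NA values, are kept.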
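One common way to combine rowSums, is.na, and ncol is sketched below. The data values and the names na_per_row and data_kept are assumptions; the data frame is built so that only the third row is entirely NA, matching the description above.

```r
# Assumed sample data; row 3 is entirely NA, other rows are partly NA.
data <- data.frame(
  x1 = c(5, NA, NA, 8),
  x2 = c(NA, 2, NA, 4),
  x3 = c(1, 3, NA, NA)
)

# is.na(data) is a logical matrix; rowSums counts the TRUEs per row.
na_per_row <- rowSums(is.na(data))
print(na_per_row)

# Keep rows where the NA count is less than the number of columns.
data_kept <- data[na_per_row != ncol(data), ]
print(data_kept)
```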

Example # 6: Using the filter Method to Remove NA in R in Ubuntu 20.04

We can also use the tidyverse dplyr package to drop only the rows in which all values are missing. To do so, we combine the dplyr package’s filter function with Base R’s is.na function. We will show you how to delete only the rows in which all the data entries are NA.

After loading the dplyr package for its filter function, we create the data frame. Then, we apply the filter function to this data frame and display the output, from which the third row, where every value is NA, has been removed.
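A possible version of this example is sketched below. It assumes the dplyr package is installed; the data values and the name data_kept are assumptions, and the row condition reuses the rowSums(is.na(...)) count rather than any specific call from the original listing.

```r
# Requires the dplyr package (install.packages("dplyr") if missing).
library(dplyr)

# Assumed sample data; row 3 is entirely NA.
data <- data.frame(
  x1 = c(5, NA, NA, 8),
  x2 = c(NA, 2, NA, 4),
  x3 = c(1, 3, NA, NA)
)

# Keep rows in which at least one value is not NA.
data_kept <- filter(data, rowSums(is.na(data)) != ncol(data))
print(data_kept)
```

Unlike base R subsetting, dplyr's filter renumbers the rows of the result, so the kept rows appear as 1, 2, and 3.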

Conclusion

At this stage, we have learned how to remove NA values that appear once or multiple times in R vectors or data frames. We have covered six methods that help us remove NA from the given data. These methods are quite easy to implement in the R scripting language, and they can remove NA values from vectors as well as from the rows of a data frame. One of them, the filter method, requires the R dplyr package.


Source: linuxhint.com
