Removing blanks/NAs

#1

How would you:

  1. Remove NA rows from the dataset
  2. Remove blank rows from the dataset

#2

Could you elaborate on what you mean by "blank rows"? Ideally, could you ask this with a minimal REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:


#3

remove:

  1. rows that explicitly say "N/A" or "Null"
  2. rows that are missing a value

#4

You are still not being clear, but maybe this could give you a clue:

library(dplyr)
df <- data.frame(stringsAsFactors = FALSE,
                 x = c(NA, "N/A", "null", "", "1", "2", "3"),
                 y = 1:7) 
df
#>      x y
#> 1 <NA> 1
#> 2  N/A 2
#> 3 null 3
#> 4      4
#> 5    1 5
#> 6    2 6
#> 7    3 7
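# Turn the placeholder strings into real NA values in every column,
# then drop every row that contains an NA
df %>% 
    mutate_all(~ifelse(. %in% c("N/A", "null", ""), NA, .)) %>% 
    na.omit()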
#>   x y
#> 5 1 5
#> 6 2 6
#> 7 3 7

Created on 2019-04-06 by the reprex package (v0.2.1.9000)
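
If you would rather not depend on dplyr, the same idea can be sketched in base R. This is only an illustration on the sample data above; the names `na_strings` and `df_clean` are made up for the example, and the list of placeholder strings is an assumption you would adjust to your own data.

```r
# Same sample data as in the reprex above
df <- data.frame(stringsAsFactors = FALSE,
                 x = c(NA, "N/A", "null", "", "1", "2", "3"),
                 y = 1:7)

# Placeholder strings to treat as missing (assumed list, extend as needed)
na_strings <- c("N/A", "null", "")

# Replace the placeholders with NA in every column; df_clean[] keeps the
# data frame structure while lapply() processes one column at a time
df_clean <- df
df_clean[] <- lapply(df_clean, function(col) {
  col[col %in% na_strings] <- NA
  col
})

# Keep only the rows that have no missing values
df_clean <- df_clean[complete.cases(df_clean), ]
df_clean
#>   x y
#> 5 1 5
#> 6 2 6
#> 7 3 7
```

If the data comes from a CSV file, it may be simpler to declare the placeholders at import time through the `na.strings` argument of `read.csv()`, so they arrive as NA in the first place.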


#5

Yes, this is helpful.

Can you explain what the bolded part means?

df %>%
mutate_all(**~**ifelse(. %in% c("N/A", "null", ""), NA, .)) %>%
na.omit()


#6

df is the name of the sample dataset I created to illustrate the solution:

df <- data.frame(stringsAsFactors = FALSE,
                 x = c(NA, "N/A", "null", "", "1", "2", "3"),
                 y = 1:7) 
df
#>      x y
#> 1 <NA> 1
#> 2  N/A 2
#> 3 null 3
#> 4      4
#> 5    1 5
#> 6    2 6
#> 7    3 7

NA stands for Not Available and is R's way of representing missing values. Any other form, such as "N/A", "null" or "", is treated as an ordinary character string.
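
A quick way to see that distinction is `is.na()`, which only flags a real NA. The vector `x` below is a made-up example, not part of the original data:

```r
# Only the first element is a true missing value; the rest are plain strings
x <- c(NA, "N/A", "null", "", "1")
is.na(x)
#> [1]  TRUE FALSE FALSE FALSE FALSE

# After converting the placeholder strings, R treats them as missing too
x[x %in% c("N/A", "null", "")] <- NA
is.na(x)
#> [1]  TRUE  TRUE  TRUE  TRUE FALSE
```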

%>% is called the pipe operator. It passes the result on its left as the first argument of the function on its right, which chains commands together and makes code more readable. The previous code is equivalent to:

na.omit(mutate_all(df, ~ifelse(. %in% c("N/A", "null", ""),  NA, .)))
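
For readers on dplyr 1.0.0 or later, where `mutate_all()` is superseded by `across()`, the same pipeline could be written roughly as follows. This is a sketch assuming a recent dplyr; the name `result` is only for illustration, and it should give the same three rows as the code above.

```r
library(dplyr)

# Same sample data as above
df <- data.frame(stringsAsFactors = FALSE,
                 x = c(NA, "N/A", "null", "", "1", "2", "3"),
                 y = 1:7)

# across(everything(), ...) applies the lambda to every column,
# which is what mutate_all() did; .x is the current column
result <- df %>%
  mutate(across(everything(), ~ifelse(.x %in% c("N/A", "null", ""), NA, .x))) %>%
  na.omit()

result
```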

#7

Thank you!! Got it: you are converting "N/A", "null" and blank strings to NA, which R recognizes as a missing value, and then omitting the rows that contain them.

Final questions about the syntax:

  • What does the ~ mean before the ifelse?
  • What do the periods (.) before the %in% and after the NA mean?

#8

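The ~ means I'm passing a quosure-style lambda, that is, an anonymous function written as a one-sided formula.

The dot stands for the column currently being processed; mutate_all() applies the function to every column of the data frame in turn.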

Try reading the documentation for the function to get a better understanding.

?mutate_all
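
To make the `~` and `.` less mysterious, here is a sketch that writes the same logic as an ordinary named function. The name `blank_to_na` is hypothetical, chosen only for this example; the point is that the dot in the lambda plays the role of the `col` argument.

```r
library(dplyr)

df <- data.frame(stringsAsFactors = FALSE,
                 x = c(NA, "N/A", "null", "", "1", "2", "3"),
                 y = 1:7)

# Hypothetical helper: the same body as the lambda, with "col" where the dot was
blank_to_na <- function(col) ifelse(col %in% c("N/A", "null", ""), NA, col)

# Lambda form, as used earlier in the thread
with_lambda <- mutate_all(df, ~ifelse(. %in% c("N/A", "null", ""), NA, .))

# Named-function form; mutate_all() calls it once per column
with_function <- mutate_all(df, blank_to_na)

# Compare the two results
all.equal(with_lambda, with_function)
```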