Removing blanks/NAs

How would you:

  1. Remove NA rows from the dataset
  2. Remove blank rows from the dataset

Could you elaborate on what you mean by "blank rows"? Ideally, could you ask this with a minimal REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

I want to remove:

  1. rows that explicitly say "N/A" or "Null"
  2. rows that are missing a value

You are still not being clear, but maybe this can give you a clue:

library(dplyr)
df <- data.frame(stringsAsFactors = FALSE,
                 x = c(NA, "N/A", "null", "", "1", "2", "3"),
                 y = 1:7) 
df
#>      x y
#> 1 <NA> 1
#> 2  N/A 2
#> 3 null 3
#> 4      4
#> 5    1 5
#> 6    2 6
#> 7    3 7
df %>% 
    mutate_all(~ifelse(. %in% c("N/A", "null", ""), NA, .)) %>% 
    na.omit()
#>   x y
#> 5 1 5
#> 6 2 6
#> 7 3 7

Created on 2019-04-06 by the reprex package (v0.2.1.9000)
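For readers on a newer dplyr: as far as I know, the scoped verbs such as mutate_all() have been superseded by across() since dplyr 1.0.0. Here is a minimal sketch of the same cleanup, assuming dplyr >= 1.0.0 is installed. The names `placeholders` and `clean` are only for illustration and are not part of the original answer.

```r
library(dplyr)

# Same sample data as in the reprex above
df <- data.frame(stringsAsFactors = FALSE,
                 x = c(NA, "N/A", "null", "", "1", "2", "3"),
                 y = 1:7)

# Strings that should be treated as missing (illustrative name)
placeholders <- c("N/A", "null", "")

clean <- df %>%
    # across(everything(), ...) applies the lambda to every column;
    # .x stands for the column currently being processed
    mutate(across(everything(), ~ ifelse(.x %in% placeholders, NA, .x))) %>%
    # drop every row that now contains at least one NA
    na.omit()

clean
```

This should keep the same three rows (5, 6 and 7) as the mutate_all() version.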

Yes, this is helpful.

Can you explain what the bolded part means?

df %>%
mutate_all(**~**ifelse(. %in% c("N/A", "null", ""), NA, .)) %>%
na.omit()

df is the name of the sample dataset I created to illustrate the solution:

df <- data.frame(stringsAsFactors = FALSE,
                 x = c(NA, "N/A", "null", "", "1", "2", "3"),
                 y = 1:7) 
df
#>      x y
#> 1 <NA> 1
#> 2  N/A 2
#> 3 null 3
#> 4      4
#> 5    1 5
#> 6    2 6
#> 7    3 7

NA stands for "Not Available" and is R's way of representing missing values. Any other form, e.g. c("N/A", "null", ""), is treated as an ordinary character string.
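A small base R sketch of that difference: is.na() only flags the real NA, so the placeholder strings have to be converted before functions like na.omit() will see them. The variable names here are only for illustration.

```r
x <- c(NA, "N/A", "null", "", "1")

# Only the true NA counts as missing; the strings do not
is.na(x)
#> [1]  TRUE FALSE FALSE FALSE FALSE

# Convert the placeholder strings to NA in a copy
cleaned <- x
cleaned[cleaned %in% c("N/A", "null", "")] <- NA

is.na(cleaned)
#> [1]  TRUE  TRUE  TRUE  TRUE FALSE
```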

%>% is called the pipe operator. It chains commands together to make code more readable by passing the result on its left as the first argument of the function on its right. The previous code is equivalent to:

na.omit(mutate_all(df, ~ifelse(. %in% c("N/A", "null", ""),  NA, .)))
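To see the pipe on something smaller, here is a toy sketch (assuming dplyr is loaded, since it makes %>% available; the numbers are made up for illustration). The piped and nested forms compute the same thing.

```r
library(dplyr)  # provides %>%

values <- c(1, 4, 9)

# Piped form: read left to right (take values, then sqrt, then sum)
piped <- values %>% sqrt() %>% sum()

# Nested form: read inside out
nested <- sum(sqrt(values))

identical(piped, nested)
#> [1] TRUE
```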

Thank you!! Got it, you are converting "N/A", "null" and blank values to NA, which R recognizes as a missing value. Then you are omitting the rows that contain these values.

Final questions about the syntax:

  • What does the ~ before the ifelse mean?
  • What do the periods (.) before the %in% and after the NA mean?

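The ~ means I'm passing a quosure-style lambda, that is, a formula that dplyr converts into an anonymous function.

The dot is a placeholder for the column being processed: mutate_all() applies the function to every variable in the data frame, one at a time.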

Try reading the documentation for the function to get a better understanding.

?mutate_all
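If it helps, the lambda can be written out as an ordinary named function. This is a sketch, assuming dplyr is installed. `to_na` and `col` are hypothetical names chosen for illustration, with `col` playing the role of the dot.

```r
library(dplyr)

df <- data.frame(stringsAsFactors = FALSE,
                 x = c(NA, "N/A", "null", "", "1", "2", "3"),
                 y = 1:7)

# Ordinary function: `col` is whichever column mutate_all() hands over
to_na <- function(col) ifelse(col %in% c("N/A", "null", ""), NA, col)

# Lambda form from the answer: `.` plays the role of `col`
with_lambda <- df %>% mutate_all(~ ifelse(. %in% c("N/A", "null", ""), NA, .))

# Named-function form
with_function <- df %>% mutate_all(to_na)

# Check that both forms give the same result
all.equal(with_lambda, with_function)
```

The two calls should produce the same data frame. The ~ form just saves you from defining and naming the function separately.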
