Use mutate_at with nested ifelse

adpatter · January 5, 2018, 2:03am

The following step (used in a %>% pipeline) is meant to set values in every column except columnA to NA unless they meet the conditions below.

mutate_at(vars(-columnA), funs(((function(x) {
if (is.logical(x))
  return(x)
else if (!is.na(as.numeric(x)))
  return(as.numeric(x))
else
  return(NA)
})(.))))

How can I achieve the same result using mutate_at and nested ifelse?

For example, this does not produce the same result:

mutate_at(vars(-columnA),funs(ifelse(is.logical(.),.,
ifelse(!is.na(as.numeric(.)),as.numeric(.),NA))))

Update 2018-1-5

I am learning R, hence the problem with the question. I will update this question with an explanation once I have a better understanding of the language.

Thank you.

danr · January 5, 2018, 11:51am

It would help if you posted a reprex showing input and output. It makes it a lot easier for us to help you.

https://www.tidyverse.org/help/#reprex

danr · January 5, 2018, 3:45pm

Is this reprex below showing the results you saw?

We really need a better idea of what you are trying to do in terms of input and expected output to help you get the results you want.

The thing I think you are running into is that the logical test in your second mutate_at is testing a column, i.e. a vector, not the individual elements in the column.

Here is what I think you wanted to show, plus a trimmed-down example showing the difference between testing individual elements in a column and testing the whole column itself.

Notice that there are a number of warnings about testing the first element of a vector. That's another reason for posting a reprex... we all could have seen that warnings occurred. Those warnings are a clue as to the differences you are seeing.

suppressPackageStartupMessages(library(dplyr))

# does this produce the difference you are talking about?
tbl <- tibble::tribble (
    ~columnA, ~x, ~y, ~z,
    1, "sadf", 8, FALSE,
    2, "ssadf", 19, FALSE,
    3, "sssadf", 10, TRUE
)

tbl %>% mutate_at(vars(-columnA), funs(((function(x) {
    if (is.logical(x))
        return(x)
    else if (!is.na(as.numeric(x)))
        return(as.numeric(x))
    else
        return(NA)
})(.))))
#> Warning in (function(x) {: NAs introduced by coercion
#> Warning in if (!is.na(as.numeric(x))) return(as.numeric(x)) else
#> return(NA): the condition has length > 1 and only the first element will be
#> used

#> Warning in if (!is.na(as.numeric(x))) return(as.numeric(x)) else
#> return(NA): the condition has length > 1 and only the first element will be
#> used
#> # A tibble: 3 x 4
#>   columnA     x     y     z
#>     <dbl> <lgl> <dbl> <lgl>
#> 1       1    NA     8 FALSE
#> 2       2    NA    19 FALSE
#> 3       3    NA    10  TRUE

# the reason this is different from the preceding result is
# that the logical test is being done on a vector,
# so only the first element in the vector is being
# tested for each element that is processed.
# The result is that all the output rows are the
# same because they are all based on a test of the
# first element in each column

tbl %>% mutate_at(vars(-columnA),funs(
    ifelse(is.logical(.),.,
        ifelse(!is.na(as.numeric(.)),as.numeric(.),NA))))
#> Warning in ifelse(!is.na(as.numeric(x)), as.numeric(x), NA): NAs introduced
#> by coercion
#> # A tibble: 3 x 4
#>   columnA     x     y     z
#>     <dbl> <lgl> <dbl> <lgl>
#> 1       1    NA     8 FALSE
#> 2       2    NA     8 FALSE
#> 3       3    NA     8 FALSE

# trimmed down equivalent example to make it easier to look at


# here the logical test is being done on each element in the column

tbl %>% mutate_at(vars(-columnA), function(x) { if(is.logical(x)) x else NA})
#> # A tibble: 3 x 4
#>   columnA     x     y     z
#>     <dbl> <lgl> <lgl> <lgl>
#> 1       1    NA    NA FALSE
#> 2       2    NA    NA FALSE
#> 3       3    NA    NA  TRUE

# here the logical test is being done on the whole column,
# not each element in the column
# . is the column
tbl %>% mutate_at(vars(-columnA),funs(ifelse(is.logical(.), ., NA)))
#> # A tibble: 3 x 4
#>   columnA     x     y     z
#>     <dbl> <lgl> <lgl> <lgl>
#> 1       1    NA    NA FALSE
#> 2       2    NA    NA FALSE
#> 3       3    NA    NA FALSE
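
A minimal base R sketch of why the ifelse() version repeats the first value, assuming the y column from the table above. is.logical() reports on the type of the whole vector, so it returns a single value, and ifelse() returns a result shaped like its test argument. The variable names (type_test, inner, outer) are made up for illustration.

```r
# Values taken from the y column of the example table.
y <- c(8, 19, 10)

# is.logical() asks about the type of the whole vector, so it returns one value.
type_test <- is.logical(y)
length(type_test)

# The inner ifelse() has a length-3 test, so it works element by element.
inner <- ifelse(!is.na(as.numeric(y)), as.numeric(y), NA)

# The outer ifelse() has a length-1 test, so its result has length 1 and is
# built from the first element of `inner` only.
outer <- ifelse(type_test, y, inner)
length(outer)

# Repeating that single value down the column gives the 8, 8, 8 pattern.
rep(outer, length(y))
```

This matches the reprex output, where the y column shows 8 in every row and the z column shows FALSE in every row.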

danr · January 5, 2018, 4:18pm

You don't need to have a deep understanding of R to post here. If you can just break down the problem a bit I think you will find this community a great place to learn R no matter what level you are at.

Learning to use reprexes is a great way to "try out" things when you are learning R. Reprexes can be a bit intimidating when you are just starting out, but please post questions about them here... there are a lot of people here willing to help you out.

adpatter · January 5, 2018, 6:23pm

Danr:

Thank you again. I am grateful for the detailed response. Your examples helped me understand the language better.

This is what I had intended to write:

  mutate_at(vars(-columnA), funs(((function(x) {
    for(i in 1:length(x))
    {
      if(!is.na(as.numeric(x[i])) && !is.logical(x[i])) 
      {
        x[i] <- as.numeric(x[i]);
      }
      else if(!is.na(x[i]))
      {
        x[i] <- NA
      }
    }
    return(x)    
  })(.))))

This is a better solution:

  mutate_at(vars(-columnA), function(x) {
    if (is.logical(x))
      return(x)

    return(as.numeric(x))
  })
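
For readers on newer versions of dplyr, here is a hedged sketch of the same idea using across(), which requires dplyr 1.0.0 or later; funs() has been deprecated since dplyr 0.8.0. The helper name keep_logical_or_numeric is hypothetical, and the suppressWarnings() call is an added assumption to silence the "NAs introduced by coercion" warning, not something from the code above.

```r
library(dplyr)

tbl <- tibble::tribble(
  ~columnA, ~x,       ~y, ~z,
  1,        "sadf",   8,  FALSE,
  2,        "ssadf",  19, FALSE,
  3,        "sssadf", 10, TRUE
)

# Hypothetical helper: the type check runs once per column, not per element.
keep_logical_or_numeric <- function(x) {
  if (is.logical(x)) {
    return(x)
  }
  # as.numeric() is vectorised; values that cannot be parsed become NA.
  suppressWarnings(as.numeric(x))
}

result <- tbl %>%
  mutate(across(-columnA, keep_logical_or_numeric))

result
```

With this table, x should come back as a numeric column of NA values (not a logical one), while y and z should be unchanged.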

You wrote, "# here the logical test is being done on each element in the column
tbl %>% mutate_at(vars(-columnA), function(x) { if(is.logical(x)) x else NA})"

Are you certain that the test is being done on each element in the column? This seems to indicate that the test is being done on the column:

> tbl <- tibble::tribble (
+   ~columnA, ~x, ~y, ~z,
+   1, "sadf", 8, FALSE,
+   2, "ssadf", 19, FALSE,
+   3, "sssadf", 10, TRUE
+ )

> tbl %>% mutate_at(vars(-columnA), function(x) { 
+   message(is.logical(x))
+   if(is.logical(x)) x else NA
+   })
FALSE
FALSE
TRUE
# A tibble: 3 x 4
  columnA x     y     z    
    <dbl> <lgl> <lgl> <lgl>
1    1.00 NA    NA    F    
2    2.00 NA    NA    F    
3    3.00 NA    NA    T

It appears that the test is being done on the column and it is either returning the entire column or NA. Do you agree?

danr · January 5, 2018, 7:16pm

By each element I meant each element in the column. For example, if you look at the z column in your example, it contains F, F, T. If the test were being done on the column itself, it would test the first element in the column once for each element in the column and, as a result, would produce the same result for each element in the column.

Here is an example of using if to test a column:

l <- c(F,F,F)

if(l) "T" else "F"
#> Warning in if (l) "T" else "F": the condition has length > 1 and only the
#> first element will be used
#> [1] "F"

l <- c(T,F,F)

if(l) "T" else "F"
#> Warning in if (l) "T" else "F": the condition has length > 1 and only the
#> first element will be used
#> [1] "T"

Notice that when F is the first element in the vector the if is false, but when T is the first element in the vector the if is true. Also, there is a warning that this is what is happening. Sometimes that is a useful behavior, but most of the time it is an unwanted surprise.

One of the gotchas of R is that not all functions treat vectors this way.
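
A small base R sketch of the alternatives, for comparison. Note that since R 4.2.0 an if() condition of length greater than one is an error rather than a warning, so the examples above would stop instead of warning on current R. The variable names here are made up for illustration.

```r
flags <- c(TRUE, FALSE, FALSE)

# Element-wise: ifelse() tests every element and returns a vector
# of the same length as its test.
elementwise <- ifelse(flags, "T", "F")

# Whole vector: reduce to a single TRUE/FALSE before handing it to if().
any_true <- if (any(flags)) "at least one TRUE" else "none TRUE"
all_true <- if (all(flags)) "all TRUE" else "not all TRUE"

# First element only, written explicitly so no warning or error is raised.
first_only <- if (flags[1]) "T" else "F"

elementwise
any_true
all_true
first_only
```

Which form is appropriate depends on whether the decision is meant to be made per element or once for the whole vector.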