Rowwise mutate ifelse returns row index of desired feature, not the feature itself

soedr · February 3, 2019, 8:40am

Hi there,

I have two dataframes pool_performance, to which I need to add info from payout_scheme based on data in pool_performance. Using rowwise mutate ifelse, the operation returns the row index of the value I'm looking for in payout_scheme, as opposed to the desired value in payout_scheme$policy_name.

What is wrong with the below?

Thanks in advance!

library(tidyverse)
payout_scheme <- data.frame(
 a_rate_floor = rep(c(.7,.76,.81,.86,.91,.96),2),
 a_rate_ceil = rep(c(.75,.8,.85,.9,.95,1.0),2),
 a_count_floor = c(rep(36,6),rep(41,6)),
 a_count_ceil = c(rep(40,6),rep(45,6)),
 payout = c(250,280,300,325,350,420,320,395,540,570,650,700), 
 policy_name = c(paste0("A",seq(1,6,1)),paste0("B",seq(1,6,1))) 
)

set.seed(12345)
pool_performance <- data.frame(
 id = seq(1,36,1),
 a_rate = runif(36, min = .5, max = .9),
 a_count = floor(runif(36, min = 35, max = 44))
 ) %>%
 mutate(
  a_rate_hit = ifelse(a_rate > min(payout_scheme$a_rate_floor),1,0),
  a_count_hit = ifelse(a_count > min(payout_scheme$a_count_floor),1,0)
 ) %>%
 rowwise() %>%
 mutate(
  qualified = ifelse(a_rate_hit == 1 && a_count_hit == 1,1,0),
  policy_1 = ifelse(qualified == 1, payout_scheme$policy_name[which(
   payout_scheme$a_rate_floor<a_rate
   & payout_scheme$a_rate_ceil>a_rate
   & payout_scheme$a_count_floor <a_count
   & payout_scheme$a_count_ceil >a_count)],NA)
 )

Applying the same logic outside the chained mutate command returns desired values; one example below.

payout_scheme$policy_name[which(
 payout_scheme$a_rate_floor<pool_performance$a_rate[1]
 & payout_scheme$a_rate_ceil>pool_performance$a_rate[1]
 & payout_scheme$a_count_floor < pool_performance$a_count[1]
 & payout_scheme$a_count_ceil > pool_performance$a_count[1])]

cderv · February 3, 2019, 10:32am

I think this is here because you are trying to manipulate factor column using index. However, factor works with levels and I think you don't get a correct result because of that. Factors are not the easiest object to deal with. If you do need, look at forcats .

The thing is you did not ask explicitly for factors but data.frame will convert any character column to factors at creation, unless you add stringAsFactors = FALSE.

You can also use tibble to be more stable.

I did not check if this is correct but replacing with tibble I get policy names

library(tidyverse)
#> Warning: le package 'tibble' a été compilé avec la version R 3.5.2
#> Warning: le package 'purrr' a été compilé avec la version R 3.5.2
payout_scheme <- tibble(
  a_rate_floor = rep(c(.7,.76,.81,.86,.91,.96),2),
  a_rate_ceil = rep(c(.75,.8,.85,.9,.95,1.0),2),
  a_count_floor = c(rep(36,6),rep(41,6)),
  a_count_ceil = c(rep(40,6),rep(45,6)),
  payout = c(250,280,300,325,350,420,320,395,540,570,650,700), 
  policy_name = c(paste0("A",seq(1,6,1)),paste0("B",seq(1,6,1)))
)

set.seed(12345)
pool_performance <- tibble(
  id = seq(1,36,1),
  a_rate = runif(36, min = .5, max = .9),
  a_count = floor(runif(36, min = 35, max = 44))
)

pool_performance %>%
  mutate(
    a_rate_hit = ifelse(a_rate > min(payout_scheme$a_rate_floor),1,0),
    a_count_hit = ifelse(a_count > min(payout_scheme$a_count_floor),1,0)
  ) %>%
  rowwise() %>%
  mutate(
    qualified = ifelse(a_rate_hit == 1 && a_count_hit == 1,1,0),
    policy_1 = ifelse(qualified == 1, payout_scheme$policy_name[which(
      payout_scheme$a_rate_floor<a_rate
      & payout_scheme$a_rate_ceil>a_rate
      & payout_scheme$a_count_floor <a_count
      & payout_scheme$a_count_ceil >a_count)],NA)
  )
#> Source: local data frame [36 x 7]
#> Groups: <by row>
#> 
#> # A tibble: 36 x 7
#>       id a_rate a_count a_rate_hit a_count_hit qualified policy_1
#>    <dbl>  <dbl>   <dbl>      <dbl>       <dbl>     <dbl> <chr>   
#>  1     1  0.788      42          1           1         1 B2      
#>  2     2  0.850      43          1           1         1 <NA>    
#>  3     3  0.804      40          1           1         1 <NA>    
#>  4     4  0.854      36          1           0         0 <NA>    
#>  5     5  0.683      42          0           1         0 <NA>    
#>  6     6  0.567      38          0           1         0 <NA>    
#>  7     7  0.630      43          0           1         0 <NA>    
#>  8     8  0.704      41          1           1         1 <NA>    
#>  9     9  0.791      37          1           1         1 A2      
#> 10    10  0.896      37          1           1         1 A4      
#> # ... with 26 more rows

^{Created on 2019-02-03 by the reprex package (v0.2.1)}

This answer is just trying to explain why you have integer instead of value.
If I take the time to understand your task I think I would have done it with joins and filtering, instead

soedr · February 3, 2019, 10:36am

Thank you @cderv, you're spot on! adding stringsAsFactors=F returns the correct values.

system · February 10, 2019, 10:37am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.