Issue with a variable holding a column name: logical column not calculated correctly

I know I'm probably just not seeing it, but I can't get this to work or understand why it gives the same result every time. I have a data frame with several columns whose names start with "percent", e.g. percentone, percenttwo, etc. There are quite a few, and I need to run several calculations on each one separately in later steps. They are all class numeric. The outcome I want is a logical vector with one value per row of the data frame, showing which values are greater than 100. I did this step on a column in a different dataset and it worked fine, so I'm not sure what I'm doing wrong. I'm using RStudio 2021.09.0 Build 351, R version 4.1.1 (2021-08-10), dplyr 1.0.7. Thank you for any help.

library(dplyr)

mydf <- data.frame(
                "percentone" = as.numeric(c(65, 75, 102)),
                "percenttwo" = as.numeric(c(103, 104, 89))
)
mydf

#   percentone percenttwo
# 1         65        103
# 2         75        104
# 3        102         89

newmydf <- mydf %>%
  mutate(x = percentone > 100)
newmydf

# below is what it should do

#   percentone percenttwo     x
# 1         65        103 FALSE
# 2         75        104 FALSE
# 3        102         89  TRUE

# but when I store the column name in a variable and use that instead

percent <- "percentone"

newmydf <- mydf %>%
  mutate(x =  percent > 100 )
newmydf

#   percentone percenttwo    x
# 1         65        103 TRUE
# 2         75        104 TRUE
# 3        102         89 TRUE

newmydf <- mydf %>%
  mutate(x =  paste(percent) > 100 )
newmydf

#   percentone percenttwo    x
# 1         65        103 TRUE
# 2         75        104 TRUE
# 3        102         89 TRUE

newmydf <- mydf %>%
  mutate(x = case_when(
      percent > 100 ~ TRUE
  ))
newmydf

#   percentone percenttwo    x
# 1         65        103 TRUE
# 2         75        104 TRUE
# 3        102         89 TRUE

newmydf <- mydf$percent > 100
newmydf

# logical(0)


# So what I'd like is an output that gives me class logical with FALSE FALSE TRUE


Hi @jasongeslois

You can try this:

library(tidyverse)
mydf <- data.frame(
  "percentone" = as.numeric(c(65, 75, 102)),
  "percenttwo" = as.numeric(c(103, 104, 89))
)

percent <- "percentone"

mydf %>%
  mutate(x =  eval(parse(text = percent)) > 100 )
#>   percentone percenttwo     x
#> 1         65        103 FALSE
#> 2         75        104 FALSE
#> 3        102         89  TRUE

You can also consider a function, or add test columns for all relevant columns in one go, like this for instance:

mydf %>% 
  mutate(across(starts_with("percent"), ~. > 100, .names = "{.col}_test"))
#>   percentone percenttwo percentone_test percenttwo_test
#> 1         65        103           FALSE            TRUE
#> 2         75        104           FALSE            TRUE
#> 3        102         89            TRUE           FALSE

Hope it helps.


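The cause of your problem is that mutate() uses non-standard evaluation. A bare name inside mutate() is normally the name of a column in mydf, so percent is first looked up as a column named percent. Since there is no percent column, mutate() falls back to the variable percent in your environment, which holds the string "percentone", and compares that string to 100. R converts 100 to the string "100" for the comparison, and "percentone" sorts after "100", so the result is always TRUE.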

"percentone" > 100
[1] TRUE
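
The two failure modes in the question can be checked directly in base R. This is a small sketch using the example data from the question; `missing_col` is just an illustrative name.

```r
percent <- "percentone"

# The number is coerced to a string, so these two comparisons are the same
identical(percent > 100, percent > "100")

# `$` takes the name literally rather than using the value stored in
# `percent`; no column matches, so the result is NULL
mydf <- data.frame(
  percentone = c(65, 75, 102),
  percenttwo = c(103, 104, 89)
)
missing_col <- mydf$percent
is.null(missing_col)

# Comparing NULL with a number gives a zero-length logical, i.e. logical(0)
length(missing_col > 100)
```

That zero-length result is the logical(0) shown for mydf$percent > 100 in the question.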

The following will work.

percent <- "percentone"

newmydf <- mydf %>%
  mutate(x =  mydf[[percent]] > 100 )

However, this code gains you nothing over writing percentone directly. Can you explain your larger goal? Do you want to write a function that does this comparison for a name passed to it?
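
As a variation on the mydf[[percent]] approach, dplyr also provides the .data pronoun, which refers to the data frame currently inside the pipeline instead of the original object. A minimal sketch with the example data from the question:

```r
library(dplyr)

mydf <- data.frame(
  percentone = c(65, 75, 102),
  percenttwo = c(103, 104, 89)
)

percent <- "percentone"  # column name held as a string

# .data[[percent]] looks up the column named by the string in `percent`
newmydf <- mydf %>%
  mutate(x = .data[[percent]] > 100)

newmydf$x
```

One reason to prefer .data[[percent]] is that it keeps working if earlier steps in the pipe filter or group the data, whereas mydf[[percent]] always refers to the original, unmodified data frame.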


Yes, writing a function was my end goal, so that I could pass the name to it, have it substituted in the several places inside the function where the column is used, and then generate the plots I need for each one. I ran into this issue while trying to work out how to write that function. Thank you again.

Thank you, those solutions do work. A function is my end goal, so that the column name can be filled in across several other steps and ultimately generate several plots from all the steps using that column. Thank you again.
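
For the function described here, one possible shape is sketched below. It assumes the column name is passed as a string; the helper name `flag_over` and its arguments are hypothetical, not something from the replies above.

```r
library(dplyr)

# Hypothetical helper: `flag_over`, `col` and `threshold` are illustrative names
flag_over <- function(data, col, threshold = 100) {
  # Fail early if `col` is not a single existing column name
  stopifnot(is.character(col), length(col) == 1, col %in% names(data))
  data %>%
    mutate(x = .data[[col]] > threshold)
}

mydf <- data.frame(
  percentone = c(65, 75, 102),
  percenttwo = c(103, 104, 89)
)

# Apply the same steps to each percent column in turn
results <- lapply(c("percentone", "percenttwo"), function(col) flag_over(mydf, col))

results[[1]]$x
results[[2]]$x
```

The same .data[[col]] form can also be used inside ggplot2's aes(), which may help when the plotting steps are moved into the function as well.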

I most often find myself using the following syntax for metaprogramming.

mydf %>%
  mutate(x = !!sym(percent) > 100 )
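
A self-contained version of the snippet above, for anyone trying it in a fresh session. It assumes sym() is taken from the rlang package, loaded explicitly here.

```r
library(dplyr)
library(rlang)  # sym() comes from rlang

mydf <- data.frame(
  percentone = c(65, 75, 102),
  percenttwo = c(103, 104, 89)
)

percent <- "percentone"

# sym() turns the string into a symbol; !! inserts it into the expression,
# so mutate() effectively sees `percentone > 100`
newmydf <- mydf %>%
  mutate(x = !!sym(percent) > 100)

newmydf$x
```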
