conditional mutate across multiple columns

Hi,

I would like to conditionally mutate numeric values across several columns; I have managed to do it and it works, but it looks super clumsy. Is it possible to write this in a nicer way?
Here an extract of the dataset:

I would like to conditionally change the value of all the numerical variables:

If the value is less then 10, I would like it replaced with NA (or ideally leave it blank if possible?)
If the value is more than 33, I would like it replaced with 33,
if the value is between 10and 33 then it should remain the same.

This is what I wrote:

tdata<- rawdata %>%
mutate(T1=ifelse(T1<10, NA, ifelse(T1<33, T1, 33)))%>%
mutate(T2=ifelse(T2<10, NA, ifelse(T2<33, T2, 33)))%>%
mutate(T3=ifelse(T3<10, NA, ifelse(T3<33, T3, 33)))%>%
mutate(C1=ifelse(C1<10, NA, ifelse(C1<33, C1, 33)))%>%
mutate(C2=ifelse(C2<10, NA, ifelse(C2<33, C2, 33)))%>%
mutate(C3=ifelse(C3<10, NA, ifelse(C3<33, C3, 33)))

Any idea how to simplify this?

Cheers

1 Like

Without a reprex, only a sketch will be offered

rawdata %>% mutate(T1 = case_when(
                         T1 < 10 ~ NA,
                         T1 >= 10 & T1 < 33 ~ T1,
                         T1 >= 30 ~ 33)
                  ),
           mutate ... [etc]

has greater legibility. Rather than repeating this code block for all six numeric variables, however, make it into a function and use mutate_if [numeric].

Thanks. How would I make it into function?

Cheers

No assurance can be given that this will work in the absence of a reprex to test it, but the form of a function would be

tri_split <- function(x) {
  mutate(x = case_when(
    x < 10 ~ NA,
    x >= 10 & x < 33 ~ x,
    x >= 30 ~ 33)
  )
}

Thanks, I can't really figure out how to apply that...?

And, without a reprex of your code, I can't either.

I have tried to reprex it and I can't get it to work. Also I can't really post more data than I already have. Not sure why you would need anything else.

Just to be clear, my code works fine:

tdata<- rawdata %>%
mutate(T1=ifelse(T1<10, NA, ifelse(T1<33, T1, 33)))%>%
mutate(T2=ifelse(T2<10, NA, ifelse(T2<33, T2, 33)))%>%
mutate(T3=ifelse(T3<10, NA, ifelse(T3<33, T3, 33)))%>%
mutate(C1=ifelse(C1<10, NA, ifelse(C1<33, C1, 33)))%>%
mutate(C2=ifelse(C2<10, NA, ifelse(C2<33, C2, 33)))%>%
mutate(C3=ifelse(C3<10, NA, ifelse(C3<33, C3, 33)))

It just seems like there should be a more efficient way to write it?

In order to give you a working solution, we need sample data in a copy/paste friendly format (we can't copy from a screenshot)

A proper reproducible example makes it much easier to understand your issue and figuring out how to help, also, it is a polite thing to do when asking questions on programming forums in general.

1 Like

Someone helped me and wrote it this way:

calc1030 = function(x){
x = ifelse(x<10, NA, ifelse(x<33, x, 33))
return(x)
}

tdata<-rawdata %>% mutate_if(is.numeric, calc1030)

It looks much nicer now!

Another alternative would be

tdata<- rawdata %>% mutate(T1 = split_10_33(T1), T2 = split_10_33(T2), T3 = split_10_33(T3),
C1 = split_10_33(C1), C2 = split_10_33(C2), C3 = split_10_33(C3))

But again that's pretty convoluted.

Thank you anyway.

1 Like

Sure no worries. I will try again next time. It keeps coming up with this error:

Rendering reprex...
pandoc.exe: \: openBinaryFile: does not exist (No such file or directory)
Error: pandoc document conversion failed with error 1

And I didn't really have time to deal with this on top of my actual problem.

Cheers for your help anyhow.

Using the reprex package makes your reproducible example look nicer, but it is not mandatory for providing one, it would be enough to provide sample data in a copy/paste friendly format (with dput() or datapasta::df_paste()) and the minimal runnable code that reproduces your issue.

1 Like

Not all of the data is needed. It doesn't even have to be real data. It just needs to illustrate the issue.

This has benefits:

  1. It eliminates most sources of confusion.
  2. A proposed solution can be illustrated.
  3. It puts the burden where it belongs: on the person asking the question.
  4. Questions that don't require reverse engineering attract more answers
1 Like

The solution offered by PossumHound is great, but for completeness, the mutate_if() and in general the scoped versions of mutate() and summarise() have been superseded by the across() function. So, in this new regime, PossumHound's solution would be:

tdata<-rawdata %>% mutate(across(where(is.numeric), calc1030))
2 Likes

Dave_Armstrong's answer is this is the correct and succinct way to do this.

The below is Dave's answer using the case_when() statement w/o assigning it to a function.

df %>% 
  summarise(
    across(where(is.numeric), 
           ~ case_when(
              .x < 10 ~ NA,
              .x >= 10 & .x < 33 ~ .x,
              .x >= 30 ~ 33)
    )
  )
1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.