 # conditional mutate across multiple columns

Hi,

I would like to conditionally mutate numeric values across several columns; I have managed to do it and it works, but it looks super clumsy. Is it possible to write this in a nicer way?
Here an extract of the dataset:

I would like to conditionally change the value of all the numerical variables:

If the value is less then 10, I would like it replaced with NA (or ideally leave it blank if possible?)
If the value is more than 33, I would like it replaced with 33,
if the value is between 10and 33 then it should remain the same.

This is what I wrote:

tdata<- rawdata %>%
mutate(T1=ifelse(T1<10, NA, ifelse(T1<33, T1, 33)))%>%
mutate(T2=ifelse(T2<10, NA, ifelse(T2<33, T2, 33)))%>%
mutate(T3=ifelse(T3<10, NA, ifelse(T3<33, T3, 33)))%>%
mutate(C1=ifelse(C1<10, NA, ifelse(C1<33, C1, 33)))%>%
mutate(C2=ifelse(C2<10, NA, ifelse(C2<33, C2, 33)))%>%
mutate(C3=ifelse(C3<10, NA, ifelse(C3<33, C3, 33)))

Any idea how to simplify this?

Cheers

1 Like

Without a `reprex`, only a sketch will be offered

``````rawdata %>% mutate(T1 = case_when(
T1 < 10 ~ NA,
T1 >= 10 & T1 < 33 ~ T1,
T1 >= 30 ~ 33)
),
mutate ... [etc]
``````

has greater legibility. Rather than repeating this code block for all six numeric variables, however, make it into a function and use `mutate_if` [numeric].

No assurance can be given that this will work in the absence of a `reprex` to test it, but the form of a function would be

``````tri_split <- function(x) {
mutate(x = case_when(
x < 10 ~ NA,
x >= 10 & x < 33 ~ x,
x >= 30 ~ 33)
)
}
``````

In order to give you a working solution, we need sample data in a copy/paste friendly format (we can't copy from a screenshot)

A proper reproducible example makes it much easier to understand your issue and figuring out how to help, also, it is a polite thing to do when asking questions on programming forums in general.

1 Like

Someone helped me and wrote it this way:

calc1030 = function(x){
x = ifelse(x<10, NA, ifelse(x<33, x, 33))
return(x)
}

tdata<-rawdata %>% mutate_if(is.numeric, calc1030)

It looks much nicer now!

Another alternative would be

tdata<- rawdata %>% mutate(T1 = split_10_33(T1), T2 = split_10_33(T2), T3 = split_10_33(T3),
C1 = split_10_33(C1), C2 = split_10_33(C2), C3 = split_10_33(C3))

But again that's pretty convoluted.

Thank you anyway.

1 Like

Sure no worries. I will try again next time. It keeps coming up with this error:

Rendering reprex...
pandoc.exe: \: openBinaryFile: does not exist (No such file or directory)
Error: pandoc document conversion failed with error 1

And I didn't really have time to deal with this on top of my actual problem.

Using the `reprex` package makes your reproducible example look nicer, but it is not mandatory for providing one, it would be enough to provide sample data in a copy/paste friendly format (with `dput()` or `datapasta::df_paste()`) and the minimal runnable code that reproduces your issue.

1 Like

Not all of the data is needed. It doesn't even have to be real data. It just needs to illustrate the issue.

This has benefits:

1. It eliminates most sources of confusion.
2. A proposed solution can be illustrated.
3. It puts the burden where it belongs: on the person asking the question.
4. Questions that don't require reverse engineering attract more answers
1 Like

The solution offered by PossumHound is great, but for completeness, the `mutate_if()` and in general the scoped versions of `mutate()` and `summarise()` have been superseded by the `across()` function. So, in this new regime, PossumHound's solution would be:

``````tdata<-rawdata %>% mutate(across(where(is.numeric), calc1030))
``````
2 Likes

Dave_Armstrong's answer is this is the correct and succinct way to do this.

The below is Dave's answer using the case_when() statement w/o assigning it to a function.

``````df %>%
summarise(
across(where(is.numeric),
~ case_when(
.x < 10 ~ NA,
.x >= 10 & .x < 33 ~ .x,
.x >= 30 ~ 33)
)
)
``````
1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.

Thanks. How would I make it into function?

Cheers

And, without a `reprex` of your code, I can't either.

Thanks, I can't really figure out how to apply that...?

I have tried to reprex it and I can't get it to work. Also I can't really post more data than I already have. Not sure why you would need anything else.

Just to be clear, my code works fine:

tdata<- rawdata %>%
mutate(T1=ifelse(T1<10, NA, ifelse(T1<33, T1, 33)))%>%
mutate(T2=ifelse(T2<10, NA, ifelse(T2<33, T2, 33)))%>%
mutate(T3=ifelse(T3<10, NA, ifelse(T3<33, T3, 33)))%>%
mutate(C1=ifelse(C1<10, NA, ifelse(C1<33, C1, 33)))%>%
mutate(C2=ifelse(C2<10, NA, ifelse(C2<33, C2, 33)))%>%
mutate(C3=ifelse(C3<10, NA, ifelse(C3<33, C3, 33)))

It just seems like there should be a more efficient way to write it?