One option to make this less painful is to use the cut function. For example:
# Fake data
set.seed(2)
y = c(0, runif(50, 0, 3))
breaks = c(-Inf, 0, 0.05, 0.25, 0.5, 1, 1.5, 2.5, Inf)
labels = c(1:4,-3:(-6))
# Returns a factor whose levels are the default interval labels
cut(y, breaks=breaks, include.lowest=TRUE)
# Returns the desired labels, but still as a factor
cut(y, breaks=breaks, labels=labels, include.lowest=TRUE)
# Convert factor labels to numeric
y_score = as.numeric(as.character(cut(y, breaks=breaks, labels=labels, include.lowest=TRUE)))
y_score
[1] 1 -3 -5 -5 -3 -6 -6 4 -6 -4 -5 -5 -3 -5 -3 -4 -6 -6 -3
[20] -4 3 -5 -4 -6 4 -4 -4 4 -4 -6 4 2 4 -5 -6 -5 -5 -6
[39] -3 -5 4 -6 -3 4 4 -6 -5 -6 -4 -5 -5
For your actual use case, you'll need to create the vectors of breaks and labels based on how you want to recode the values. The cut function also lets you choose whether each interval is closed on the right (the default, right = TRUE) or on the left (right = FALSE); see the help (?cut) for details. Because cut returns a factor, as.numeric(as.character(...)) is needed to convert the labels back to numbers so that you can use them in further calculations. Calling as.numeric directly on the factor would return the internal level codes (1 to 8) rather than the labels.
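Here is a small sketch of both points, reusing the breaks and labels from the first example on three hand-picked values (0, 0.5 and 1, chosen because they sit exactly on interval boundaries). It shows how the right argument changes which interval a boundary value lands in, and why as.character is needed before as.numeric. The as.numeric(levels(f))[f] form is the alternative conversion given in the R FAQ (7.10).

```r
# Breaks and labels from the first example
breaks <- c(-Inf, 0, 0.05, 0.25, 0.5, 1, 1.5, 2.5, Inf)
labels <- c(1:4, -3:(-6))

# Values that sit exactly on interval boundaries
x <- c(0, 0.5, 1)

# Default right = TRUE: intervals are closed on the right, e.g. (0.25, 0.5]
right_closed <- cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)

# right = FALSE: intervals are closed on the left, e.g. [0.5, 1)
left_closed <- cut(x, breaks = breaks, labels = labels,
                   include.lowest = TRUE, right = FALSE)

as.numeric(as.character(right_closed))  # 1  4 -3
as.numeric(as.character(left_closed))   # 2 -3 -4

# as.numeric() on the factor itself returns the level codes, not the labels
as.numeric(right_closed)                # 1 4 5

# Equivalent conversion via the levels (R FAQ 7.10)
as.numeric(levels(right_closed))[right_closed]  # 1  4 -3
```

Note that with right = FALSE the value 0 moves from label 1 to label 2, because it now falls in [0, 0.05) rather than in the first interval ending at 0.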
To turn this into a function:
wellness_z = function(x) {
  # Values below 0 (the first break) are outside every interval and return NA
  breaks = c(0, 0.01, 0.05, 0.1, 0.5, 1, 1.5, 2.5, Inf)
  labels = c(1:4, -3:(-6))
  as.numeric(as.character(cut(x, breaks=breaks, labels=labels, include.lowest=TRUE)))
}
wellness_z(y)
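To sanity-check the function, it can help to run it on a few hand-picked values instead of random data. The values below are purely illustrative: 0.01 sits exactly on a break, and -1 lies below the first break (0), so cut returns NA for it.

```r
wellness_z = function(x) {
  breaks = c(0, 0.01, 0.05, 0.1, 0.5, 1, 1.5, 2.5, Inf)
  labels = c(1:4, -3:(-6))
  as.numeric(as.character(cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)))
}

# 0.01 goes to [0, 0.01] because intervals are closed on the right;
# -1 is below the lowest break, so it becomes NA
wellness_z(c(0, 0.01, 0.3, 0.7, 3, -1))  # 1  1  4 -3 -6 NA
```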
I think cut is a better approach for your use case, but here are a few examples with ifelse and if.
The ifelse approach would be something like the following, but with additional nested ifelse statements for each range you want to assign to a value. ifelse is "vectorized", meaning that it operates separately on every element of the input vector:
wellness_z = function(x) {
  ifelse(x==0, 1,
         ifelse(x > 0 & x < 0.5, 2,
                ifelse(x >= 0.5 & x < 1, 3, 4)))
}
wellness_z(y)
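Running the nested ifelse version on a few hand-picked values (illustrative only) shows how the boundaries behave. Because the innermost ifelse falls through to 4, a negative input is also assigned 4, while an NA input stays NA.

```r
wellness_z = function(x) {
  ifelse(x == 0, 1,
         ifelse(x > 0 & x < 0.5, 2,
                ifelse(x >= 0.5 & x < 1, 3, 4)))
}

# 0.5 and 1 are boundaries; -1 falls through to the final "else" value of 4
wellness_z(c(0, 0.2, 0.5, 0.99, 1, -1, NA))  # 1 2 3 3 4 4 NA
```

If negative values are possible in your data, an explicit condition for them (or cut, which returns NA outside the breaks) avoids silently assigning them to the top category.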
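The if statement approach doesn't work, because if is not vectorized: it expects a single TRUE or FALSE. In versions of R before 4.2.0, it looks only at the first element of the condition and issues a warning, as shown below; since R 4.2.0, a condition of length greater than one is an error instead: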
wellness_z = function(x) {
  # if() needs a single TRUE/FALSE, so use the scalar operator && here
  if(x==0) {
    1
  } else if(x > 0 && x < 0.5) {
    2
  } else if(x >= 0.5 && x < 1) {
    3
  } else {
    4
  }
}
wellness_z(y)
[1] 1
Warning message:
In if (x == 0) { :
the condition has length > 1 and only the first element will be used
However, you can use base R's sapply function or purrr's map_dbl function to apply the function to each element individually:
sapply(y, wellness_z)
purrr::map_dbl(y, wellness_z)
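As a check, the sketch below confirms that applying the scalar if version element by element gives the same result as the vectorized ifelse version on the fake data. It also uses base R's vapply, which works like sapply but verifies that every call returns a single number (FUN.VALUE = numeric(1)), so an unexpected result type fails loudly. The names wellness_scalar and wellness_vec are hypothetical, chosen only so that both versions can coexist.

```r
# Scalar version: uses if, so x must be a single value
wellness_scalar <- function(x) {
  if (x == 0) {
    1
  } else if (x > 0 && x < 0.5) {
    2
  } else if (x >= 0.5 && x < 1) {
    3
  } else {
    4
  }
}

# Vectorized version: uses nested ifelse
wellness_vec <- function(x) {
  ifelse(x == 0, 1,
         ifelse(x > 0 & x < 0.5, 2,
                ifelse(x >= 0.5 & x < 1, 3, 4)))
}

# Same fake data as above
set.seed(2)
y <- c(0, runif(50, 0, 3))

by_element <- sapply(y, wellness_scalar)
checked    <- vapply(y, wellness_scalar, FUN.VALUE = numeric(1))

identical(by_element, wellness_vec(y))  # TRUE
identical(checked, by_element)          # TRUE
```

For long vectors the vectorized version (or cut) will generally be faster than calling the scalar function once per element.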