 # Multiplication by levels

Hi, I am stuck with this problem

I want to multiply my data but each proportion of each observation for a different percentage.
As example: if the first observation of my Var_1 has 5000 value.
I want to multiply the proportion between 100 and 1000 by 2% (in this particular case 900 x 2%).
The proportion between 1000 and 2000 by 3% (in this case 1000 x 3%).
And the proportion >2000 by 5% (in this case 3000 x 5%).

And add the sum of this process for each observation in a new variable.
Any idea of how to proceed?

I have these data like example:

``````library(tidyverse)
my_data <- tibble(Var_1 = c(5000, 1500, 350, 1200, 750, 1000,1250, 2500))
``````
1 Like

We can write a function using basic `if...else` flow control statements. The function can then be applied to each element of `Var_1`.

Note: You haven't specified what should happen to numbers smaller than 100. This function just returns 0 for such values.

``````library(tidyverse)

my_data <- tibble(Var_1 = c(5000, 1500, 350, 1200, 750, 1000, 1250, 2500))

foo <- function(x) {
if (x <= 100) {
return (0)
} else {
if (x <= 1000) {
return ((x - 100) * 0.02)
} else if (x <= 2000) {
return ((900 * 0.02) + ((x - 1000) * 0.03))
} else {
return ((900 * 0.02) + (1000 * 0.03) + ((x - 2000) * 0.05))
}
}
}

my_data\$Var_2 <- map_dbl(my_data\$Var_1, foo)

print(my_data)
#> # A tibble: 8 x 2
#>   Var_1 Var_2
#>   <dbl> <dbl>
#> 1  5000 198
#> 2  1500  33
#> 3   350   5
#> 4  1200  24
#> 5   750  13
#> 6  1000  18
#> 7  1250  25.5
#> 8  2500  73
``````

Created on 2020-05-19 by the reprex package (v0.3.0)

1 Like

If you pre-calculate all the intervals and then only calculate the remainder in each interval for each value, I think you can do this safely in a vectorised fashion without requiring branching. This will then also be extendable to adding as many cutoffs and rates as required without having to adjust the body of the function.

``````propmult <- function(x, rate, coff) {
spot <- findInterval(x, coff)
base <- cumsum(rate * c(diff(coff), 0))[spot-1]
base + ((x - coff[spot]) * rate[spot])
}
``````

``````propmult(my_data\$Var_1, rate = c(0, 0.02, 0.03, 0.05), coff = c(0, 100, 1000, 2000))
## 198.0  33.0   5.0  24.0  13.0  18.0  25.5  73.0
map_dbl(my_data\$Var_1, foo)
## 198.0  33.0   5.0  24.0  13.0  18.0  25.5  73.0
``````

As a side benefit, it should also be quick to run on huge data if that is a concern:

``````datbig <- my_data[rep(1:8,1e6),]
nrow(datbig)
# 8 million rows - 8,000,000 - to test it out
system.time({ map_dbl(datbig\$Var_1, foo) })
##   user  system elapsed
##  5.049   0.092   5.140

system.time({propmult(datbig\$Var_1, rate = c(0, 0.02, 0.03, 0.05), coff = c(0, 100, 1000, 2000))})
##   user  system elapsed
##  0.138   0.000   0.138
``````
2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.