I am working with the R programming language.
I created some data:
#PART 1
#create data
library(dplyr)
library(caret)
set.seed(123)
salary <- rnorm(1000,5,5)
height <- rnorm(1000,2,2)
my_data = data.frame(salary, height)
#PART 2
#create train and test data
train<-sample_frac(my_data, 0.7)
sid<-as.numeric(rownames(train)) # because rownames() returns character
test<-my_data[-sid,]
#PART 3
salary_quantiles = data.frame(train %>% summarise(quant_1 = quantile(salary, 0.33),
                                                  quant_2 = quantile(salary, 0.66),
                                                  quant_3 = quantile(salary, 0.99)))
> salary_quantiles
quant_1 quant_2 quant_3
1 3.005188 6.952076 16.98823
Question: I am now trying to write an ifelse() statement that uses these quantiles (3.005188, 6.952076, 16.98823) as cut-offs. For now, I have typed the values in manually:
#PART 4
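# below the first cut-off is "A", below the second is "B", everything else is "C"
# (the inner test no longer needs "> 3.005188", which also sent a value exactly equal to it to "C")
train$salary_type = as.factor(ifelse(train$salary < 3.005188, "A", ifelse(train$salary < 6.952076, "B", "C")))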
Does anyone know if there is a way to do this without writing these numbers explicitly? For example:
# same logic, but reading the cut-offs from salary_quantiles instead of typing them
train$salary_type = as.factor(ifelse(train$salary < salary_quantiles$quant_1, "A", ifelse(train$salary < salary_quantiles$quant_2, "B", "C")))
Is this possible in R?
Thanks!
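For reference, here is a rough sketch of one way this might be done without hard-coding the numbers. The version above that reads salary_quantiles$quant_1 and salary_quantiles$quant_2 should already work, since a length-one value is recycled inside ifelse(). The sketch below instead uses base R's cut() with the quantiles as breaks. The name `q` is just an illustrative choice, and note that with right = FALSE a salary exactly equal to the first cut-off lands in "B", so the boundary handling is an assumption about what is wanted.

```r
library(dplyr)

# Recreate the training data from the question
set.seed(123)
salary <- rnorm(1000, 5, 5)
height <- rnorm(1000, 2, 2)
my_data <- data.frame(salary, height)
train <- sample_frac(my_data, 0.7)

# Compute the cut-offs from the training data (q is a hypothetical name)
q <- unname(quantile(train$salary, probs = c(0.33, 0.66)))

# cut() assigns each salary to an interval between consecutive breaks.
# right = FALSE gives intervals [a, b): below q[1] is "A",
# from q[1] up to (not including) q[2] is "B", the rest is "C".
train$salary_type <- cut(train$salary,
                         breaks = c(-Inf, q, Inf),
                         labels = c("A", "B", "C"),
                         right = FALSE)

# Quick check of how many rows fall in each group
table(train$salary_type)
```

cut() already returns a factor, so the as.factor() wrapper is not needed here. If a third cut-off (such as the 0.99 quantile) should define a fourth group, it could be added to `probs` along with one more label.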