Define the function

Hello everybody!
I have to understand this code, but I don't know how to define cut2: the output says it couldn't find the function. How do I fix that? I have to get back into R. I'm very grateful for your help 🙂!

I have exactly the same problem. Any help out there?

Please don't post an image; post the script as code instead. Please read up on how to ask questions here.

Hi Anna,

One place to start would be to look at which packages were used to build the code. Do you know?
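As a quick way to act on this, base R's find() reports which attached package on the search path provides a given name. The sketch below assumes a fresh session in which no package providing cut2 has been attached yet.

```r
# find() lists the search-path entries (attached packages and
# environments) that contain an object with the given name.
find("read.csv")  # "package:utils": read.csv ships with R

# An empty result, character(0), means nothing on the search path
# provides cut2, so calling cut2() fails with "could not find function".
find("cut2")
```

Once the package that provides cut2 is attached with library(), find("cut2") returns that package's entry instead of character(0).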

Sure, sorry about that! Here is the code (cut2 is further down, under "IV for numeric data"):

data<-read.csv("Data_Prediction_loan.csv",header = TRUE)
data1=data #To create a backup of original data
head(data1)

#------------------------------------Basic Exploration of the data--------------------------------------------#
str(data1)
summary(data1)
dim(data1)
data1$SeniorCitizen<-as.factor(data1$SeniorCitizen)
str(data1)

#-----------------------------------Missing Value Treatment (if any)-------------------------------------------#
data.frame(colSums(is.na(data1)))

#---->Substituting missing values with mean

data1[is.na(data1$TotalCharges),19]=mean(data1$TotalCharges,na.rm=T)

data.frame(colSums(is.na(data1)))

#--------------------------------Information Value Calculation (A variable reduction technique)----------------------------------#

#-----------> Creating two data sets for numeric and categorical values

# Data set with numeric variable

num <- data1[,-c(1:4,6:17)]#Numerical Data Frame
cat <- data1[,c(1:4,6:17,20)]#Categorical Data Frame
head(cat)
head(num)
str(num)
str(cat)

#---------------------------------------IV for numeric data-------------------------------------------------------#

IVCal <- function(variable,target,data,groups)
{
  data[,"rank"] <- cut2(data[,variable],g=groups)
  tableOutput <-sqldf(sprintf("select rank,
                              count(%s) n,
                              sum(%s) good
                              from data
                              group by rank",target,target))
  tableOutput <- sqldf("select *,
                       (n - good) bad
                       from tableOutput")
  tableOutput$bad_rate<- tableOutput$bad/sum(tableOutput$bad)*100
  tableOutput$good_rate<- tableOutput$good/sum(tableOutput$good)*100
  tableOutput$WOE<- (log(tableOutput$good_rate/tableOutput$bad_rate))*100
  tableOutput$IV <- (log(tableOutput$good_rate/tableOutput$bad_rate))*(tableOutput$good_rate-tableOutput$bad_rate)/100
  IV <- sum(tableOutput$IV[is.finite(tableOutput$IV)])
  IV1 <- data.frame(cbind(variable,IV))
  return(IV1)
}

a1<- IVCal("tenure","Churn",num,groups=10)
a2<- IVCal("MonthlyCharges","Churn",num,groups=10)
a3<- IVCal("TotalCharges","Churn",num,groups=10)

IV_num<- data.frame(rbind(a1,a2,a3))
IV_num
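For readers who want to see what the two unfamiliar pieces of IVCal do without Hmisc or sqldf, here is a base-R sketch. quantile_groups() and iv_base() are hypothetical helper names introduced only for illustration. quantile_groups() merely approximates cut2(x, g = groups) by cutting at sample quantiles (cut2's tie handling and labels differ), and iv_base() assumes the target is coded 0/1 with no missing values. The arithmetic follows IVCal: the weight of evidence (WOE) in each group is the log of the good rate over the bad rate, and IV sums WOE times (good rate minus bad rate)/100, skipping non-finite terms.

```r
# Hypothetical helper: approximate Hmisc::cut2(x, g = groups) by splitting x
# at its sample quantiles into roughly equal-sized groups. Intervals are
# left-closed, [a, b), like cut2's, but ties and labels will not match exactly.
quantile_groups <- function(x, groups) {
  probs  <- seq(0, 1, length.out = groups + 1)
  breaks <- unique(quantile(x, probs = probs, na.rm = TRUE, names = FALSE))
  cut(x, breaks = breaks, right = FALSE, include.lowest = TRUE)
}

# Hypothetical helper: information value of numeric x against a 0/1 target,
# using the same arithmetic as IVCal but base R instead of sqldf.
iv_base <- function(x, target, groups = 10) {
  rank <- quantile_groups(x, groups)
  n    <- tapply(target, rank, length)  # observations per group
  good <- tapply(target, rank, sum)     # target == 1 per group
  bad  <- n - good                      # target == 0 per group
  good_rate <- good / sum(good) * 100   # % of all goods in each group
  bad_rate  <- bad / sum(bad) * 100     # % of all bads in each group
  woe <- log(good_rate / bad_rate)      # weight of evidence (unscaled)
  iv_terms <- woe * (good_rate - bad_rate) / 100
  sum(iv_terms[is.finite(iv_terms)])    # drop groups with no goods or no bads
}
```

With the data above this would be called as iv_base(num$tenure, num$Churn, groups = 10), assuming Churn is stored as 0/1; if it is stored as "Yes"/"No" it would first need converting, for example with as.integer(num$Churn == "Yes").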

And these are the packages I installed:
list.of.packages <- c("caret", "ggplot2", "MASS", "car", "mlogit", "caTools", "sqldf"," Hmisc", "aod", "BaylorEdPsych", "ResourceSelection", "pROC", "ROCR")

Does this cut2 have something to do with the packages?
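Very likely, yes. cut2 comes from the Hmisc package, and the package list above spells it " Hmisc", with a leading space. Assuming that vector was passed to install.packages(), R would look for a package literally named " Hmisc", which does not exist, so Hmisc was probably never installed. A minimal sketch of cleaning the list and checking what is still missing:

```r
list.of.packages <- c("caret", "ggplot2", "MASS", "car", "mlogit", "caTools",
                      "sqldf", " Hmisc", "aod", "BaylorEdPsych",
                      "ResourceSelection", "pROC", "ROCR")

# Strip stray leading/trailing spaces so " Hmisc" becomes "Hmisc"
list.of.packages <- trimws(list.of.packages)

# Packages from the list not installed in the current library paths (.libPaths())
missing_pkgs <- setdiff(list.of.packages, rownames(installed.packages()))
print(missing_pkgs)

# To install them, uncomment the next line (needs an internet connection):
# install.packages(missing_pkgs)
```

Installing is only half of it: each new session also needs library(Hmisc) (and library(sqldf) for IVCal) before IVCal is called, or the call can be written as Hmisc::cut2(...).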

If you limit how much you post, it's easier to find an answer, so just this:

would have been just right.

Do you know how to search for documentation in R? Have you used the help functions ? and ?? before?

No, I'm not. I'll read up on that first, then. Thanks!

To give you an example from the code you shared:

If you run the command ?read.csv, the documentation for the read.csv() function will appear in the "Help" tab in the lower right. That works because read.csv() ships with R itself (it is in the utils package, which is attached by default), so it is available every time you open RStudio.

However, if the function you're looking for is part of an add-on package, you first need to run library([name of package]) (after installing the package with install.packages() if necessary) to load the package so that the function name is made available to you, and then run ?[name of function].
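Applied to cut2, which is provided by the Hmisc package, the steps look roughly like this. The sketch uses requireNamespace() only so that it fails gracefully when Hmisc is not installed; in day-to-day use, library(Hmisc) followed by ?cut2 is all that is needed.

```r
pkg <- "Hmisc"  # add-on package that provides cut2()

if (requireNamespace(pkg, quietly = TRUE)) {
  # Attach the package so its exported functions are on the search path
  library(pkg, character.only = TRUE)
  print(exists("cut2"))  # TRUE now that Hmisc is attached
  # In an interactive session, ?cut2 now opens its help page
} else {
  message(pkg, " is not installed; run install.packages(\"", pkg, "\") first.")
}
```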

Lastly, if ? doesn't work and you're not sure which package the function is from, you can run ??[name of function]. That searches the documentation of every installed package (loaded or not) and produces a collection of links in the "Help" tab to matching or similarly named topics.
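??cut2 is shorthand for help.search("cut2"), which can also be called directly to see which installed packages document a matching topic. Because it only searches installed packages, an empty result for cut2 is another hint that Hmisc is not installed. The "Package" and "Topic" columns of the matches component reflect how recent R versions lay out the result; treat that exact layout as an assumption.

```r
# Same search that ??cut2 runs, but keeping the result as an object
res <- help.search("cut2")

# Which installed packages document a topic matching "cut2"?
# Zero rows means no installed package does.
print(res$matches[, c("Package", "Topic"), drop = FALSE])
```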
