Decision trees with Two .CSV files for Auto Insurance

Here is the output I need to see:

Here is my code that worked so far:

install.packages("rpart")

library(rpart)

Here are the two lines of code I attempted without success and the corresponding error messages:

PolicyTree <-rpart(Insurance_Category ~ Gender + Age + Marital_Status + Account_Activity + Claim_12Mo + Accident_36Mo + Ticket_12Mo +Payment_Method, mehod = 'class')

Error in eval(predvars, data, env) :
object 'Insurance_Category' not found

PolicyBuyers<-rpart(Insurance_Category ~ Gender + Age + Marital_Status + Account_Activity + Claim_12Mo + Accident_36Mo + Ticket_12Mo +Payment_Method, mehod = 'class')

Error in eval(predvars, data, env) :
object 'Insurance_Category' not found

I am trying to build a decision tree and use it to analyze two .csv files: PolicyHolders.csv and PolicyBuyers.csv.

I could import both files and generate the tables I needed to see.

I tried this as well:

PolicyHolders <- read.csv("PolicyHolders.csv", header = TRUE)

  • attach(PolicyHolders)
  • head(PolicyHolders)
  • summary(PolicyHolders)

You are not specifying the data argument, and you have not attached the data frame either, so R can't find any of the variables referenced in the formula argument.

The solution is to either specify the data argument inside the rpart() call or attach your data frame before calling the function.
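A minimal sketch of the first option, passing the data frame through `data =`. The data frame below is a made-up stand-in for PolicyHolders.csv with only a few of the columns, so the fitted tree itself means nothing. Note two assumptions: the head() output posted later in this thread prints the outcome column as `Insurance.Category` (with a dot), so the formula may need that spelling rather than `Insurance_Category`; and the argument is spelled `method`, not `mehod`. Checking `names(PolicyHolders)` on the real data would confirm the column name.

```r
library(rpart)

# Hypothetical stand-in for PolicyHolders.csv; all values are made up.
set.seed(1)
n <- 200
PolicyHolders <- data.frame(
  Gender = factor(sample(c("M", "F"), n, replace = TRUE)),
  Age = sample(18:70, n, replace = TRUE),
  Claim_12Mo = factor(sample(c("Yes", "No"), n, replace = TRUE)),
  Insurance.Category = factor(sample(
    c("Do Not Insure", "Insure-Best Terms", "Insure-Risk Terms"),
    n, replace = TRUE
  ))
)

# Check the exact column names before writing the formula.
names(PolicyHolders)

# The data argument tells rpart() where to look up the formula's variables.
PolicyTree <- rpart(
  Insurance.Category ~ Gender + Age + Claim_12Mo,
  data = PolicyHolders,
  method = "class"
)
print(PolicyTree)
```

With `data =` supplied, attach() is not needed at all.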

Thanks.

I started here:

PolicyHolders <-read.csv("PolicyHolders.csv")

head(PolicyHolders)
Account_ID Gender Age Marital_Status Account_Activity
1 9552 M 21 M High
2 6757 M 55 M Moderate
3 3599 F 53 M High
4 6811 M 33 M High
5 4104 M 53 S High
6 7226 M 22 S High
EmployedMoreThan12Mo Claim_12Mo Accident_36Mo
1 Yes No No
2 Yes Yes No
3 Yes No No
4 Yes No No
5 Yes No No
6 No No No
Ticket_12Mo Payment_Method Insurance.Category
1 Yes Bank Transfer Do Not Insure
2 Yes Bank Transfer Insure-Best Terms
3 No Bank Transfer Insure-Risk Terms
4 No Web Payment Insure-Risk Terms
5 Yes Web Payment Do Not Insure
6 No Credit Card Insure-High Premium

Then here:

tail(PolicyHolders)
Account_ID Gender Age Marital_Status
656 4632 F 39 M
657 8450 M 34 S
658 2048 F 50 M
659 9630 F 56 M
660 9982 F 27 S
661 2542 M 25 S
Account_Activity EmployedMoreThan12Mo Claim_12Mo
656 High Yes No
657 Low Yes No
658 Moderate Yes Yes
659 High Yes Yes
660 High Yes Yes
661 Moderate No No
Accident_36Mo Ticket_12Mo Payment_Method
656 No No Web Payment
657 No No Web Payment
658 No Yes Web Payment
659 No No Web Payment
660 No Yes Monthly Billing
661 No No Bank Transfer
Insurance.Category
656 Insure-High Premium
657 Do Not Insure
658 Insure-Risk Terms
659 Insure-High Premium
660 Insure-Risk Terms
661 Insure-High Premium

Here:

shuffle_index <- sample(1:nrow(PolicyHolders))

head(shuffle_index)
[1] 415 463 179 526 195 118
PolicyHolders <- PolicyHolders)[shuffle_index, ]
Error: unexpected ')' in "PolicyHolders <- PolicyHolders)"
PolicyHolders <- PoilcyHolders[shuffle_index,]
Error: object 'PoilcyHolders' not found
PolicyHolders <- PolicyHolders[shuffle_index,]
head(PolicyHolders)
Account_ID Gender Age Marital_Status
415 1302 M 65 M
463 4532 F 35 M
179 4897 M 48 S
526 9820 M 25 M
195 4662 M 25 S
118 9726 F 41 M
Account_Activity EmployedMoreThan12Mo Claim_12Mo
415 High Yes No
463 High Yes Yes
179 Moderate Yes Yes
526 High Yes No
195 High Yes Yes
118 High Yes No
Accident_36Mo Ticket_12Mo Payment_Method
415 No No Bank Transfer
463 No No Web Payment
179 No No Web Payment
526 No Yes Web Payment
195 No Yes Bank Transfer
118 Yes Yes Bank Transfer
Insurance.Category
415 Insure-Best Terms
463 Insure-High Premium
179 Insure-Best Terms
526 Do Not Insure
195 Do Not Insure
118 Insure-High Premium
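The shuffle above can be made reproducible with set.seed(), and the shuffled rows can then be cut into a training part and a test part. This is only a sketch: the seed and the 80/20 split are assumptions that do not come from this thread, and the data frame is a stand-in with 661 rows (the last row number in the tail() output).

```r
# Hypothetical stand-in for PolicyHolders; only the row count matters here.
PolicyHolders <- data.frame(
  Account_ID = 1:661,
  Age = rep(18:70, length.out = 661)
)

set.seed(123)  # assumed seed, so the same shuffle comes back on every run
shuffle_index <- sample(seq_len(nrow(PolicyHolders)))
PolicyHolders <- PolicyHolders[shuffle_index, ]

# Assumed 80/20 split: the first 80% of the shuffled rows train the model.
n_train <- floor(0.8 * nrow(PolicyHolders))
train <- PolicyHolders[seq_len(n_train), ]
test  <- PolicyHolders[-seq_len(n_train), ]

dim(train)
dim(test)
```

Using seq_len(nrow(x)) instead of 1:nrow(x) also behaves sensibly if the data frame ever has zero rows.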

And here:

install.packages("dplyr")
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:

https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘C:/Users/gemtu/Documents/R/win-library/4.1’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.1/dplyr_1.0.9.zip'
Content type 'application/zip' length 1391520 bytes (1.3 MB)
downloaded 1.3 MB

package ‘dplyr’ successfully unpacked and MD5 sums checked

I want to move on to the rpart step. I do not see any reason to clean my data, but I am not sure how to make that transition in code from here.

Updates:

or rebuild the rpart model with model=TRUE.

PolicyTree <-Insurance_Category ~ Gender, data = PolicyHolders)
Error: unexpected ',' in "PolicyTree <-Insurance_Category ~ Gender,"
Policy_Tree <-Insurance_Category ~ Gender, data = PolicyHolders, method = 'class'
Error: unexpected ',' in "Policy_Tree <-Insurance_Category ~ Gender,"

Your code shows a lot of syntax errors: missing parentheses, arguments passed outside the parentheses, missing arguments, etc.

My advice: check the function's documentation so you know which arguments you are expected to provide, and construct the function call more carefully.
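As a rough illustration of that advice, the arguments rpart() accepts can be listed from the console, and it helps to see that a formula on its own is only a formula object; the tree is built only by the rpart() call, with every argument inside its parentheses. The sketch uses the `kyphosis` data set that ships with rpart as a stand-in, since PolicyHolders.csv is not available here.

```r
library(rpart)

# List the argument names of rpart() (the same names appear in ?rpart).
arg_names <- names(formals(rpart))
print(arg_names)

# A formula by itself is just a formula object; nothing is fitted yet.
f <- Kyphosis ~ Age + Start
class(f)

# The fit happens only in the call, with all arguments between the parentheses.
fit <- rpart(f, data = kyphosis, method = "class")
class(fit)
```

A line such as `x <- y ~ z, data = d` fails to parse because `data = d` sits outside any function call.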

As I said to you before, you would get much better help if you take the time to provide a proper REPRoducible EXample (reprex), as explained on this link.

Thanks. I will keep editing my arguments and syntax; I am still learning the syntax. As for the reprex, I cannot do that. It is too complex and time-consuming, but I will try to shorten the code excerpts in my posts. Where can I learn better syntax?

I have found many of the websites and textbooks only mildly useful.

I am using these references to fix my syntax:

R - Decision Tree.

The guy with the glasses was most helpful.

I am thinking in terms of this syntax:

library(rpart)

set.seed(120)

index <- sample(2, nrow(data), replace = TRUE, prob = c(0.7, 0.3))

train <- data[index == 1, ]

test <- data[index == 2, ]

My_formula <- Target ~ .

My_Tree <- rpart(My_formula, data = train)

# rpart() has no newdata argument; predictions for the test rows come from predict()
My_Pred <- predict(My_Tree, newdata = test)
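One way this pattern might carry over to the two files in this thread is to fit the tree on PolicyHolders and then score PolicyBuyers with predict(). The sketch below rests on assumptions not confirmed in the thread: that PolicyBuyers has the same predictor columns but no outcome column, and that the outcome is named `Insurance.Category`. The rows are fabricated by a hypothetical helper, `make_rows()`, and the rule that generates the outcome is made up so the tree has something to learn.

```r
library(rpart)

set.seed(120)

# Hypothetical helper that fabricates rows; real data would come from read.csv().
make_rows <- function(n) {
  data.frame(
    Age = sample(18:70, n, replace = TRUE),
    Claim_12Mo = factor(sample(c("No", "Yes"), n, replace = TRUE),
                        levels = c("No", "Yes")),
    Ticket_12Mo = factor(sample(c("No", "Yes"), n, replace = TRUE),
                         levels = c("No", "Yes"))
  )
}

PolicyHolders <- make_rows(300)
# Made-up labelling rule, only so that the example tree can find a split.
PolicyHolders$Insurance.Category <- factor(ifelse(
  PolicyHolders$Claim_12Mo == "Yes" & PolicyHolders$Ticket_12Mo == "Yes",
  "Do Not Insure", "Insure-Best Terms"
))

# Assumed: the buyers file has the predictors but no outcome column.
PolicyBuyers <- make_rows(50)

Model <- rpart(Insurance.Category ~ Age + Claim_12Mo + Ticket_12Mo,
               data = PolicyHolders, method = "class")

# type = "class" returns the predicted category instead of class probabilities.
PolicyBuyers$Predicted <- predict(Model, newdata = PolicyBuyers, type = "class")
table(PolicyBuyers$Predicted)
```

predict() matches columns by name, so the predictor names in the second file would need to be spelled exactly as in the first.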

I solved it. Thanks.

I solved my problem. My RStudio is buggy even after uninstalling and reinstalling it, so rpart.plot and PolicyTree cannot be installed. I also had to change the syntax from PolicyTree <- to Model <- rpart.

This is very unlikely. RStudio is just an IDE for the R programming language; it doesn't have a direct influence on your R package library.
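A possible way to check this, offered as a sketch rather than a diagnosis: run the lines below once in a plain R console and once inside RStudio, then compare. If both report the same library paths and the same availability for a package, the package library is the same in both. The package names are the ones mentioned in this thread.

```r
# Where this R installation looks for packages.
lib_paths <- .libPaths()
print(lib_paths)

# TRUE if the package can be found in those libraries, FALSE otherwise.
has_rpart      <- requireNamespace("rpart", quietly = TRUE)
has_rpart_plot <- requireNamespace("rpart.plot", quietly = TRUE)
print(c(rpart = has_rpart, rpart.plot = has_rpart_plot))

# The R version in use; binary packages are built per R version.
print(R.version.string)
```

Only packages can be installed; an object such as PolicyTree exists only after the line that assigns it has run without error.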

What you are describing has nothing to do with syntax but simply with how you name things, which is only a matter of personal preference.

We disagree--it really did occur--you are an expert in R, and I am intermediate, but I proved my claims to my professor and a team of R programmers.

That's OK. My comment is actually intended for others reading this thread, so they are not misled by an uninformed conclusion.

I do not deny that you have experienced problems, but they are almost certainly not related to the RStudio IDE being buggy, as you think.

They are not being misled. I worked with Java for years, for example, and taught it at a university. I worked with all of the IDEs for Java, and for Python too. I found in my usage, teaching, and research that IDEs can affect how packages get imported and how syntax is read. Do not be misled here; IDEs, and the runtimes built on them, can interfere with loading necessary packages and conducting analyses.

Yes, the RStudio IDE can on rare occasions affect how R packages get loaded, but it cannot alter how their functions are defined, and therefore cannot affect what syntax they expect. At least what you have shown here (syntax errors and typos) has nothing to do with bugs in the IDE.

Maybe you have indeed encountered an IDE bug, but there is no evidence of that here. If you can consistently reproduce an issue that is clearly attributable to the IDE, I encourage you to report it formally by filing an issue on the GitHub repository for the RStudio IDE.

Thanks. I did not show the evidence here. I showed it to my professor and a team of professional R coders.