Hi,
I'm new to R and I've gone through some examples of decision trees based on existing, built-in data sets.
Now I would like to apply the same approach to my own data, taken from our SQL database.
My starting point is preparing the data:
library(RODBC)
abc <- odbcConnect("sqldatabase")
my.data <- sqlQuery(abc, "SELECT * FROM sqldatatable")
Belgium.data <- my.data[my.data$CountryID == 15, ]
Belgium.CurrentHY.data <- subset(Belgium.data, InterviewDate > "2018-04-01" & InterviewDate < "2018-09-30")
I can display it without any problems:
str(Belgium.CurrentHY.data)
$ CountryID : int 15 15 15 15 15 15 15 15 15 15 ...
$ InterviewDate : POSIXct, format: "2018-04-25 08:12:00" "2018-04-26 13:05:00" "2018-04-04 17:28:00" "2018-04-10 12:12:00" ...
$ A2 : int 9 10 10 8 10 9 10 10 9 10 ...
$ B1 : int 10 10 8 7 10 8 9 10 8 10 ...
$ C1 : int 10 10 9 8 10 9 10 9 9 9 ...
So Belgium.CurrentHY.data exists and contains the columns I need.
Now, I would like to:
- Recode variable A2 (values from 1 to 10) into A2TB (values 9-10 as 1 and values 1-8 as 2)
- Use this recoded A2TB variable as decision tree target (to see proportions 1 to 2) with other variables such as B1 and C1
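For reference, the recoding described in the list above could be sketched roughly as follows. This is only a sketch under assumptions: toy is a small made-up data frame standing in for Belgium.CurrentHY.data (same column names, invented values), A2 is assumed to be numeric, and storing A2TB as a factor is my assumption so that a tree function would treat it as a class label rather than a number.

```r
# Toy stand-in for Belgium.CurrentHY.data (made-up values, same column names)
toy <- data.frame(
  A2 = c(9, 10, 10, 8, 3, 9, 1, 7),
  B1 = c(10, 10, 8, 7, 4, 8, 2, 6),
  C1 = c(10, 10, 9, 8, 5, 9, 3, 7)
)

# Values 9-10 become 1, values 1-8 become 2; an NA in A2 stays NA
toy$A2TB <- factor(ifelse(toy$A2 >= 9, 1, 2), levels = c(1, 2))

table(toy$A2TB)              # counts of 1 vs 2
prop.table(table(toy$A2TB))  # proportions of 1 vs 2
```

On the real data the same ifelse() line would presumably be applied to Belgium.CurrentHY.data$A2 instead of toy$A2.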
How can I do that? Do I need to create a data frame first? The examples I have gone through are based on the following commands:
Example 1:
library("party")
str(iris)
iris_ctree <- ctree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data=iris)
print(iris_ctree)
plot(iris_ctree)
Example 2:
library(ISLR)
data(package="ISLR")
carseats<-Carseats
require(tree)
names(carseats)
hist(carseats$Sales)
High = as.factor(ifelse(carseats$Sales<=8, "No", "Yes"))  # tree() needs a factor response to fit a classification tree
carseats = data.frame(carseats, High)
tree.carseats = tree(High~.-Sales, data=carseats)
summary(tree.carseats)
plot(tree.carseats)
text(tree.carseats, pretty = 0)
Now, I would like to use the above (or different) commands to create my own decision trees based on Belgium.CurrentHY.data.
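To make clearer what I am after, here is a rough sketch on made-up data. The assumptions are mine: toy is a hypothetical data frame with invented values, A2 is recoded with ifelse(), and rpart is used only because it ships with R as a recommended package (it is a different tree implementation from the party and tree packages in my examples, although ctree() from Example 1 takes the same formula and data arguments).

```r
library(rpart)  # recommended package, shipped with standard R installations

set.seed(1)  # reproducible made-up data
toy <- data.frame(
  B1 = sample(1:10, 200, replace = TRUE),
  C1 = sample(1:10, 200, replace = TRUE)
)

# Hypothetical A2, loosely driven by B1 and C1 and clipped to the 1..10 scale
toy$A2 <- pmin(10, pmax(1, round((toy$B1 + toy$C1) / 2 + rnorm(200))))

# Recode: 9-10 become 1, 1-8 become 2; a factor marks it as a class label
toy$A2TB <- factor(ifelse(toy$A2 >= 9, 1, 2), levels = c(1, 2))

# Classification tree with A2TB as target and B1, C1 as predictors
toy_tree <- rpart(A2TB ~ B1 + C1, data = toy, method = "class")
print(toy_tree)

# Predicted class for each row, cross-tabulated against the recoded target
pred <- predict(toy_tree, type = "class")
table(observed = toy$A2TB, predicted = pred)
```

What I do not know is whether this carries over directly to Belgium.CurrentHY.data, for example whether InterviewDate and CountryID have to be left out of the formula.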
Can you help?
Slavek