For discussions related to modeling, machine learning and deep learning. Related packages include RevoScaleR.
Hi all. I'm looking for advice/help.
I've created an ML solution using rxDForest. I use RStudio as the dev environment and sys.sp_execute_external_script on SQL Server to productionize it. Both RStudio and SQL Server point to the R instance on the server (R Server).
Although I run the same code in both environments, I get different results.
I've also created a solution using AdventureWorksDW2017, and I get different results in RStudio and SSMS there as well. Moreover, my mate ran the same code on his laptop, using SSMS and R Client, and got the same results as I get with SSMS and R Server.
Can anybody tell me why?
USE AdventureWorksDW2014
GO
DROP PROC IF EXISTS rxDForest_demo
GO
CREATE PROC rxDForest_demo
AS
BEGIN
    EXECUTE sys.sp_execute_external_script
          @language = N'R'
        , @script = N'
            set.seed(2018);
            splitList <- rxSplit(
                inData = train,
                splitByFactor = "splitLabel",
                transforms = list(splitLabel =
                    factor(sample(0:1,
                                  size = .rxNumRows,
                                  replace = TRUE,
                                  prob = c(0.7, 0.3)),
                           levels = 0:1,
                           labels = c("Train", "Test"))),
                overwrite = TRUE
            );
            # split data to training and test set
            trainDF <- rxDataStep(inData = splitList[[2]], varsToDrop = c("splitLabel"));
            testDF <- rxDataStep(inData = splitList[[1]], varsToDrop = c("splitLabel"));
            buyer_train_Forest <- rxDForest(BikeBuyer ~
                                                NumberCarsOwned
                                                + CommuteDistance
                                                + Gender
                                                + NumberChildrenAtHome
                                                + MaritalStatus
                                                + YearlyIncome,
                                            seed = 10,
                                            data = trainDF,
                                            cp = 0.01,
                                            nTree = 100, #500
                                            mTry = 5);
            predictBuyer <- rxPredict(modelObject = buyer_train_Forest,
                                      data = testDF,
                                      #data = cdrTestSQL,
                                      type = "prob",
                                      overwrite = TRUE);
            buyer_Y_N <- 0.5;
            predictBuyer$pred_Y_N <- ifelse(predictBuyer$BikeBuyer_Pred > buyer_Y_N, "Y", "N");
            predictDF <- cbind(testDF[,8], predictBuyer)
            trained_model <- data.frame(predictDF); '
        , @input_data_1 = N'select top 2000
              a.BikeBuyer
            , a.NumberCarsOwned
            , a.CommuteDistance
            , a.Gender
            , a.NumberChildrenAtHome
            , a.MaritalStatus
            , a.YearlyIncome
            , a.CustomerKey
            from vTargetMail a
            order by a.CustomerKey;'
        , @input_data_1_name = N'train'
        , @output_data_1_name = N'trained_model'
    WITH RESULT SETS ((
          [CustomerKey] int
        , [Probability] decimal(18,6)
        , [Buyer] char(1)
    ))
END;