Hi,
I am using glm() to predict a percentage. My code works until I try to predict on my test data: some predictions come back as percentages, but most are NA. I have cleaned my data, so I do not know what is causing this.
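This is one possible cause, offered as an assumption rather than a confirmed diagnosis. predict() for a glm handles newdata with na.action = na.pass by default, so any test row with a missing value in a predictor comes back as NA, even when the training data was clean. A minimal sketch with made-up toy data follows. train_toy, test_toy, fit, pred_vars and bad_rows are hypothetical names and are not part of the original code.

```r
# Hypothetical toy training data with no missing values
train_toy <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(0, 1, 0, 1, 0, 1),
  y  = c(1, 2, 2, 4, 5, 7)
)
fit <- glm(y ~ x1 + x2, data = train_toy, family = poisson(link = "log"))

# Hypothetical test data: rows 2 and 3 each have one missing predictor
test_toy <- data.frame(x1 = c(2, NA, 4), x2 = c(1, 0, NA))
pred <- predict(fit, newdata = test_toy, type = "response")
print(pred)  # row 1 is a number, rows 2 and 3 are NA

# Flag test rows that are incomplete in the predictors the model uses
pred_vars <- all.vars(delete.response(terms(fit)))
bad_rows <- !complete.cases(test_toy[, pred_vars, drop = FALSE])
print(which(bad_rows))
```

With the real objects, all.vars(delete.response(terms(model))) gives the predictor columns. If missing test predictors are the cause, the incomplete rows of test.clean in those columns should match which(is.na(p1)).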
Here is the warning message I get, followed by the code I am using.
Warning message:
In predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
prediction from a rank-deficient fit may be misleading
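The warning is probably a separate issue. A rank-deficient fit means that at least one coefficient could not be estimated. This usually happens because a column is an exact linear combination of other columns, and glm() then reports NA for that coefficient. As far as I know, with default settings predict() drops those terms instead of returning NA, so this warning alone should not explain NA predictions. A minimal sketch with hypothetical toy data follows. d, fit, aliased and p are illustrative names only.

```r
# Hypothetical toy data: x3 is exactly x1 + x2, so the model matrix is rank-deficient
d <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  x2 = c(2, 1, 3, 1, 4, 2, 5, 3),
  y  = c(1, 1, 2, 3, 3, 4, 6, 7)
)
d$x3 <- d$x1 + d$x2
fit <- glm(y ~ x1 + x2 + x3, data = d, family = poisson(link = "log"))

# The aliased term gets an NA coefficient instead of an estimate
print(coef(fit))
aliased <- names(coef(fit))[is.na(coef(fit))]
print(aliased)

# Predictions are still computed from the estimable terms;
# depending on the R version, predict() may emit the rank-deficiency warning here
p <- suppressWarnings(predict(fit, newdata = d, type = "response"))
print(p)
```

On the real model, names(coef(model))[is.na(coef(model))] would list the terms that could not be estimated.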
library(ggplot2)
library(dplyr)
library(caret)
train=read.csv("C:\\Users\\jbhoo\\Downloads\\Training2.csv")
attach(train)
head(train)
## Taking care of missing values ##
sapply(train, function(x) mean(is.na(x)))   # fraction of NA values in each column
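## Under 50% missing: replace values with the mean; over 50% missing: drop the attribute ##
train$position2[is.na(train$position2)] <- 0   # modify train directly; assigning to an attach()ed column only creates a separate copy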
train.clean=na.omit(train)
### Find features that are significant for prediction
model=glm(reboffensive~.,data = train.clean,family = poisson(link = "log"))
summary(model)
p.off=predict(model,newdata=train.clean,type="response")
###test data
test=read.csv("C:\\Users\\jb\\Downloads\\Testing2.csv")
## Clean NA values in the test data
test$position2[is.na(test$position2)] <- 0   # same fix as for train, applied to test
##Make Predictions
test.clean=test[1:124619,]
p1=predict(model,newdata=test.clean,type="response")
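The comment in the cleaning step describes a rule: mean-impute columns with under 50% missing and drop columns with over 50% missing. The code, however, only replaces NA with 0 in position2, and it does not touch the other columns of the test data. The sketch below implements that rule under a few assumptions. It assumes the columns that need imputing are numeric. fit_na_rule and apply_na_rule are hypothetical helpers. Columns that are exactly 50% missing are dropped, which is an arbitrary choice made for this sketch. The means are learned on the training data and reused on the test data, so both sets get the same transformation.

```r
# Learn the rule from the training data: which columns to keep and which means to use.
# Columns listed in `skip` (e.g. the response) are always kept and never imputed.
fit_na_rule <- function(df, threshold = 0.5, skip = character()) {
  na_frac <- colMeans(is.na(df))
  keep <- names(df)[na_frac < threshold | names(df) %in% skip]
  impute_cols <- setdiff(keep, skip)
  impute_cols <- impute_cols[vapply(df[impute_cols], is.numeric, logical(1))]
  means <- vapply(df[impute_cols], function(x) mean(x, na.rm = TRUE), numeric(1))
  list(keep = keep, means = means)
}

# Apply a learned rule to any data frame (training or test)
apply_na_rule <- function(df, rule) {
  df <- df[, intersect(rule$keep, names(df)), drop = FALSE]
  for (col in intersect(names(rule$means), names(df))) {
    df[[col]][is.na(df[[col]])] <- rule$means[[col]]
  }
  df
}

# Hypothetical toy data
train_toy <- data.frame(
  a = c(1, NA, 3, 5),    # 25% missing: imputed with the mean 3
  b = c(NA, NA, NA, 2),  # 75% missing: dropped
  y = c(0, 1, 1, 2)
)
test_toy <- data.frame(a = c(NA, 4), b = c(1, NA), y = c(NA, NA))

rule <- fit_na_rule(train_toy, skip = "y")
train_imp <- apply_na_rule(train_toy, rule)
test_imp <- apply_na_rule(test_toy, rule)
print(train_imp)
print(test_imp)
```

Non-numeric columns under the threshold are kept but still contain their NA values, so na.omit() or another strategy would still be needed for them before predicting.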