Predictions are filled with NA: why do I not have a full set of predictions?

Hi,

I am using glm to predict a percentage. My code works until I try to predict on my test data: the predictions contain some percentages, but most of them are NA. I have cleaned my data, so I do not know what could be causing this.

Here is the warning message I get as well as the code I am using.

Warning message:
In predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
prediction from a rank-deficient fit may be misleading

library(ggplot2)
library(dplyr)
library(caret)
train=read.csv("C:\\Users\\jbhoo\\Downloads\\Training2.csv")
attach(train)
head(train)

## Taking care of missing values: proportion of NA in each column
sapply(train,function(x) sum(is.na(x))/length(playbyplayorder_id))
## Less than 50% missing: replace values with the mean; over 50% missing: drop the attribute
position2[is.na(position2)]<-0
train.clean=na.omit(train)

### Find significant features for prediction
model=glm(reboffensive~.,data = train.clean,family = poisson(link = "log")) 
summary(model)
p.off=predict(model,newdata=train.clean,type="response")
###test data
test=read.csv("C:\\Users\\jb\\Downloads\\Testing2.csv")
##Clean data from NA
position2[is.na(position2)]<-0
##Make Predictions
test.clean=test[1:124619,]
p1=predict(model,newdata=test.clean,type="response")

Hello,

I really suggest that you create a reprex (FAQ: How to do a minimal reproducible example (reprex) for beginners) so that others can replicate this behaviour. It will make it much easier for them to help you.
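As a starting point, here is a rough sketch of what such a reprex could look like. The data is made up and the columns x and y are hypothetical stand-ins, so treat it as an illustration rather than a diagnosis of your actual files. It shows one way NA predictions can arise: predict() returns NA for any row of newdata that has a missing predictor. Also note that after attach(train), a line like position2[is.na(position2)]<-0 changes a separate copy of position2 in your workspace, not the column inside train or test, so the NAs may still be in the data frame you pass to predict(). The rank-deficient warning is a different matter (some coefficients could not be estimated) and would not by itself be expected to produce NA predictions.

```r
# Minimal sketch with made-up data; x and y are hypothetical column names.
set.seed(1)
train <- data.frame(x = rnorm(50),
                    position2 = sample(0:3, 50, replace = TRUE))
train$y <- rpois(50, lambda = exp(0.2 + 0.3 * train$x))

model <- glm(y ~ ., data = train, family = poisson(link = "log"))

# New data with missing values in one predictor
test <- data.frame(x = rnorm(5), position2 = c(1, NA, 2, NA, 0))

# Rows with a missing predictor are kept and get an NA prediction
p_raw <- predict(model, newdata = test, type = "response")

# Count missing values per column to see where the NAs come from
na_counts <- colSums(is.na(test))

# Replace the NAs in the data frame column itself, then predict again
test$position2[is.na(test$position2)] <- 0
p_fixed <- predict(model, newdata = test, type = "response")
```

Running colSums(is.na(test.clean)) on your own test data would show whether missing values are still present in the columns the model uses.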
