I have computed a z-transform of x1, which is saved in data frame df3. I have done this for variables x1 through x11.
Can I subset data frame zf based on df3? It appears to work, although I have not seen this done anywhere.
Variable x1 is processed as follows:
zstat <- qnorm(.975, mean = 0, sd = 1, lower.tail = TRUE)
cat("zstat=",zstat,"\n","\n")
data_outliers <- subset(zf, abs(df3$x1) > zstat)  # <-- subset is done here
numOutliers <- dim(data_outliers)[1]
cat("Number of outliers is ", numOutliers, "\n")
cat("outlier ids are","\n")
print(data_outliers$id)
data_nooutliers <- subset(zf, abs(df3$x1) < zstat)
numNooutliers <- dim(data_nooutliers)[1]
cat(" ","\n")
cat("Number of nooutliers is ", numNooutliers, "\n")
cat("nooutlier ids are","\n")
head(data_nooutliers$id,25)
cat("\n","numOutliers+numNooutliers=",numOutliers+numNooutliers,"\n")
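As far as I can tell, subset() only needs a logical vector with one entry per row of zf, so a condition built from df3 should work as long as df3 has the same rows in the same order as zf. Here is a minimal sketch of that idea on toy data. The values, and the use of scale() to build df3, are assumptions for illustration and not my real data.

```r
# Toy stand-ins for the real data frames (hypothetical values)
zf  <- data.frame(id = 1:6, x1 = c(10, 11, 9, 10, 30, 10))
df3 <- data.frame(x1 = as.numeric(scale(zf$x1)))  # z-scores, same row order as zf

zstat <- qnorm(.975)

# The condition must line up row for row with zf
stopifnot(nrow(df3) == nrow(zf))

is_outlier    <- abs(df3$x1) > zstat
data_outliers <- subset(zf, is_outlier)   # rows of zf where the df3 z-score is extreme
print(data_outliers$id)
```

Note that rows where abs(z) equals zstat exactly, or where z is NA, satisfy neither `>` nor `<`, which is why I check that numOutliers + numNooutliers adds up to the number of rows.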
I then wrote a for loop over all 11 variables. For some reason the nooutlier ids are not output inside the loop. Can anyone say why? Since this dataset can be very large, I am using head() rather than print(). The code is otherwise very much like that for the individual variables x1, x2, ..., x11.
zstat <- qnorm(.975, mean = 0, sd = 1, lower.tail = TRUE)
cat("zstat=",zstat,"\n","\n")
for(i in 1:11) {
  v <- paste("x", i, sep = "")
  print(v)
  data_outliers <- subset(zf, abs(df3[[v]]) > zstat)
  numOutliers <- dim(data_outliers)[1]
  cat("Number of outliers is ", numOutliers, "\n")
  cat("outlier ids are", "\n")
  print(data_outliers$id)
  data_nooutliers <- subset(zf, abs(df3[[v]]) < zstat)
  numNooutliers <- dim(data_nooutliers)[1]
  cat(" ", "\n")
  cat("Number of nooutliers is ", numNooutliers, "\n")
  cat("nooutlier ids are", "\n")
  head(data_nooutliers$id, 25)
  cat("\n", "numOutliers+numNooutliers=", numOutliers + numNooutliers, "\n", "\n")
}
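One thing that may explain the difference: R auto-prints the value of an expression only at top level. Inside a for loop the value returned by head() is computed and then discarded, while an explicit print() still produces output. Below is a minimal sketch of that behaviour. The `ids` vector is a hypothetical stand-in for data_nooutliers$id.

```r
ids <- 101:130  # hypothetical ids standing in for data_nooutliers$id

for (i in 1:2) {
  head(ids, 3)         # value is computed but discarded: nothing is shown
  print(head(ids, 3))  # explicit print() shows the first three ids
}
```

If that is the cause, wrapping the call as print(head(data_nooutliers$id, 25)) inside the loop should restore the output while still limiting it to 25 ids.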