Is it OK to use a different data frame as the test condition when subsetting another data frame?

I have computed a z-transform of x1, which is saved in data frame df3. I have done this for variables x1 through x11.
Can I subset data frame zf based on df3? It appears to work, although I have not seen this done anywhere.
Variable x1 is processed as follows:

zstat <- qnorm(.975, mean = 0, sd = 1, lower.tail = TRUE)
cat("zstat=",zstat,"\n","\n")

data_outliers <- subset(zf, abs(df3$x1) > zstat)  # <---- subset is done here
numOutliers <- dim(data_outliers)[1] 
cat("Number of outliers is ", numOutliers, "\n")
cat("outlier ids are","\n")
print(data_outliers$id)

data_nooutliers <- subset(zf, abs(df3$x1) < zstat)
numNooutliers <- dim(data_nooutliers)[1] 
cat(" ","\n")
cat("Number of nooutliers is ", numNooutliers, "\n")
cat("nooutlier ids are","\n")
head(data_nooutliers$id,25)
cat("\n","numOutliers+numNooutliers=",numOutliers+numNooutliers,"\n")

I then wrote a for loop over all 11 variables. For some reason the IDs of the nooutliers data set are not output. Can anyone say why? Since this data set can be very large, I am using head() rather than print(). The code is otherwise very much like that for the individual variables x1, x2, ..., x11.

zstat <- qnorm(.975, mean = 0, sd = 1, lower.tail = TRUE)
cat("zstat=",zstat,"\n","\n")

for (i in 1:11) {
  v <- paste("x", i, sep = "")
  print(v)
  data_outliers <- subset(zf, abs(df3[[v]]) > zstat)

  numOutliers <- dim(data_outliers)[1]
  cat("Number of outliers is ", numOutliers, "\n")
  cat("outlier ids are", "\n")
  print(data_outliers$id)

  data_nooutliers <- subset(zf, abs(df3[[v]]) < zstat)
  numNooutliers <- dim(data_nooutliers)[1]
  cat(" ", "\n")
  cat("Number of nooutliers is ", numNooutliers, "\n")
  cat("nooutlier ids are", "\n")
  head(data_nooutliers$id, 25)
  cat("\n", "numOutliers+numNooutliers=", numOutliers + numNooutliers, "\n", "\n")
}

I can't think of a reason that subsetting with a separate data frame would not work, provided the two data frames have the same number of rows in the same order, but I certainly don't know the details of how the subset() function works. I expect that most people keep related data in the same data frame. If you can use df3 to subset() zf, you could also cbind() the two data frames, after ensuring unique column names, and subset the result. Personally, I would want to inner_join() them (from dplyr) on a shared key to guarantee that rows are properly aligned.
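A minimal sketch of the join-then-subset idea, using base R's merge() as a stand-in for inner_join(). The toy values, the z_ prefix, and above all the assumption that df3 carries the same id column as zf are illustrative only; nothing in the question confirms that df3 has an id.

```r
# Toy stand-ins for the real data (hypothetical values).
zf <- data.frame(id = 1:6, x1 = c(10, 12, 11, 50, 9, 13))

# df3 holds the z-scores, deliberately stored in reverse row order
# to show why matching on a key is safer than relying on position.
df3 <- data.frame(id = 6:1, x1 = rev(as.numeric(scale(zf$x1))))

# Give the z-score columns unique names (assumes id is the first column).
names(df3)[-1] <- paste0("z_", names(df3)[-1])

# With the default all = FALSE, merge() keeps only ids present in both
# data frames, which is an inner join.
joined <- merge(zf, df3, by = "id")

zstat <- qnorm(0.975)
data_outliers <- subset(joined, abs(z_x1) > zstat)
print(data_outliers$id)
```

Because the rows are matched on id, the filter stays correct even though df3 is not in the same row order as zf.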

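If head(data_nooutliers$id,25) is not giving you the output you expect, try print(head(data_nooutliers$id,25)). Inside a for loop, R does not auto-print the value of an expression the way it does at the top level, so the result has to be printed explicitly.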
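The difference can be seen in a small sketch. The ids vector is made up for illustration, and capture.output() is used here only to count how many lines each loop prints.

```r
ids <- 101:200  # hypothetical ids

# Inside a loop, the value of a bare expression is discarded, so nothing is printed.
silent <- capture.output(for (i in 1:2) head(ids, 3))

# Wrapping the call in print() produces output on every iteration.
shown <- capture.output(for (i in 1:2) print(head(ids, 3)))

length(silent)  # 0 lines captured
length(shown)   # 2 lines captured, one per iteration
```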

This is my opinion:
I would say it's possible, but extremely fraught: it is easy for a programmer to make a mistake and not realise it. Therefore I would almost never consider doing it.

It makes a lot more sense to join your data explicitly and to apply filters to the joined data.
That is, verify that your tables joined as you need them to for your logic to make sense.
Everything being explicit and inspectable in steps is a huge win for the programmer.
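One way to make those checks explicit is sketched below. It assumes, hypothetically, that zf and df3 both carry an id column and are meant to be row-aligned; the toy values include a missing observation to show where the outlier and non-outlier counts can fail to add up.

```r
# Toy stand-ins (hypothetical values); the last x1 is missing on purpose.
zf  <- data.frame(id = 1:7, x1 = c(10, 12, 11, 50, 9, 13, NA))
df3 <- data.frame(id = zf$id, x1 = as.numeric(scale(zf$x1)))

# Verify alignment before one data frame is used to filter the other.
stopifnot(
  nrow(zf) == nrow(df3),
  anyDuplicated(zf$id) == 0,
  identical(zf$id, df3$id)
)

zstat  <- qnorm(0.975)
is_out <- abs(df3$x1) > zstat

# A row with a missing z-score is neither an outlier nor a non-outlier,
# so it is counted separately. Note that !is_out also includes values
# exactly equal to zstat, unlike the strict < in the question.
n_out     <- sum(is_out, na.rm = TRUE)
n_not_out <- sum(!is_out, na.rm = TRUE)
n_missing <- sum(is.na(is_out))
stopifnot(n_out + n_not_out + n_missing == nrow(zf))
```

If any check fails, stopifnot() halts with an error naming the failed condition, so a misalignment is caught before it silently changes which rows are selected.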
