I needed to merge several datasets in R, and some data was lost along the way. I need to find out where, and in which step(s), this happened, so I want to count the number of values in specific columns (in this case, Calcium and Ketosis) across the different merged and new datasets.
I am now using length() for Calcium and that works, although it brought up a completely new error, which is for another post.
When I try using
length(which(DfKetosisUterusRaw$Ketosis > 0 ))
for the DfKetosisUterusRaw$Ketosis I get the following:
When I then try to convert $Ketosis to numeric, I get the following:
So at this point, I am stuck. Apparently Ketosis is seen as a factor and I cannot seem to get the column as numeric. Maybe some of you have an idea what I am doing wrong?
Any explanation would be very much appreciated. Thanks in advance.
Here is a way of testing the different columns for missing or 0 values:
# Dummy data
myData = data.frame(x = 1:10,
                    y = c(1, 5, 0.2, 0, 0, 8, NA, 5, NA, 6.3),
                    z = c(1, NA, 0.2, 7, 0, 8, 4.3, 5, 0, 6.3))

# Number of values in each column that are missing or 0
nMissing = apply(myData, 2, function(x){
  sum(is.na(x) | x == 0)
})
nMissing
#> x y z
#> 0 4 3

# Number of 'correct' values in each column
nrow(myData) - nMissing
#>  x  y  z
#> 10  6  7
The apply function runs a function over each column (the second argument, 2, selects columns; 1 would select rows). In this case the function checks each value for NA or 0.
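On the factor problem: calling as.numeric() directly on a factor returns the internal level codes rather than the original values, which may be what is going wrong with the Ketosis column. Below is a minimal sketch of the usual workaround, assuming the column holds numbers stored as text. The data frames dfRaw and dfMerged are hypothetical stand-ins, since the real DfKetosisUterusRaw was not shared.

```r
# Hypothetical stand-in for DfKetosisUterusRaw (assumed structure, made-up values)
dfRaw <- data.frame(
  Ketosis = factor(c("0", "1.2", "0", "3", NA, "0.8")),
  Calcium = c(2.1, NA, 1.9, 2.4, 2.0, NA)
)

# as.numeric() on a factor gives the level codes, not the values
as.numeric(dfRaw$Ketosis)

# Convert to character first to recover the original numbers
dfRaw$Ketosis <- as.numeric(as.character(dfRaw$Ketosis))

# Count the values above 0; na.rm = TRUE skips missing entries
nKetosis <- sum(dfRaw$Ketosis > 0, na.rm = TRUE)
nKetosis

# Hypothetical merged dataset that lost one row along the way
dfMerged <- dfRaw[-2, ]

# Same count for each dataset, to see at which step values disappear
counts <- sapply(list(raw = dfRaw, merged = dfMerged),
                 function(d) sum(d$Ketosis > 0, na.rm = TRUE))
counts
```

If the column contains entries that are not plain numbers (text labels, or commas as decimal separators, for example), as.numeric() will turn them into NA with a warning, so it is worth checking levels(DfKetosisUterusRaw$Ketosis) before converting.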
Since I could not work with your data or code, I came up with a dummy example. Next time, consider creating a reprex. A reprex consists of the minimal code and data needed to recreate the issue or question you're having. You can find instructions on how to build and share one here:
Thank you for your answer. I will go and test this right away on the ketosis data frame! I will also try to make a reprex. I have looked into creating one before and, to be honest, for me it isn't as simple as it seems... but I'll give it some more hours.