I successfully merged two data sets and am now trying to remove, in R, all rows with an NA value in the 'Provider_Name' column. I have tried the commands below, but neither of them deletes any rows: they are all still there when I write the data to a CSV file with write.table. R does not return any error messages. What am I missing?
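The commands themselves aren't shown, so this is only an assumption: a frequent reason a filter seems to do nothing is that the subset is printed but never assigned back, so write.table() still writes the unfiltered data frame. A minimal sketch with a hypothetical data frame `merged_df` and a temporary output file standing in for the real ones:

```r
# Hypothetical merged data; column names are assumptions for illustration
merged_df <- data.frame(
  Provider_Name = c("Clinic A", NA, "Clinic C"),
  Visits        = c(10, 5, NA),
  stringsAsFactors = FALSE
)

# This only prints the filtered rows; merged_df itself is unchanged
merged_df[!is.na(merged_df$Provider_Name), ]

# Assign the result back so the filter actually takes effect
merged_df <- merged_df[!is.na(merged_df$Provider_Name), ]

# Write the filtered data; tempfile() stands in for your real output path
out_file <- tempfile(fileext = ".csv")
write.table(merged_df, out_file, sep = ",", row.names = FALSE)
```

Note that the row with NA in Visits (but a valid Provider_Name) is kept, because only Provider_Name is tested.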
OK, I fixed that dumb mistake. But now I can see that, rather than deleting only the observations with NA in the Provider_Name column, it deletes every row that contains an NA in any column. Any ideas where I'm going wrong?
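# The outer parentheses assign ex_df and print it at the same time
(ex_df <- data.frame(
  x = c(1, NA),
  y = c(NA, 2)
))
# Keeps only rows where x is not NA; row 1 survives even though its y is NA
ex_df[!is.na(ex_df$x), ]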
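For contrast, na.omit() and complete.cases() applied to the whole data frame drop a row if any column is NA, which would produce the symptom described above (the exact commands used aren't shown, so this is a guess). Restricting complete.cases() to the one column of interest behaves like the !is.na() filter:

```r
ex_df <- data.frame(
  x = c(1, NA),
  y = c(NA, 2)
)

# Drops every row with an NA in *any* column: here, both rows
na.omit(ex_df)

# Same effect: complete.cases() checks all columns it is given
ex_df[complete.cases(ex_df), ]

# Checks only column x, so row 1 (NA only in y) is kept
ex_df[complete.cases(ex_df["x"]), ]
```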
Can you demonstrate or prove your assessment that rows whose NAs are in columns other than Provider_Name are also being lost?
The original data set contains ~65,000 observations; merging it with another data set (to add one new variable) results in ~72,000 observations. NA appears in various variables for different reasons, and a clean set of observations should come back to roughly ~65,000 rows after removing all observations with NA in Provider_Name. Instead, the command we're discussing leaves ~20,500 observations. When I cleaned the data manually in Excel the other day, that was not the case: I was able to get it to ~65,000.
My recommendation is to go quantitative and count the NAs in the relevant files. Perhaps your join went awry?
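A minimal sketch of that quantitative check, assuming hypothetical data frames `original_df` and `merged_df` as stand-ins for the real data: count NAs per column before and after the merge, and compare how many rows the Provider_Name filter should keep with how many it actually keeps.

```r
# Hypothetical stand-ins for the original and merged data sets
original_df <- data.frame(
  id = 1:4,
  Provider_Name = c("A", "B", NA, "D"),
  stringsAsFactors = FALSE
)
merged_df <- data.frame(
  id = 1:6,
  Provider_Name = c("A", "B", NA, "D", NA, NA),
  new_var = c(1, NA, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# NA count per column in each data set
colSums(is.na(original_df))
colSums(is.na(merged_df))

# Rows the Provider_Name filter should keep vs. what it actually keeps
expected <- nrow(merged_df) - sum(is.na(merged_df$Provider_Name))
filtered <- merged_df[!is.na(merged_df$Provider_Name), ]
c(expected = expected, actual = nrow(filtered))
```

If `actual` comes out lower than `expected` on the real data, the filter is doing something other than testing Provider_Name alone.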
If you are correct and your data sets contain far more NAs than they should, you will need to retrace your steps and find where you are introducing them. However, the shoe could be on the other foot: the expectation you formed in Excel may not be borne out. I can hardly comment; I didn't see the Excel file, and I haven't seen your data in R, nor any of the code you've used apart from the !is.na() part.
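To see how a join can introduce NAs and inflate the row count, here is a small sketch with hypothetical tables (the actual merge call isn't shown in the thread). If the merge was only meant to add one variable, a left join on a unique key would keep the row count unchanged; growth from ~65,000 to ~72,000 suggests unmatched rows from the other table, duplicated keys, or both.

```r
left_df <- data.frame(
  id = c(1, 2, 3),
  Provider_Name = c("A", "B", "C"),
  stringsAsFactors = FALSE
)
right_df <- data.frame(
  id = c(2, 2, 4),       # id 2 is duplicated; id 4 has no match in left_df
  new_var = c(10, 20, 30)
)

# all = TRUE keeps unmatched rows from both sides, filling gaps with NA:
# ids 1 and 3 get NA in new_var, id 4 gets NA in Provider_Name
outer_df <- merge(left_df, right_df, by = "id", all = TRUE)
outer_df

# all.x = TRUE keeps only left_df's keys, but the duplicated id 2 still adds a row
kept_left <- merge(left_df, right_df, by = "id", all.x = TRUE)
nrow(kept_left)

# Checking for duplicated keys before merging catches this early
any(duplicated(right_df$id))
```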
Might it have anything to do with the way the variables are "formatted"? I have Provider_Name as a character variable (which makes the most sense). Would that have any impact?
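Storing Provider_Name as character is fine in itself, since is.na() works on character vectors. What can matter is whether the missing values are real NAs or strings such as "NA" or "" (for example, after a round trip through Excel or a CSV file). A small sketch with made-up values:

```r
provider <- c("Clinic A", NA, "NA", "", "Clinic E")

# Only the real missing value is detected
is.na(provider)
# FALSE TRUE FALSE FALSE FALSE

# Convert placeholder strings to real NA before filtering
provider[provider %in% c("NA", "")] <- NA
sum(is.na(provider))

# When reading a CSV, na.strings controls which strings become NA
csv_file <- tempfile(fileext = ".csv")
writeLines(c("Provider_Name,Visits", "Clinic A,1", "NA,2", ",3", "Clinic E,4"),
           csv_file)
csv_df <- read.csv(csv_file, na.strings = c("NA", ""), stringsAsFactors = FALSE)
sum(is.na(csv_df$Provider_Name))
```

With the default na.strings = "NA", the empty Provider_Name in the third data row would stay as "" and survive a !is.na() filter.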