Unable to delete missing data

Hi everyone! I am trying to remove the missing values from a dataset loaded from SPSS with the package "foreign". The dataset appears as a list. I use the na.omit and na.exclude functions but they don't work. When I view the dataset I still see NAs, and the next command gives an error related to NA, so I suppose the missing values were not removed. Converting it to a data frame didn't help. Thank you!

Here is the code:
as.data.frame(COMET)
COMETgood<-na.omit(COMET)
COMETgood<-na.exclude(COMET)

set.seed(123)
training.samples<-COMETgood$Viol_dur %>%
createDataPartition(p=0.8,list=FALSE)

And I get this error for the last line:
Error in quantile.default(y, probs = seq(0, 1, length = groups)) :
missing values and NaN's not allowed if 'na.rm' is FALSE

Hi @ChristinaPalantza,
You did not assign the result of as.data.frame(COMET) to an object, so COMET is still a list when it reaches na.omit(), and nothing is removed. See this reproducible example:

# Make some inbuilt data into a list. I assume your SPSS data looks like this?
COMET <- as.list(mtcars)

# Add some dummy NA and NaN values
COMET$mpg[c(2,5,7,9)] <- NA
COMET$wt[c(12,15,22,23)] <- NaN

# Convert the list back to a data frame, and omit all rows with NA or NaN
COMETgood <- as.data.frame(COMET)
COMETsmall <- na.omit(COMETgood)
head(COMETsmall, n=10)
#>     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> 3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> 4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> 6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> 8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#> 10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#> 11 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#> 13 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
#> 14 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
#> 16 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4

suppressPackageStartupMessages(library(caret))
suppressPackageStartupMessages(library(dplyr))
set.seed(123)
training.samples <- COMETsmall$mpg %>%
  createDataPartition(p=0.5, times=3, list=FALSE)

training.samples
#>       Resample1 Resample2 Resample3
#>  [1,]         1         2         3
#>  [2,]         2         3         4
#>  [3,]         4         4         5
#>  [4,]         5         9         7
#>  [5,]         6        11        10
#>  [6,]         7        12        11
#>  [7,]        10        13        13
#>  [8,]        11        16        14
#>  [9,]        14        17        15
#> [10,]        19        20        16
#> [11,]        20        22        17
#> [12,]        23        24        19

Created on 2021-05-28 by the reprex package (v2.0.0)
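
To make the cause a bit more concrete, here is a minimal sketch of why the original code left the NAs in place. It assumes the SPSS import is a plain list (mtcars stands in for it here, and the object names are only illustrative). As far as I know, na.omit() hands a non-atomic object such as a plain list back unchanged, so the conversion has to be assigned first.

```r
# Hypothetical stand-in for the SPSS import: a plain list with some NAs
COMET <- as.list(mtcars)
COMET$mpg[c(2, 5, 7, 9)] <- NA

# Calling as.data.frame() without assigning the result leaves COMET unchanged
invisible(as.data.frame(COMET))
class(COMET)
#> [1] "list"

# na.omit() returns a plain list as-is, so the NAs are still there
still_list <- na.omit(COMET)
identical(still_list, COMET)
#> [1] TRUE

# Assigning the conversion gives na.omit() a data frame, so rows are dropped
COMET_df <- as.data.frame(COMET)
COMET_clean <- na.omit(COMET_df)

# Check the result: no NAs left, and 4 rows removed
sum(is.na(COMET_clean))
#> [1] 0
nrow(COMET_df) - nrow(COMET_clean)
#> [1] 4
```

Checking sum(is.na(...)) before the next step is a quick way to confirm the missing values are really gone before calling createDataPartition().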

Sorry for taking so long to reply! Thank you very much for the detailed answer! I do not understand, though, why I have to add dummy NA and NaN values. And yes, my SPSS data is a list.

You don't have to add them. But if davoWW hadn't added them, he couldn't have demonstrated how to get rid of them.

This is a good reason to create your own reprex: it pins down the exact context you want help with, so nobody has to guess at your data.
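
One common way to put a small piece of your own data into a reprex is dput(). This is only a sketch: mtcars stands in for your data frame, and the names small and rebuilt are made up for illustration.

```r
# Hypothetical example: mtcars stands in for your own data frame
small <- head(mtcars[, c("mpg", "wt")], 5)
small$mpg[2] <- NA

# dput() prints R code that rebuilds the object; paste that output into your post
dput(small)

# Anyone can then recreate the same object from the pasted text
rebuilt <- eval(parse(text = paste(capture.output(dput(small)), collapse = "\n")))
all.equal(small, rebuilt)
```

A few rows are usually enough, as long as they still show the problem (here, the NA in mpg).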

Thank you very much for clarifying! Now I managed it and it works!

Ok, now it works, thank you very much!
