Converting data from factor to quantitative variables

I am trying to convert this data from factor to quantitative variables and remove any NA values from the data, but I am getting the warning below. Any idea what I am doing wrong?

CancerData=data.frame(stringsAsFactors=FALSE,
                Id = c("1000025", "1002945", "1015425", "1016277", "1017023"),
      Cl.thickness = c("5", "5", "3", "6", "4"),
         Cell.size = c("1", "4", "1", "8", "1"),
        Cell.shape = c("1", "4", "1", "8", "1"),
     Marg.adhesion = c("1", "5", "1", "NA", "3"),
      Epith.c.size = c("2", "7", "2", "3", "2"),
       Bare.nuclei = as.factor(c("1", "10", "2", "4", "1")),
       Bl.cromatin = as.factor(c("3", "3", "3", "3", "3")),
   Normal.nucleoli = as.factor(c("1", "2", "1", "7", "1")),
           Mitoses = as.factor(c("1", "NA", "1", "1", "1")),
             Class = as.factor(c("benign", "benign", "benign", "benign",
                                 "benign"))
)


is.na(CancerData)
#>         Id Cl.thickness Cell.size Cell.shape Marg.adhesion Epith.c.size
#> [1,] FALSE        FALSE     FALSE      FALSE         FALSE        FALSE
#> [2,] FALSE        FALSE     FALSE      FALSE         FALSE        FALSE
#> [3,] FALSE        FALSE     FALSE      FALSE         FALSE        FALSE
#> [4,] FALSE        FALSE     FALSE      FALSE         FALSE        FALSE
#> [5,] FALSE        FALSE     FALSE      FALSE         FALSE        FALSE
#>      Bare.nuclei Bl.cromatin Normal.nucleoli Mitoses Class
#> [1,]       FALSE       FALSE           FALSE   FALSE FALSE
#> [2,]       FALSE       FALSE           FALSE   FALSE FALSE
#> [3,]       FALSE       FALSE           FALSE   FALSE FALSE
#> [4,]       FALSE       FALSE           FALSE   FALSE FALSE
#> [5,]       FALSE       FALSE           FALSE   FALSE FALSE

as.numeric(as.character(CancerData))
#> Warning: NAs introduced by coercion
#>  [1] NA NA NA NA NA NA NA NA NA NA NA

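Two things are going on in your attempt: the "NA" entries in your data are the literal string "NA", so is.na() returns FALSE everywhere, and as.numeric(as.character()) has to be applied column by column rather than to the whole data frame. The code below converts factors (and characters) to numbers and removes any row with NAs in it.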

library(tidyverse)

CancerData <- data.frame(stringsAsFactors=FALSE,
                         Id = c("1000025", "1002945", "1015425", "1016277", "1017023"),
                         Cl.thickness = c("5", "5", "3", "6", "4"),
                         Cell.size = c("1", "4", "1", "8", "1"),
                         Cell.shape = c("1", "4", "1", "8", "1"),
                         Marg.adhesion = c("1", "5", "1", "NA", "3"),
                         Epith.c.size = c("2", "7", "2", "3", "2"),
                         Bare.nuclei = as.factor(c("1", "10", "2", "4", "1")),
                         Bl.cromatin = as.factor(c("3", "3", "3", "3", "3")),
                         Normal.nucleoli = as.factor(c("1", "2", "1", "7", "1")),
                         Mitoses = as.factor(c("1", "NA", "1", "1", "1")),
                         Class = as.factor(c("benign", "benign", "benign", "benign",
                                             "benign"))
)

CancerData %>% 
    mutate_at(vars(-Class), ~parse_number(as.character(.))) %>% 
    mutate(Class = as.numeric(Class)) %>% 
    drop_na()
#>        Id Cl.thickness Cell.size Cell.shape Marg.adhesion Epith.c.size
#> 1 1000025            5         1          1             1            2
#> 3 1015425            3         1          1             1            2
#> 5 1017023            4         1          1             3            2
#>   Bare.nuclei Bl.cromatin Normal.nucleoli Mitoses Class
#> 1           1           3               1       1     1
#> 3           2           3               1       1     1
#> 5           1           3               1       1     1

Created on 2019-11-20 by the reprex package (v0.3.0.9000)
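
For anyone wondering why the original attempt printed only NAs, here is a minimal sketch of the three behaviours involved. The data frame `df` is a shortened, hypothetical stand-in for two columns of CancerData, not the full data set.

```r
# Hypothetical stand-in for two of the columns in the question
df <- data.frame(
  Marg.adhesion = c("1", "5", "1", "NA", "3"),
  Bare.nuclei   = factor(c("1", "10", "2", "4", "1")),
  stringsAsFactors = FALSE
)

# "NA" typed in quotes is an ordinary string, so is.na() does not flag it
any(is.na(df$Marg.adhesion))

# as.character() on a whole data frame returns one string per column,
# not one per cell, so as.numeric() cannot parse the result
whole <- as.character(df)
length(whole)

# as.numeric() directly on a factor returns the internal level codes;
# going through as.character() first recovers the labels as numbers
codes  <- as.numeric(df$Bare.nuclei)
values <- as.numeric(as.character(df$Bare.nuclei))
```

Comparing `codes` with `values` shows why the as.character() step matters for factors: the level "10" has code 2, not 10.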

Many thanks for this! Is it possible to exclude the last column from being converted to numeric?

Yes, just delete the line that converts Class to numeric:

CancerData %>% 
    mutate_at(vars(-Class), ~parse_number(as.character(.))) %>% 
    drop_na()
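
If you are on a newer dplyr, the same idea can probably be written with across(), which superseded mutate_at(). This is only a sketch: it assumes dplyr 1.0.0 or later is installed, and `cancer_small` is a shortened, hypothetical stand-in for CancerData.

```r
library(dplyr)
library(readr)
library(tidyr)

# Hypothetical, shortened stand-in for CancerData
cancer_small <- data.frame(
  Id            = c("1000025", "1002945", "1015425"),
  Marg.adhesion = c("1", "NA", "3"),
  Mitoses       = factor(c("1", "1", "NA")),
  Class         = factor(c("benign", "benign", "benign")),
  stringsAsFactors = FALSE
)

cleaned <- cancer_small %>%
  # Convert every column except Class; parse_number() treats "NA" as missing
  mutate(across(-Class, ~ parse_number(as.character(.x)))) %>%
  # Drop any row that now contains an NA
  drop_na()
```

Class is left as a factor here, and only the first row survives because the other two each contain an NA.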

Ah, okay. Thanks a lot.

Is this the only way the conversion can be done?

There are other ways, but what is your issue with this one? It would be helpful to know the constraints in advance.

This method works okay for me. Just wondering if it can be addressed in a different way. Thanks
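
For reference, one different way is plain base R with no packages. A minimal sketch, again using a shortened, hypothetical stand-in (`cancer_small`) rather than the full CancerData:

```r
# Hypothetical, shortened stand-in for CancerData
cancer_small <- data.frame(
  Id            = c("1000025", "1002945", "1015425"),
  Marg.adhesion = c("1", "NA", "3"),
  Mitoses       = factor(c("1", "1", "NA")),
  Class         = factor(c("benign", "benign", "benign")),
  stringsAsFactors = FALSE
)

# Every column except Class gets converted
num_cols <- setdiff(names(cancer_small), "Class")

# as.character() first, so factors yield their labels rather than level codes;
# the string "NA" becomes a real NA after as.numeric()
cancer_small[num_cols] <- lapply(
  cancer_small[num_cols],
  function(x) as.numeric(as.character(x))
)

# na.omit() drops every row that contains at least one NA
cleaned <- na.omit(cancer_small)
```

The lapply() call applies the conversion column by column, which is the step the original as.numeric(as.character(CancerData)) attempt was missing.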
