Problem with as.numeric


#1

I searched online for how to use as.numeric on my wt (which I read in with read.csv). The data looks like this:
wt
[1] 152,821 155,707 159,443 160,053 163,741 164,760 164,131 167,405
[9] 168,672 171,287 172,307 175,223 178,692 179,006 182,528 183,740
[17] 185,733 190,345 193,160 197,016 198,741 199,156 203,282 203,160
[25] 204,713 208,934 208,549 212,262 214,493 215,055 219,398 222,575
[33] 227,677 226,194 228,596 231,269 228,231 229,957 234,569 236,630
[41] 236,042 244,430 249,943 255,117 256,049 262,624 264,564 270,634
...

However, after applying as.numeric(), I get:
wt
[1] 2 3 4 5 6 8 7 9 10 11 12 13 14 15 16 17 18
[18] 19 20 21 22 23 25 24 26 28 27 29 30 31 32 33 35 34
[35] 37 39 36 38 40 42 41 43 44 45 46 47 48 49 50 51 52
[52] 53 55 56 57 54 58 59 60 62 61 63 64 67 66 65 68 69
[69] 70 71 72 73 74 75 76 78 81 79 85 89 91 90 86 87 83
....

If I use as.numeric(as.character(wt)), or the version with levels, I get NAs introduced by coercion. Is there any other way to solve this? Thanks!


#2

I don't know what your original data file looks like, but based on what you've posted, here is what I think is happening. Your original data appears to contain commas as thousands separators, so R treats the values as strings rather than numbers. Because read.csv converts strings to factors by default (in R versions before 4.0.0), wt is a vector of class factor. When you run as.numeric on it, R returns the underlying integer factor codes rather than the values shown in the labels. Those codes are the integers you reported in your question.
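
To see the factor-code behaviour in isolation, here is a minimal sketch. The three values are made up for illustration and are not taken from the original file.

```r
# Made-up values for illustration; they are not from the original data.
f <- factor(c("152,821", "155,707", "152,821"))

levels(f)       # the distinct labels, stored as strings
as.integer(f)   # the integer codes that index into levels(f)
as.numeric(f)   # the same codes as doubles, not the labelled values

# Recover the values: take the labels, drop the commas, then convert.
vals <- as.numeric(gsub(",", "", as.character(f), fixed = TRUE))
print(vals)
```

The codes only record which level each element belongs to, which is why the output in the question looks like a sequence of small integers.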

To convert wt to numeric, try the following:

  1. When you read the data, include stringsAsFactors=FALSE as an argument in read.csv. This will prevent R from converting wt to factor class.

  2. Now that you've read in the data, remove the commas:

     wt = gsub(",", "", wt)
    
  3. Convert the data to numeric:

     wt = as.numeric(wt)
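
The three steps above can be sketched end to end as follows. The inline CSV text and the column name period are hypothetical stand-ins for the real file, which I haven't seen.

```r
# Hypothetical inline CSV standing in for the real file. The fields are quoted
# so the thousands separators are not read as column delimiters.
csv_text <- 'period,wt
"1961:01","152,821"
"1961:02","155,707"
"1961:03","159,443"'

# Step 1: read the column as character rather than factor.
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Step 2: strip every comma; fixed = TRUE matches "," literally.
wt_clean <- gsub(",", "", dat$wt, fixed = TRUE)

# Step 3: convert the cleaned strings to numeric.
wt <- as.numeric(wt_clean)
print(wt)
```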
    

To see what happens if you don't deal with factors or commas appropriately, run the code below in your console.

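x = factor("395,324")
as.numeric(as.character(x))   # NA with a warning: the comma blocks parsing
x = gsub(",", "", x)          # gsub() coerces the factor to character: "395324"
as.numeric(as.character(x))   # 395324

x = "395,324"
as.numeric(x)                 # NA with a warning
x = gsub(",", "", x)
as.numeric(x)                 # 395324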

With factors, you need to run as.numeric(as.character(x)). If you don't convert from factor to character first, you'll get the underlying integer factor codes instead of the actual values of the variable. You also need to remove the commas; otherwise as.numeric cannot parse the strings and returns NA with a coercion warning.


#3

Thanks for the reply! I tried the first code (the one with the string) too.
My original attempt was this (some of my friends didn't need stringsAsFactors at all, so I wonder why R is behaving differently for me):
CPE.data <- read.csv("CONS_Canada.csv", stringsAsFactors = FALSE)

CPE.data
        X Seasonally.Unadjusted Seasonally.Adjusted
1 1961:01                36,070             152,821
2 1961:02                39,770             155,707
3 1961:03                38,290             159,443
4 1961:04                42,876             160,053

Then,

wt <- as.numeric(CPE.data$Seasonally.Adjusted)
Warning message:
NAs introduced by coercion


#4

Reading numbers that contain commas with read.csv() causes them to be interpreted as strings, so the stringsAsFactors behaviour kicks in. Your numbers are therefore read in as a factor, and when you convert that to numeric you get the integer codes of the factor levels rather than the values.


#5

Hello Leon!
Okay, I see the commas in the csv file now. Is there an efficient way to remove all of them? I converted it from an Excel file and assumed the commas would be removed automatically.


#6

As usual tidyverse is your friend:

library('tidyverse')
"152,821" %>% factor %>% str_replace_all(',', '') %>% as.numeric

Check out readr and read your file using read_csv()
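
As a rough sketch of the readr route, parse_number() drops the grouping commas while parsing, so it can be applied to every affected column in one go. The data frame below is a hypothetical stand-in that mimics the columns shown earlier in the thread. If you read the file with read_csv() instead, col_number() in col_types should give the same parsing at read time.

```r
library(readr)

# Hypothetical data frame mimicking the columns shown earlier in the thread.
cpe <- data.frame(
  X = c("1961:01", "1961:02"),
  Seasonally.Unadjusted = c("36,070", "39,770"),
  Seasonally.Adjusted = c("152,821", "155,707"),
  stringsAsFactors = FALSE
)

# Parse every comma-formatted column in one pass.
num_cols <- c("Seasonally.Unadjusted", "Seasonally.Adjusted")
cpe[num_cols] <- lapply(cpe[num_cols], parse_number)
str(cpe)
```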


#7

Thanks Leon, it's my first time on here! This community is helpful!


#8

Also, take a look at the package readxl


#9

Here is another way to convert a string in accounting format to numeric.

> "152,821" %>% formattable::accounting()
[1] 152,821
> "152,821" %>% formattable::accounting() %>% typeof()
[1] "double"
> "152,821" %>% formattable::accounting() %>% as.numeric()
[1] 152821

So you don't need to worry about the commas and digits, and you can keep your data in accounting format while doing calculations.