library(keras)
library(dplyr)
data = read.csv("data.csv")
# str(data) # handy for array details
trainx = data[,1:14] %>%
  mutate_if(is.factor, as.character) %>%
  mutate_if(is.integer, as.double)
trainx[,2] = gsub(" ", "", trainx[,2])
trainx[1:5,]
    age workclass         fnlwgt education education.num
  <dbl> <chr>              <dbl> <chr>             <dbl>
1    39 State-gov          77516 Bachelors            13
2    50 Self-emp-not-inc   83311 Bachelors            13
3    38 Private           215646 HS-grad               9
4    53 Private           234721 11th                  7
5    28 Private           338409 Bachelors            13
The code fragments above illustrate a strange problem. I have been working with R and RStudio for a few years now, and with Keras as well. The input looks like this:
'data.frame': 32561 obs. of 15 variables:
$ age : int 39 50 38 53 28 37 49 52 31 42 ...
$ workclass : Factor w/ 9 levels " ?"," Federal-gov",..: 8 7 5 5 5 5 5 7 5 5 ...
$ fnlwgt : int 77516 83311 215646 234721 338409 284582 160187 209642 45781 159449 ...
$ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 7 12 13 10 ...
$ education.num : int 13 13 9 7 13 14 5 9 14 13 ...
$ marital.status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 4 3 5 3 ...
$ occupation : Factor w/ 15 levels " ?"," Adm-clerical",..: 2 5 7 7 11 5 9 5 11 5 ...
$ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 2 1 2 1 ...
$ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 5 ...
$ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 1 2 1 2 ...
$ capital.gain : int 2174 0 0 0 0 0 0 0 14084 5178 ...
$ capital.loss : int 0 0 0 0 0 0 0 0 0 0 ...
$ hrs.per.week : int 40 13 40 40 40 40 16 45 50 40 ...
$ native.country: Factor w/ 42 levels " ?"," Cambodia",..: 40 40 40 40 6 40 24 40 40 40 ...
$ ge50K : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 1 2 2 2 ...
The problem is with the second column, 'workclass', which was initially classified as 'factor' but which I changed to 'character' to cope with some earlier difficulties. Likewise, the 'int' columns were changed to 'double', as shown above. Also note that each item in the character column had a leading space. I thought this was the problem and removed the leading space with gsub, as you can see.
Nevertheless, the error persists when I try to 'fit' the data with
```{r}
history = fit(model, as.matrix(trainx), trainy, verbose = 0, validation_split = 0.2,
              epochs = 30, metrics = "accuracy")
```
The error I keep getting is
Error in py_call_impl(callable, dots$args, dots$keywords) : ValueError: could not convert string to float: 'Private'
Why does R try to convert the character string to a float? Do appreciate the help.
While I am not 100% sure, here is my guess:
Because your `trainx` includes character columns (such as `education` and `workclass`), converting it to a matrix with `as.matrix(trainx)` turns the whole thing into a character matrix. All elements of a matrix must be of the same type, and it's "easier" for R to turn a number into a character (e.g. `2` --> `"2"`) than a character into a number (`"Private"` --> ????).
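To see that coercion on its own, here is a minimal sketch using a made-up two-row data frame (`toy` is a hypothetical stand-in, not your actual data):

```r
# Hypothetical toy data: one numeric column, one character column.
toy <- data.frame(
  age = c(39, 50),
  workclass = c("State-gov", "Private"),
  stringsAsFactors = FALSE
)

# A matrix can hold only one type, so the numeric column is
# converted to character along with everything else.
m <- as.matrix(toy)

typeof(m)      # "character"
is.numeric(m)  # FALSE
```

Checking `typeof(as.matrix(trainx))` before calling `fit` should show the same thing for your data.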
Keras, however, requires a numeric matrix. The `ValueError` in your message is raised when the character matrix is handed over to Python, which then tries to convert every element, including strings such as `'Private'`, to a float and does not know how.
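If that guess is right, the character columns need to be encoded as numbers before the data goes to `fit`. Below is a rough sketch of two common options in base R, again on a hypothetical `toy` data frame rather than your real `trainx`. I have not run it against your model, so treat it as a starting point only.

```r
# Hypothetical toy data standing in for trainx (not the real data set).
toy <- data.frame(
  age = c(39, 50, 38),
  workclass = c("State-gov", "Self-emp-not-inc", "Private"),
  stringsAsFactors = FALSE
)

# Option 1: replace each character column with integer codes.
# This imposes an arbitrary ordering on the categories.
codes <- toy
is_chr <- vapply(codes, is.character, logical(1))
codes[is_chr] <- lapply(codes[is_chr], function(col) as.numeric(factor(col)))
x_codes <- as.matrix(codes)
is.numeric(x_codes)   # TRUE

# Option 2: one-hot encode with model.matrix(); "- 1" drops the intercept.
# Character columns are treated as factors and expanded to 0/1 columns.
x_onehot <- model.matrix(~ . - 1, data = toy)
is.numeric(x_onehot)  # TRUE
```

For unordered categories such as `workclass`, one-hot encoding is usually the safer choice, since integer codes suggest an ordering that does not exist. Note that with several categorical columns `model.matrix` gives only the first one a full set of indicator columns and drops one level from each of the others.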