R Keras tries to convert character strings to double

mike4343 · August 16, 2019, 4:32pm

library(keras)
library(dplyr)
data = read.csv("data.csv") 
# str(data) # handy for array details

trainx = data[,1:14] %>%
  mutate_if(is.factor, as.character) %>%
  mutate_if(is.integer, as.double)

trainx[,2] = gsub(" ", "", trainx[,2])
trainx[1:5,]


 
 
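| | age `<dbl>` | workclass `<chr>` | fnlwgt `<dbl>` | education `<chr>` | education.num `<dbl>` |
|---|---|---|---|---|---|
| 1 | 39 | State-gov | 77516 | Bachelors | 13 |
| 2 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 |
| 3 | 38 | Private | 215646 | HS-grad | 9 |
| 4 | 53 | Private | 234721 | 11th | 7 |
| 5 | 28 | Private | 338409 | Bachelors | 13 |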

The code fragments above are meant to illustrate a strange problem. I have been working with R and RStudio for a few years now, and with Keras as well. The input looks like this:

'data.frame':	32561 obs. of  15 variables:
 $ age           : int  39 50 38 53 28 37 49 52 31 42 ...
 $ workclass     : Factor w/ 9 levels " ?"," Federal-gov",..: 8 7 5 5 5 5 5 7 5 5 ...
 $ fnlwgt        : int  77516 83311 215646 234721 338409 284582 160187 209642 45781 159449 ...
 $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 7 12 13 10 ...
 $ education.num : int  13 13 9 7 13 14 5 9 14 13 ...
 $ marital.status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 4 3 5 3 ...
 $ occupation    : Factor w/ 15 levels " ?"," Adm-clerical",..: 2 5 7 7 11 5 9 5 11 5 ...
 $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 2 1 2 1 ...
 $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 5 ...
 $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 1 2 1 2 ...
 $ capital.gain  : int  2174 0 0 0 0 0 0 0 14084 5178 ...
 $ capital.loss  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ hrs.per.week  : int  40 13 40 40 40 40 16 45 50 40 ...
 $ native.country: Factor w/ 42 levels " ?"," Cambodia",..: 40 40 40 40 6 40 24 40 40 40 ...
 $ ge50K         : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 1 2 2 2 ...

The problem is with the second column, 'workclass', which was initially read as a factor but which I changed to character to cope with some earlier difficulties. Likewise, the integer columns were changed to double, as shown above. Also note that every item in the character column had a leading space. I thought this was the problem and removed the space with gsub, as you can see.

Nevertheless, the error persists when I try to 'fit' the data with:

```{r}
history = fit(model, as.matrix(trainx), trainy, verbose = 0, validation_split = 0.2,
              epochs = 30, metrics = "accuracy")
```

The error I keep getting is:

Error in py_call_impl(callable, dots$args, dots$keywords) : ValueError: could not convert string to float: 'Private'

Why does R try to convert the character string to a float? I would appreciate any help.

AJF · August 18, 2019, 3:31pm

While I am not 100% sure, here is my guess:

Because your trainx includes character columns (workclass and education, for example), converting it to a matrix with as.matrix(trainx) turns the entire thing into a character matrix. All elements of a matrix must be of the same type, and it's "easier" for R to turn a number into a character (e.g. 2 --> "2") than a character into a number ("Private" --> ????).
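
A small sketch of that coercion, using a made-up two-column data frame (`toy` is hypothetical, not your data):

```r
# Hypothetical toy data frame: one numeric column, one character column
toy <- data.frame(
  age = c(39, 50, 38),
  workclass = c("State-gov", "Self-emp-not-inc", "Private"),
  stringsAsFactors = FALSE
)

m <- as.matrix(toy)

# A matrix holds a single type, so the numeric column is coerced to character
typeof(m)       # "character"
is.numeric(m)   # FALSE

# Keeping only the numeric column leaves the matrix numeric
m_num <- as.matrix(toy[, "age", drop = FALSE])
typeof(m_num)   # "double"
```

Running `typeof(as.matrix(trainx))` on your own data should tell you whether the same thing is happening there.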

However, keras requires a numeric matrix. The program is therefore trying to convert the entire character matrix into a numeric array and does not know how, which would explain the "could not convert string to float: 'Private'" error.
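
One common way to get a numeric matrix is to one-hot encode the character columns before calling fit(). I have not tried this on your data, so treat the following as a minimal sketch on a hypothetical toy data frame, using base R's model.matrix():

```r
# Hypothetical toy data frame standing in for trainx
toy <- data.frame(
  age = c(39, 50, 38, 53),
  workclass = c("State-gov", "Self-emp-not-inc", "Private", "Private"),
  stringsAsFactors = FALSE
)

# model.matrix() treats character columns as factors and expands them
# into 0/1 indicator columns; "~ . - 1" uses every column and drops the intercept
x <- model.matrix(~ . - 1, data = toy)

is.numeric(x)   # TRUE
dim(x)          # 4 rows, 4 columns: age plus one indicator per workclass level
colnames(x)
```

With several categorical columns, model.matrix() keeps every level only for the first one and drops a reference level for the rest, so check the column names before passing the result to fit().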
