Hello guys, I'm new to developing neural networks and to R, and I'm trying to build a tokenizer that turns sentences into sequences of integers. My dataset is a two-column matrix of Spanish-English sentence pairs, with 20k sentences in each column. When I fit the tokenizer and then look at $word_index, it is empty: no vocabulary has been added. Any help? By the way, the dataset does contain sentences; it isn't empty. Here is my code:
library(keras)
en_tokenizer <- text_tokenizer() %>% fit_text_tokenizer(dataset[,2])
es_tokenizer <- text_tokenizer() %>% fit_text_tokenizer(dataset[,1])
es_vocab_size <- length(es_tokenizer$word_index)
en_vocab_size <- length(en_tokenizer$word_index)
es_maxlen <- get_longest(dataset[,1])
en_maxlen <- get_longest(dataset[,2])
x_train <- encode_seq(es_tokenizer, es_maxlen, train_dataset[,1])
y_train <- encode_seq(en_tokenizer, en_maxlen, train_dataset[,2])
x_test <- encode_seq(es_tokenizer, es_maxlen, test_dataset[,1])
y_test <- encode_seq(en_tokenizer, en_maxlen, test_dataset[,2])