Keras Tokenizer is empty

ByFu · December 21, 2020, 11:34pm

Hello guys, I'm new to developing neural networks and to R. I'm trying to build a tokenizer that turns sentences into integer sequences. My dataset is a two-column matrix of Spanish-English sentence pairs (Spanish in column 1, English in column 2), with 20k sentences in each column. When I create the tokenizer and then check `$word_index`, no vocabulary has been added. Any help? By the way, the dataset does contain sentences; it isn't empty. Here is my code:

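```r
library(keras)

# Fit one tokenizer per language (column 1 = Spanish, column 2 = English)
en_tokenizer <- text_tokenizer() %>% fit_text_tokenizer(dataset[, 2])
es_tokenizer <- text_tokenizer() %>% fit_text_tokenizer(dataset[, 1])

# Vocabulary sizes: word_index ends up empty here
es_vocab_size <- length(es_tokenizer$word_index)
en_vocab_size <- length(en_tokenizer$word_index)

# get_longest() and encode_seq() are helper functions not shown here
es_maxlen <- get_longest(dataset[, 1])
en_maxlen <- get_longest(dataset[, 2])

# Encode the training and test splits with the fitted tokenizers
x_train <- encode_seq(es_tokenizer, es_maxlen, train_dataset[, 1])
y_train <- encode_seq(en_tokenizer, en_maxlen, train_dataset[, 2])

x_test <- encode_seq(es_tokenizer, es_maxlen, test_dataset[, 1])
y_test <- encode_seq(en_tokenizer, en_maxlen, test_dataset[, 2])
```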
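One way to narrow this down is to check, outside keras, whether the column actually yields any words. The function below, `approx_word_index()`, is a hypothetical helper written for this purpose and is not part of the keras package. It is a base-R sketch that assumes `text_tokenizer()`'s default settings (lowercasing, replacing the default filter characters with spaces, splitting on single spaces, and numbering words from 1 by descending frequency). It also stops with an error if the input is not a plain character vector, such as a factor or a one-column data frame.

```r
# Hypothetical helper (NOT part of the keras package): a base-R
# approximation of how text_tokenizer() builds word_index with its
# default settings, meant only as a sanity check on the input column.
approx_word_index <- function(texts) {
  # Fail loudly on factors, lists or data frames instead of
  # silently tokenizing the wrong thing
  stopifnot(is.character(texts))

  # Assumed defaults: lowercase first, replace each filter character
  # with a space, then split on single spaces and drop empty tokens
  keras_filters <- "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n"
  cleaned <- tolower(texts)
  for (ch in strsplit(keras_filters, "")[[1]]) {
    cleaned <- gsub(ch, " ", cleaned, fixed = TRUE)
  }
  words <- as.character(unlist(strsplit(cleaned, " ", fixed = TRUE)))
  words <- words[!is.na(words) & words != ""]
  if (length(words) == 0) return(integer(0))

  # Count each distinct word, keeping first-appearance order
  vocab <- unique(words)
  counts <- tabulate(match(words, vocab), nbins = length(vocab))

  # Most frequent word gets index 1; order() is stable, so tied
  # words stay in first-appearance order
  ranked <- vocab[order(-counts)]
  setNames(seq_along(ranked), ranked)
}

# Example on the English column (column 2):
# en_index <- approx_word_index(dataset[, 2])
# length(en_index)
```

If `approx_word_index()` reports thousands of words while `length(en_tokenizer$word_index)` is 0, the sentences themselves are probably fine and the problem more likely lies in what is actually being passed to `fit_text_tokenizer()`. Calling it on `dataset[, 2]` directly, without `as.character()`, also shows whether that column really is a character vector.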

system · February 14, 2021, 3:34am

This topic was automatically closed 54 days after the last reply. New replies are no longer allowed.
