MLP model with Keras and text2vec package

KafeelI · January 2, 2019, 8:47am

Hello

I have built an MLP model using keras to classify text. To predict the class of a new text, I repeated the steps below, which are the same ones I applied to the training set when building the model.

q is an arbitrary question in text format.
ls is a user-defined function that converts text to lower case and applies stemming.

token=itoken(q,preprocess_function=ls,tokenizer=word_tokenizer)
vtxt=create_vocabulary(token,stopwords=stemDocument(stopwords('english')),ngram=c(1,1))
vectorizer=vocab_vectorizer(vtxt)
# Document-term matrix
dtmq=create_dtm(token,vectorizer)

To get the TF-IDF matrix, which is passed to the MLP, I used the steps below:

model1=TfIdf$new(smooth_idf = TRUE,norm="l2")
dtm=model1$fit_transform(dtm)

The dimension of 'dtmq' is now (1, 35), but my model requires an input_shape of 1462. How can I convert the given text so that my MLP model accepts it?

KafeelI · January 3, 2019, 7:09am

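Using the vectorizer built from the pruned training vocabulary gives the new text the same dimension as the training data. The columns of the document-term matrix come from the vocabulary, so building a new vocabulary from q alone produced only 35 terms.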

#Preparation
tokenq=itoken(q,preprocess_function=ls,tokenizer=word_tokenizer)
vectorizer=vocab_vectorizer(pruned_vocab)

#Document term matrix
dtmq=create_dtm(tokenq,vectorizer)
dim(dtmq)
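
For anyone landing here later, a minimal end-to-end sketch of the same idea on a toy corpus. The texts, the use of tolower in place of ls, and the pruning threshold are all assumptions for illustration, not the setup from the posts above. It also assumes a recent text2vec, where the itoken argument is called preprocessor. Besides reusing the training vectorizer, it reuses the TF-IDF model fitted on the training data and calls transform on the new text rather than fit_transform, so the new text is weighted with the training IDF values.

```r
library(text2vec)

# Hypothetical training texts and a new question (illustration only).
train_txt <- c("how do i reset my password",
               "how do i change my email address",
               "where can i download my invoice")
q <- "how can i reset my email password"

# Training side: build and prune the vocabulary, then the vectorizer.
it_train <- itoken(train_txt, preprocessor = tolower,
                   tokenizer = word_tokenizer, progressbar = FALSE)
vocab <- create_vocabulary(it_train)
pruned_vocab <- prune_vocabulary(vocab, term_count_min = 1L)  # assumed threshold
vectorizer <- vocab_vectorizer(pruned_vocab)
dtm_train <- create_dtm(it_train, vectorizer)

# Fit TF-IDF on the training matrix only.
tfidf <- TfIdf$new(smooth_idf = TRUE, norm = "l2")
dtm_train_tfidf <- tfidf$fit_transform(dtm_train)

# Prediction side: reuse the SAME vectorizer and the SAME fitted TF-IDF model.
it_q <- itoken(q, preprocessor = tolower,
               tokenizer = word_tokenizer, progressbar = FALSE)
dtmq <- create_dtm(it_q, vectorizer)
dtmq_tfidf <- tfidf$transform(dtmq)

# One row, and as many columns as the training matrix.
dim(dtmq_tfidf)

# Dense matrix, in case the model expects one as input.
x_new <- as.matrix(dtmq_tfidf)
```

Words in q that are not in the training vocabulary (here "can") are simply dropped, which is why the column count stays fixed at the size of the pruned vocabulary.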

jcblum · January 4, 2019, 9:57pm

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems.
