How can I tokenize sentences that have no space after the period in text mining?

oktayozden · December 9, 2019, 9:16am

Hello,
I want to cluster customer text data. For example, a typical text looks like this:
text1:
... VARDIR.SAHİLE BAKAN KISIMDA.İLÇE BELEDİYE YE AİT....

After the tokenization step shown below, some words are not split, because some people did not type a space after the period, as in "KISIMDA.İLÇE".
The tokenizer does not produce "KISIMDA" and "İLÇE" as separate tokens. It keeps both of them as a single token, "KISIMDA.İLÇE".

How can I solve this problem?
Thanks a lot.

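Tokenization code:
library(quanteda)  # provides tokens(), tokens_tolower(), tokens_select(), stopwords()
# Tokenize into words, dropping numbers, symbols, separators, and punctuation
train.tokens <- tokens(text2$Text, what = "word",
                       remove_numbers = TRUE, remove_symbols = TRUE, remove_separators = TRUE,
                       remove_punct = TRUE, remove_hyphens = TRUE)
# Lowercase all tokens
train.tokens <- tokens_tolower(train.tokens)
# Remove Turkish stopwords
train.tokens1 <- tokens_select(train.tokens, stopwords("tr", source = "stopwords-iso"),
                               selection = "remove")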
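
One possible workaround, offered as a sketch and not a confirmed fix, is to insert a space after sentence punctuation that is directly followed by another character, before the text reaches tokens(). The helper name add_space_after_punct is hypothetical and the set of punctuation marks is an assumption. It uses only base R gsub().

```r
# Hypothetical helper (not part of quanteda): insert a space after
# sentence punctuation that is immediately followed by a character
# that is neither whitespace nor punctuation.
add_space_after_punct <- function(x) {
  # (?=...) is a lookahead, so the following character is not consumed
  gsub("([.,;:!?])(?=[^\\s.,;:!?])", "\\1 ", x, perl = TRUE)
}

example_text <- "VARDIR.SAHİLE BAKAN KISIMDA.İLÇE BELEDİYE YE AİT"
clean_text <- add_space_after_punct(example_text)
print(clean_text)
```

The cleaned vector could then be passed to tokens() in place of text2$Text. Note that this would also split decimal numbers and URLs at the punctuation, which may or may not matter here since numbers are removed anyway.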
