Hello,
I want to cluster customer text data. A typical entry looks like this:
text1:
... VARDIR.SAHİLE BAKAN KISIMDA.İLÇE BELEDİYE YE AİT....
After the tokenization step shown below, some words are not split apart, because some people did not type a space after a "." or otherwise ran sentences together, as in "KISIMDA.İLÇE".
The tokenizer does not produce "KISIMDA" and "İLÇE" as separate tokens. It keeps both of them as a single token, "KISIMDA.İLÇE".
How can I solve this problem?
Thanks a lot.
Tokenization code:
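library(quanteda)

# Tokenize into words, dropping numbers, symbols, separators, punctuation and hyphens
train.tokens <- tokens(text2$Text, what = "word",
                       remove_numbers = TRUE, remove_symbols = TRUE, remove_separators = TRUE,
                       remove_punct = TRUE, remove_hyphens = TRUE)
# Lowercase all tokens
train.tokens <- tokens_tolower(train.tokens)
# Remove Turkish stopwords
train.tokens1 <- tokens_select(train.tokens, stopwords("tr", source = "stopwords-iso"),
                               selection = "remove")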