Problem with non-ASCII characters in DocumentTermMatrix

EconomiCurtis · June 20, 2018, 5:36pm

Could you ask this with a minimal REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

In this case, I'd include a small snippet of your dataset object that contains the non-ASCII characters triggering your error.
That way you can skip setwd('C:/rscripts/tweet_sentiment')
and dataset = read.csv('hillary_tweets.csv'), since nobody else has those files on their machine.
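As a rough sketch of what such a self-contained snippet could look like, here the data is built inline instead of being read from disk. The tweets are made up for illustration (they are not from hillary_tweets.csv), and the check for non-ASCII rows is just one base-R option; dput() is the usual way to turn a slice of your real data into pasteable code.

```r
# Hypothetical stand-in for hillary_tweets.csv: three made-up tweets,
# two of which contain non-ASCII characters.
dataset <- data.frame(
  text = c("Caf\u00e9 meeting in Bras\u00edlia",
           "\u201cStronger together\u201d \U0001F1FA\U0001F1F8",
           "Plain ASCII tweet"),
  stringsAsFactors = FALSE
)

# iconv() returns NA for any element that cannot be represented in ASCII,
# so this flags the rows that carry non-ASCII characters.
has_non_ascii <- is.na(iconv(dataset$text, from = "UTF-8", to = "ASCII"))
print(has_non_ascii)

# To share a slice of your real data instead, dput() prints R code that
# recreates the object; others can paste that output straight into R.
dput(head(dataset, 3))
```

Running something like this through the reprex package before posting also captures the exact error message alongside the code.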

I'm having a hard time replicating your error, but as a quick suggestion, you might check out the R package rtweet. It has a plain_tweets function that takes your tweets and returns them "reformatted with ascii encoding and normal ampersands and without URL links, line breaks, fancy spaces/tabs, fancy apostrophes."
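If you'd rather not add a dependency, the cleanup described in that quote can be approximated in base R. The helper below, plain_ascii(), is a hypothetical sketch written for this post; it is not rtweet's implementation and may differ from plain_tweets in edge cases.

```r
# Hypothetical helper approximating the cleanup that rtweet::plain_tweets()
# describes. This is NOT rtweet's code, just a base-R sketch.
plain_ascii <- function(x) {
  x <- enc2utf8(x)
  x <- gsub("&amp;", "&", x, fixed = TRUE)          # HTML-escaped ampersands
  x <- gsub("https?://\\S+", "", x, perl = TRUE)    # URL links
  x <- gsub("[\u2018\u2019]", "'", x)               # fancy apostrophes
  x <- gsub("[\u201c\u201d]", "\"", x)              # fancy double quotes
  x <- gsub("[\r\n\t]+", " ", x)                    # line breaks and tabs
  x <- gsub("\u00a0", " ", x, fixed = TRUE)         # non-breaking spaces
  # Drop any remaining non-ASCII bytes (accents, emoji, ...)
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")
  trimws(gsub(" +", " ", x))                        # collapse extra spaces
}

plain_ascii("It\u2019s great &amp; fun\nhttps://t.co/abc")
```

Note that the last iconv() step throws away accented letters and emoji entirely, which is exactly the kind of information loss the next paragraph is about.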

There are also tools for handling non-ASCII characters in R rather than removing them, and Stack Overflow has good discussions on this. A reprex would help us point you in the right direction here too.
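A few base-R options for keeping or converting the characters instead of deleting them are sketched below. The example strings are made up, and which option suits your DocumentTermMatrix workflow depends on your data; transliteration output in particular varies between platforms.

```r
tweets <- c("Caf\u00e9 con leche", "It\u2019s \u00fcber cool", "plain text")

# Check how R has marked each string: "UTF-8" for strings with non-ASCII
# characters, "unknown" for pure ASCII ones.
Encoding(tweets)

# Option 1: keep the characters intact by making sure R reads the file as
# UTF-8 in the first place (file name taken from the original question):
# dataset <- read.csv("hillary_tweets.csv", fileEncoding = "UTF-8")

# Option 2: transliterate to the closest ASCII equivalent. The exact result
# depends on the platform's iconv implementation.
translit <- iconv(tweets, from = "UTF-8", to = "ASCII//TRANSLIT")

# Option 3: keep the information but show each non-ASCII character
# as its code point, e.g. <U+00E9>.
escaped <- iconv(tweets, from = "UTF-8", to = "ASCII", sub = "Unicode")
print(escaped)
```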