I am trying to collect tweets with the {rtweet} package and store them in a MySQL database, but I get an error whenever I try to upload a tweet that contains emojis.
Here is my code:
# loading packages
library(rtweet)
library(DBI)
library(RMariaDB)
library(dplyr)
library(dbplyr)
library(lubridate)
# create twitter api token
twitterToken <- rtweet_bot(
  api_key = "*****",
  api_secret = "*****",
  access_token = "*****",
  access_secret = "*****"
)
# search tweets
tweets <- search_tweets(q = "beautiful", n = 500, type = "recent", include_rts = FALSE, token = twitterToken)
tweets$Topic <- "beautiful"
tweets$created_at <- parse_date_time(tweets$created_at, orders = c("%a %b %d %T %z %Y", "%Y-%m-%d %H:%M:%S"))
tweets$screen_name <- users_data(tweets)$screen_name
tweets$status_url <- paste0("https://twitter.com/", tweets$screen_name, "/status/", tweets$id_str)
tweets <- tweets %>% select(Topic, status_url, screen_name, created_at, text, favorite_count, retweet_count)
# upload to database
con <- dbConnect(
  MariaDB(),
  dbname = "dbname", username = "username", password = "password",
  host = "db_host", port = db_port,
  ssl.ca = "ssl.pem", load_data_local_infile = TRUE
)
dbWriteTable(con, "testTweetsDB", tweets, overwrite = TRUE)
This throws the following error:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘dbWriteTable’ for signature ‘"MariaDBConnection", "character", "tweets"’
This is a bit odd, because it used to work before I upgraded R and updated all the packages. But that is not the main issue. I can work around it with the following code:
# coerce to a plain data.frame before writing
tweets <- as.data.frame(tweets)
dbWriteTable(con, "testTweetsDB", tweets, overwrite = TRUE)
This time I get the following error:
Error: Error executing query: Invalid utf8 character string: '@Alyri_tv So beautifull girl '
The string it complains about is the first tweet that contains emojis. The upload works perfectly if I select only tweets without emojis, and it works even when the tweets contain Chinese, Korean, or other non-Latin characters. Only the emojis cause the problem.
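To narrow down which rows trigger the error, here is a minimal base-R sketch. It relies on the fact that most emojis are code points above U+FFFF, which need 4 bytes in UTF-8, while common Chinese and Korean characters fit in 3 bytes. That would match the pattern described above if something between R and the server only accepts 3-byte UTF-8, though that cause is an assumption and not confirmed. The helper name `has_4byte_chars` and the sample strings are hypothetical and are not part of {rtweet} or {DBI}.

```r
# Hypothetical helper: TRUE for each string that contains at least one
# code point above U+FFFF, i.e. a character that needs 4 bytes in UTF-8.
has_4byte_chars <- function(x) {
  vapply(
    enc2utf8(as.character(x)),
    # isTRUE() makes NA or invalid input count as FALSE
    function(s) isTRUE(any(utf8ToInt(s) > 0xFFFF)),
    logical(1),
    USE.NAMES = FALSE
  )
}

# Made-up sample strings for illustration only
sample_text <- c(
  "So beautiful",                 # ASCII only
  "\u7f8e\u4e3d \uc544\ub984",    # Chinese and Korean, all below U+FFFF
  "So beautiful \U0001F60D"       # emoji U+1F60D, above U+FFFF
)
flags <- has_4byte_chars(sample_text)
print(flags)
```

Applied to the data above, `tweets$text[has_4byte_chars(tweets$text)]` should list the tweets expected to fail, which makes it easier to confirm that 4-byte characters are the common factor.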
The default collation for the database is utf8mb4_unicode_ci.