I'm trying to reduce copy-paste by using a function (see below), but I'm struggling to understand how to implement functions properly in this context. In this example, my for loop only keeps the result for the last term of interest. My final goal is a single data table (tibble) containing the data for every term.
What am I doing wrong in my for loop when trying to run the search_tweets function for each of my terms of interest?
library(here)
#> here() starts at C:/Users/renedherrera/AppData/Local/Temp/RtmpWMhoFi/reprex32007f8a5dd7
library(tidyverse)
library(rtweet) # search twitter data for hashtags of interest
#>
#> Attaching package: 'rtweet'
#> The following object is masked from 'package:purrr':
#>
#> flatten
# hashtags of interest
hashtags <- c("#adcsm", # adrenal cancer
"#amsm", #advanced metastatic cancer
"#ancsm", #anal cancer
"#ayacsm", #adolescent and young adult cancer
"#bcsm" #breast cancer
) #this continues on for several more hashtags
# functions to return twitter status data
adcsm <- search_tweets("#adcsm", include_rts = FALSE, n = 5)
amsm <- search_tweets("#amsm", include_rts = FALSE, n = 5)
ancsm <- search_tweets("#ancsm", include_rts = FALSE, n = 5)
ayacsm <- search_tweets("#ayacsm", include_rts = FALSE, n = 5)
bcsm <- search_tweets("#bcsm", include_rts = FALSE, n = 5) # and so on
# but it is repetitive and I think some function is better
# i tried a for loop but it's not quite right
for(i in hashtags) {
i <- search_tweets(i, include_rts = FALSE, n = 5)
}
The easiest way to handle this is using map from purrr (or, specifically, to get your tibble output, map_dfr):
library(here)
#> here() starts at /private/var/folders/hx/j3m2y89n5_jfcrft6bxdq9s40000gn/T/Rtmp2Gxj3Q/reprex3964f093bf3
library(tidyverse)
library(rtweet) # search twitter data for hashtags of interest
#>
#> Attaching package: 'rtweet'
#> The following object is masked from 'package:purrr':
#>
#> flatten
# hashtags of interest
hashtags <- c("#adcsm", # adrenal cancer
"#amsm", #advanced metastatic cancer
"#ancsm", #anal cancer
"#ayacsm", #adolescent and young adult cancer
"#bcsm" #breast cancer
) %>%
set_names() # Name the vector so you have meaningful information in the new col
#this continues on for several more hashtags
# functions to return twitter status data
adcsm <- search_tweets("#adcsm", include_rts = FALSE, n = 5)
amsm <- search_tweets("#amsm", include_rts = FALSE, n = 5)
ancsm <- search_tweets("#ancsm", include_rts = FALSE, n = 5)
ayacsm <- search_tweets("#ayacsm", include_rts = FALSE, n = 5)
bcsm <- search_tweets("#bcsm", include_rts = FALSE, n = 5) # and so on
# Create df by mapping search_tweets over the named vector
outdf <- purrr::map_dfr(hashtags, search_tweets, include_rts = FALSE, n = 5, .id = "searchtag")
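To see what set_names() and .id are doing without calling the Twitter API, here is a minimal offline sketch. fake_search() is a hypothetical stand-in for search_tweets() that I made up for illustration; it just returns a tiny tibble.

```r
library(purrr)
library(tibble)

# Hypothetical stand-in for search_tweets(): builds a small tibble
# locally instead of querying Twitter.
fake_search <- function(q, n = 2) {
  tibble(text = paste("tweet", seq_len(n), "about", q))
}

# set_names() with no further arguments names each element after its own value
tags <- set_names(c("#adcsm", "#amsm", "#bcsm"))
names(tags)

# map_dfr() calls fake_search() once per tag and row-binds the results;
# .id writes each element's name into a new "searchtag" column.
demo <- map_dfr(tags, fake_search, n = 2, .id = "searchtag")
demo
```

Every row of demo carries the hashtag that produced it, which is the reason for naming the vector first. Without names, .id would fall back to the element positions.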
Regarding the purrr::map approach: If you use descriptive names for the search tags, you can add those as a column to the data frame. For example:
# hashtags of interest
hashtags <- c("adrenal cancer"="#adcsm",
"advanced metastatic cancer"="#amsm",
"anal cancer"="#ancsm",
"adolescent and young adult cancer"="#ayacsm",
"breast cancer"="#bcsm"
)
d = hashtags %>%
map_df(~ search_tweets(.x, include_rts=FALSE, n=5) %>%
mutate(search.tag=.x),
.id="search.tag.description")
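If you are on purrr 1.0.0 or later, map_df() and map_dfr() are marked as superseded, and the suggested replacement is map() followed by list_rbind(). Here is a rough sketch of the same idea in that style, again using a hypothetical fake_search() in place of search_tweets() so that it runs offline.

```r
library(purrr)   # assumes purrr >= 1.0.0 for list_rbind()
library(tibble)

# Hypothetical stand-in for search_tweets()
fake_search <- function(q, n = 2) {
  tibble(text = paste("tweet", seq_len(n), "about", q))
}

tags <- c("adrenal cancer" = "#adcsm", "breast cancer" = "#bcsm")

# map() returns a named list with one tibble per tag
per_tag <- map(tags, function(tag) {
  res <- fake_search(tag)
  res$search.tag <- tag   # keep the hashtag itself as a column
  res
})

# list_rbind() stacks the tibbles and stores the list names
# in the column given by names_to
demo2 <- list_rbind(per_tag, names_to = "search.tag.description")
demo2
```

The result has the same shape as the map_df() version above: one description column taken from the vector names, plus the hashtag column added inside the function.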
The loop approach wasn't working because it assigns each result to i, the loop variable, which is replaced on every pass through the loop, so only the last search is saved. Here's a way to save all of the output:
out = list()
for(i in hashtags) {
out[[i]] <- search_tweets(i, include_rts = FALSE, n = 5)
# If output data has at least one row, add search tag info
if(nrow(out[[i]]) > 0) {
out[[i]]$search.tag = i
out[[i]]$search.tag.description = names(hashtags)[match(i, hashtags)]
}
}
# Combine into a single data frame
out.df = do.call(rbind, out)
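The difference between the two loops, and what [[ ]] and do.call(rbind, ...) are doing, can be seen in a small base R sketch that needs no Twitter access. fake_search() is a hypothetical stand-in for search_tweets(), invented here for illustration.

```r
# Hypothetical stand-in for search_tweets()
fake_search <- function(q, n = 2) {
  data.frame(text = paste("tweet", seq_len(n), "about", q))
}

tags <- c("#adcsm", "#amsm", "#bcsm")

# Assigning to the loop variable: each pass replaces the previous result
for (i in tags) {
  i <- fake_search(i)
}
i$text   # only the rows for the last tag, "#bcsm", are left

# Assigning into a list element: one slot per tag, nothing is lost
out <- list()
for (tag in tags) {
  out[[tag]] <- fake_search(tag)   # [[ ]] creates or replaces a single element
  out[[tag]]$search.tag <- tag
}
names(out)              # one entry per hashtag
class(out[["#amsm"]])   # [[ ]] extracts the data frame itself

# do.call(rbind, out) is the same as rbind(out[[1]], out[[2]], out[[3]])
combined <- do.call(rbind, out)
nrow(combined)
```

The key point is that out[[tag]] gives every result its own named slot in the list, whereas i is a single variable that can only hold one value at a time.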
Thanks, I appreciate your help. This is closer to what I was initially trying to do. It seems that I need to work on understanding lists and [[ ]] indexing, and then on how a list of data frames is combined into a single data frame. Obviously I have many gaps in my understanding of R.