How to repeat a function with different variables

uazccrenedario · November 13, 2020, 6:49pm

I'm trying to reduce copy-and-paste by using a function (see below), but I struggle to understand how to implement it properly in this context. In this example, my for loop only keeps the result for the last term of interest. My final goal is a single data table (tibble) that includes the data for all of the terms.

What am I doing wrong in my for loop? I want it to run the search_tweets function for each of my terms of interest and keep every result.

library(here)
#> here() starts at C:/Users/renedherrera/AppData/Local/Temp/RtmpWMhoFi/reprex32007f8a5dd7
library(tidyverse)
library(rtweet) # search twitter data for hashtags of interest
#> 
#> Attaching package: 'rtweet'
#> The following object is masked from 'package:purrr':
#> 
#>     flatten

# hashtags of interest 
hashtags <- c("#adcsm", # adrenal cancer
              "#amsm", #advanced metastatic cancer
              "#ancsm", #anal cancer
              "#ayacsm", #adolescent and young adult cancer
              "#bcsm" #breast cancer
) #this continues on for several more hashtags

# functions to return twitter status data 
adcsm <- search_tweets("#adcsm", include_rts = FALSE, n = 5) 
amsm <- search_tweets("#amsm", include_rts = FALSE, n = 5) 
ancsm <- search_tweets("#ancsm", include_rts = FALSE, n = 5) 
ayacsm <- search_tweets("#ayacsm", include_rts = FALSE, n = 5) 
bcsm <- search_tweets("#bcsm", include_rts = FALSE, n = 5) # and so on

# but this is repetitive, and I think a function would be better
# I tried a for loop, but it's not quite right
for(i in hashtags) {
  i <- search_tweets(i, include_rts = FALSE, n = 5)
}

Created on 2020-11-13 by the reprex package (v0.3.0)

steve · November 13, 2020, 7:18pm

The easiest way to handle this is with map() from purrr, specifically map_dfr(), which row-binds the results into a single tibble:

library(here)
#> here() starts at /private/var/folders/hx/j3m2y89n5_jfcrft6bxdq9s40000gn/T/Rtmp2Gxj3Q/reprex3964f093bf3
library(tidyverse)
library(rtweet) # search twitter data for hashtags of interest
#> 
#> Attaching package: 'rtweet'
#> The following object is masked from 'package:purrr':
#> 
#>     flatten


# hashtags of interest 
hashtags <- c("#adcsm", # adrenal cancer
              "#amsm", #advanced metastatic cancer
              "#ancsm", #anal cancer
              "#ayacsm", #adolescent and young adult cancer
              "#bcsm" #breast cancer
) %>%
  set_names() # name each element after itself so the .id column records which hashtag each row came from

#this continues on for several more hashtags

# Create df by mapping search_tweets over the named vector
outdf <- purrr::map_dfr(hashtags, search_tweets, include_rts = FALSE, n = 5, .id = "searchtag")

Created on 2020-11-13 by the reprex package (v0.3.0)
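As a rough mental model of what map_dfr() does here (a base-R sketch, not purrr's actual implementation), the code below applies a function to every element of a named vector, forwards the extra arguments, and then row-binds the results with an id column built from the names. Because search_tweets() needs a Twitter API token, fake_search_tweets() is a hypothetical stand-in that just returns n dummy rows.

```r
# Hypothetical stand-in for rtweet::search_tweets(): returns n dummy rows
# so the example runs without a Twitter API token.
fake_search_tweets <- function(q, include_rts = FALSE, n = 5) {
  data.frame(text = paste("tweet", seq_len(n), "for", q),
             stringsAsFactors = FALSE)
}

hashtags <- c("#adcsm", "#amsm", "#ancsm")
# Base-R equivalent of purrr::set_names() with no arguments:
# each element is named after its own value.
hashtags <- setNames(hashtags, hashtags)

# Call the function once per element, forwarding the extra arguments
# (map_dfr() does the same with its `...`). lapply() keeps the names.
results <- lapply(hashtags, fake_search_tweets, include_rts = FALSE, n = 5)

# Put each list name in front of its data frame as an id column
# (roughly what .id = "searchtag" does); rep() also copes with zero rows.
add_id <- function(df, tag) {
  data.frame(searchtag = rep(tag, nrow(df)), df, stringsAsFactors = FALSE)
}
results <- Map(add_id, results, names(results))

# Stack the per-hashtag data frames into a single data frame.
outdf <- do.call(rbind, unname(results))
```

With purrr and dplyr loaded, map_dfr(hashtags, fake_search_tweets, include_rts = FALSE, n = 5, .id = "searchtag") should give the same rows as a tibble.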

uazccrenedario · November 13, 2020, 7:31pm

Thanks. Now I need to read up on purrr and map.

joels · November 13, 2020, 7:43pm

Regarding the purrr::map approach: if you give the search tags descriptive names, you can add those names as a column in the data frame. For example:

# hashtags of interest 
hashtags <- c("adrenal cancer"="#adcsm", 
              "advanced metastatic cancer"="#amsm",
              "anal cancer"="#ancsm", 
              "adolescent and young adult cancer"="#ayacsm",
              "breast cancer"="#bcsm"
)

d = hashtags %>% 
  map_df(~ search_tweets(.x, include_rts=FALSE, n=5) %>% 
           mutate(search.tag=.x), 
         .id="search.tag.description")
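For anyone who wants to see the two pieces of information handled separately, here is a base-R sketch of the same idea: the vector's names supply the description and its values supply the tag. fake_search_tweets() is again a hypothetical stand-in for search_tweets(), so no Twitter access is needed.

```r
# Hypothetical stand-in for rtweet::search_tweets()
fake_search_tweets <- function(q, include_rts = FALSE, n = 5) {
  data.frame(text = paste("tweet", seq_len(n), "for", q),
             stringsAsFactors = FALSE)
}

hashtags <- c("adrenal cancer" = "#adcsm",
              "anal cancer"    = "#ancsm")

# Map() walks names and values in parallel:
# desc = "adrenal cancer", tag = "#adcsm", then the next pair.
pieces <- Map(function(desc, tag) {
  res <- fake_search_tweets(tag, include_rts = FALSE, n = 2)
  # rep(..., nrow(res)) also works when a search returns no rows
  res$search.tag <- rep(tag, nrow(res))
  res$search.tag.description <- rep(desc, nrow(res))
  res
}, names(hashtags), hashtags)

# Combine the per-tag results into one data frame
d <- do.call(rbind, unname(pieces))
```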

The loop approach wasn't working because each pass through the loop assigns the search result to i, and that value is overwritten on the next pass, so only the last search is saved. Here's a way to save all of the output:

out = list()
for(i in hashtags) {
  out[[i]] <- search_tweets(i, include_rts = FALSE, n = 5)
  
  # If output data has at least one row, add search tag info
  if(nrow(out[[i]]) > 0) {
    out[[i]]$search.tag = i
    out[[i]]$search.tag.description = names(hashtags)[match(i, hashtags)]
  }
}

# Combine into a single data frame
out.df = do.call(rbind, out)
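To see the difference in isolation, the sketch below runs both loop patterns against a hypothetical fake_search_tweets() stand-in. Preallocating the list with vector("list", ...) is an optional tweak, not something the reply above requires; out <- list() works as well.

```r
# Hypothetical stand-in for rtweet::search_tweets()
fake_search_tweets <- function(q, include_rts = FALSE, n = 5) {
  data.frame(tag = rep(q, n), stringsAsFactors = FALSE)
}

tags <- c("#adcsm", "#amsm", "#ancsm")

# Original pattern: the result is assigned back to the loop variable,
# so each pass throws away the previous result.
for (i in tags) {
  i <- fake_search_tweets(i, n = 2)
}
# After the loop, i holds only the last search's data frame.

# Fixed pattern: a named list with one slot per tag.
out <- vector("list", length(tags))
names(out) <- tags
for (tag in tags) {
  out[[tag]] <- fake_search_tweets(tag, n = 2)
}

# Combine all stored results into one data frame
out.df <- do.call(rbind, unname(out))
```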

uazccrenedario · November 13, 2020, 9:09pm

Thanks, I appreciate your help. This is closer to what I was initially trying to do. It seems I need to work on understanding lists and [[ ]], and then on how a list is combined into a single data frame. Obviously I have many gaps in my understanding of R.
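A small self-contained illustration of those two points, using toy data frames (the names first and second are made up): single brackets return a sub-list, double brackets return the element itself, and do.call(rbind, lst) simply calls rbind() with the list elements as its arguments.

```r
df1 <- data.frame(x = 1:2)
df2 <- data.frame(x = 3:4)
lst <- list(first = df1, second = df2)

class(lst["first"])     # "list": single brackets keep the list wrapper
class(lst[["first"]])   # "data.frame": double brackets extract the element

# do.call() builds the call rbind(df1, df2) from the list elements
combined <- do.call(rbind, unname(lst))
```

Keeping the names (do.call(rbind, lst)) also works; rbind() then uses them when building the row names.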
