I want to find the top 10 hashtags for each of the several thousand communities in my dataset. Each user_name in the dataset belongs to a specific community (e.g., "a", "b", "c", and "d" belong to community 0). A sample of my dataset, with 10 users in 4 communities, looks like the following:
df <- data.frame(N = c(1,2,3,4,5,6,7,8,9,10),
user_name = c("a","b","c","d","e","f", "g", "h", "i", "j"),
community_id =c(0,0,0,0,1,1,2,2,2,3),
hashtags = c("#illness, #ebola", "#coronavirus, #covid", "#vaccine, #lie", "#flue, #ebola, #usa", "#vaccine", "#flue", "#coronavirus", "#ebola", "#ebola, #vaccine", "#china, #virus") )
To find the top 10 hashtags for each community (here, community 0), I need to run the following code:
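library(dplyr)    # %>%, filter(), count(), top_n()
library(tidytext) # unnest_tokens()
#select community 0 (the column is named community_id)
df_comm_0 <- df %>%
filter(community_id == 0)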
#remove NAs
df_comm_0 <- na.omit(df_comm_0)
#find top 10 hashtags
df_hashtags_0 <- df_comm_0 %>%
unnest_tokens(hashtag, hashtags, token = "tweets") %>%
count(hashtag, sort = TRUE) %>%
top_n(10)
I know that using a loop can save me from running my code ~15,000 times (the number of communities in the dataset). I am not familiar with loops and, even after searching for a couple of hours, was not able to write one. The following code is what I wrote, but it gives me the hashtags for the entire dataset:
x <- (df$community_id)
for (val in x) {
print (
df %>%
unnest_tokens(hashtag, hashtags, token = "tweets") %>%
count(hashtag, sort = TRUE) %>%
top_n(10)
)
}
print()
Is there a way to compute the hashtag frequencies for all communities by looping through them, and to write the top 10 hashtags for each community to one file (or to separate files)?
Your assistance is much appreciated.
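One possible way to get per-community results without an explicit loop is to group by community_id, so the counting happens once for all communities. This is a minimal sketch, not a definitive answer. It assumes the hashtags in each row are always separated by commas, and it swaps in tidyr's separate_rows() for the unnest_tokens() call used above. The column name freq and the output file name are hypothetical choices made for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  N = 1:10,
  user_name = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"),
  community_id = c(0, 0, 0, 0, 1, 1, 2, 2, 2, 3),
  hashtags = c("#illness, #ebola", "#coronavirus, #covid", "#vaccine, #lie",
               "#flue, #ebola, #usa", "#vaccine", "#flue", "#coronavirus",
               "#ebola", "#ebola, #vaccine", "#china, #virus")
)

top_hashtags <- df %>%
  # drop rows with no hashtags
  filter(!is.na(hashtags)) %>%
  # one row per hashtag; assumes tags are comma-separated
  separate_rows(hashtags, sep = ",\\s*") %>%
  rename(hashtag = hashtags) %>%
  # frequency of each hashtag within each community
  count(community_id, hashtag, name = "freq") %>%
  # keep the 10 most frequent hashtags per community (ties are kept)
  group_by(community_id) %>%
  slice_max(order_by = freq, n = 10) %>%
  ungroup()

# Write all communities to a single file (hypothetical file name)
write.csv(top_hashtags, "top_hashtags_by_community.csv", row.names = FALSE)
```

Because slice_max() keeps ties by default, a community can return more than 10 rows when several hashtags share the 10th-place count; passing with_ties = FALSE would cap it at exactly 10. If separate files are preferred, the result could be split by community_id and each piece written on its own.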