Hi, I have a data set of initial public offerings (IPOs) and their IPO dates. One column holds the stock tickers and the other holds the IPO date.
I want to find out how many tweets contain the ticker symbol as a hashtag 1, 2 and 3 days before the IPO date. The problem is that I have nearly 3000 tickers...
Does anybody know how I can get RStudio to run these 3000 tickers as the query for these 3 points in time through the count_all_tweets command? That would help me out massively.
I'm sorry if I messed up the formatting of this post. I'm kind of new here.
Below is one potential solution. I don't have a token for academictwitteR, so I can't be certain of the outcome, but I think this general approach should work.
library(tidyverse)
library(academictwitteR)
library(lubridate)
# sample data
dataFrame = data.frame(
  Ticker = c('ticker1', 'ticker2', 'ticker3'),
  Offer.date = c('2020-01-01', '2022-03-20', '2021-05-15')
)
dataFrame
#> Ticker Offer.date
#> 1 ticker1 2020-01-01
#> 2 ticker2 2022-03-20
#> 3 ticker3 2021-05-15
# reshape data; determine dates of prior 3 days
df = dataFrame %>%
  mutate(day1 = as.Date(Offer.date) - days(1),
         day2 = as.Date(Offer.date) - days(2),
         day3 = as.Date(Offer.date) - days(3)
  ) %>%
  pivot_longer(cols = -c(Ticker, Offer.date), values_to = 'date')
df
#> # A tibble: 9 × 4
#> Ticker Offer.date name date
#> <chr> <chr> <chr> <date>
#> 1 ticker1 2020-01-01 day1 2019-12-31
#> 2 ticker1 2020-01-01 day2 2019-12-30
#> 3 ticker1 2020-01-01 day3 2019-12-29
#> 4 ticker2 2022-03-20 day1 2022-03-19
#> 5 ticker2 2022-03-20 day2 2022-03-18
#> 6 ticker2 2022-03-20 day3 2022-03-17
#> 7 ticker3 2021-05-15 day1 2021-05-14
#> 8 ticker3 2021-05-15 day2 2021-05-13
#> 9 ticker3 2021-05-15 day3 2021-05-12
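# function to get the tweet count for one row of df
get_tweet_counts = function(i) {
  # format query and the 24-hour window that covers the target date
  d = df[i,] %>%
    mutate(query = paste0('#', Ticker),
           start_time = paste0(date, 'T00:00:00Z'),
           end_time = paste0(date + days(1), 'T00:00:00Z'))
  # get tweet counts (end_tweets has to be later than start_tweets)
  tweet_count = count_all_tweets(
    query = d$query,
    start_tweets = d$start_time,
    end_tweets = d$end_time,
    bearer_token = get_bearer(),
    granularity = 'day')
  # total the counts and add them to d
  # (assumes the output of count_all_tweets is a data frame with a tweet_count column)
  n = sum(tweet_count$tweet_count)
  d %>% mutate(n_tweets = n)
}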
# gather tweet counts for each row of df
tweets = map(seq_len(nrow(df)), get_tweet_counts) %>%
  bind_rows()
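With roughly 3000 tickers and 3 dates each, a single failed request would stop the whole map() call. One way around that, as a rough sketch only, is to wrap the function in purrr's possibly() so that a failed row is skipped, and then reshape the result to one row per ticker. The names df_demo, fake_counts, safe_counts and n_tweets below are made up for illustration, and fake_counts is a stand-in that never contacts Twitter. In real use you would wrap get_tweet_counts instead.

```r
library(dplyr)
library(purrr)
library(tidyr)

# mock of the long table built above (two tickers only, to keep it short)
df_demo = tibble(
  Ticker = rep(c('ticker1', 'ticker2'), each = 3),
  name   = rep(c('day1', 'day2', 'day3'), times = 2)
)

# hypothetical stand-in for the real API call: returns one row with a
# made-up count, and fails on purpose for row 5 to mimic a request error
fake_counts = function(i) {
  if (i == 5) stop('simulated API error')
  df_demo[i, ] %>% mutate(n_tweets = i * 10)
}

# possibly() returns NULL instead of stopping when the wrapped call errors
safe_counts = possibly(fake_counts, otherwise = NULL)

# map() carries on past the failure; bind_rows() drops the NULL entry
tweets_demo = map(seq_len(nrow(df_demo)), safe_counts) %>%
  bind_rows()

# one row per ticker, one column per day; the failed request shows up as NA
tweets_wide = tweets_demo %>%
  pivot_wider(names_from = name, values_from = n_tweets)
```

Rows that come back as NA in the wide table are the ones whose request failed, so they can be retried separately rather than rerunning everything.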