I need to filter a vector of dates by year before calculating how many protests in the data set occurred in each month of 2020 (the data set contains information on protests over several years), but I have not been able to figure out how to do this.
Can anyone explain how I might go about filtering the original data set (protest_dates) by both month and year, bearing in mind that I am only allowed to use the stringr package for this assignment?
I get that; that's why I tried to phrase my original question so that it would not be necessary to access the data itself. I'm just trying to understand the concepts associated with these types of data.
If you are only allowed to use stringr (which is a suboptimal way of working with dates), you will have to chop each date string up into its component parts (typically day, month, and year) and go from there.
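As a rough sketch of what "go from there" could look like with stringr alone: the example below assumes the dates are stored as character strings in "YYYY-MM-DD" format, which the thread does not confirm, and the sample values in protest_dates are made up for illustration. It keeps only the strings that start with a given year, or a given year and month.

```r
library(stringr)

# Hypothetical sample data; the real protest_dates is not shown in the thread.
# Assumption: dates are character strings in "YYYY-MM-DD" format.
protest_dates <- c("2019-12-31", "2020-01-15", "2020-01-20",
                   "2020-03-02", "2021-01-05")

# Keep only the dates whose string starts with "2020-"
dates_2020 <- protest_dates[str_detect(protest_dates, "^2020-")]

# Keep only the dates in January 2020
dates_jan_2020 <- protest_dates[str_detect(protest_dates, "^2020-01-")]

print(dates_2020)
print(dates_jan_2020)
```

If the real data uses a different layout (for example day/month/year with slashes), the pattern would need to change to match it.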
I personally prefer to handle dates with the R package lubridate and to filter/subset with dplyr. If you are dealing with a lot of date-times, the clock package is also quite handy.
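For comparison only (it would not satisfy a stringr-only assignment), here is a minimal sketch of the lubridate and dplyr approach. The data frame protests and its date column are hypothetical names with made-up values, and the dates are assumed to be in year-month-day order.

```r
library(lubridate)
library(dplyr)

# Hypothetical data frame; the column name `date` and its values are assumptions.
protests <- tibble(
  date = ymd(c("2019-12-31", "2020-01-15", "2020-01-20", "2020-03-02"))
)

# Keep rows from 2020, then count rows per month
counts <- protests %>%
  filter(year(date) == 2020) %>%
  count(month = month(date))

print(counts)
```

Because the column is parsed into real Date values first, the filter compares numbers rather than substrings, so it does not depend on the exact text layout of the dates.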
With stringr: dates usually have a consistent separator and format, for example year, month, and day separated by a slash or a dash. stringr has a nice set of split and substring functions for pulling these apart; see https://stringr.tidyverse.org/.
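library(stringr)
library(dplyr)  # used here only for the %>% pipe
dates <- c("2022-01-01", "2022-01-02")
# Split each date on "-" into a three-column matrix: year, month, day
dates %>% str_split_fixed("-", n = 3)
#>      [,1]   [,2] [,3]
#> [1,] "2022" "01" "01"
#> [2,] "2022" "01" "02"
# Characters 1 to 4 hold the year
dates %>% str_sub(start = 1, end = 4)
#> [1] "2022" "2022"
# Characters 6 to 7 hold the month
dates %>% str_sub(start = 6, end = 7) %>% as.numeric()
#> [1] 1 1
# Characters 9 to 10 hold the day
dates %>% str_sub(start = 9, end = 10) %>% as.numeric()
#> [1] 1 2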
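Putting those pieces together for the original question, one possible sketch is to pull out the year and month with str_sub(), keep the months belonging to 2020, and count them with base R's table(). The sample protest_dates values are hypothetical, and the "YYYY-MM-DD" format is an assumption about the real data.

```r
library(stringr)

# Hypothetical sample data; assumes "YYYY-MM-DD" strings.
protest_dates <- c("2019-12-31", "2020-01-15", "2020-01-20",
                   "2020-03-02", "2021-01-05")

# Extract the year and month substrings by position
years  <- str_sub(protest_dates, start = 1, end = 4)
months <- str_sub(protest_dates, start = 6, end = 7)

# Keep only the months that belong to 2020
months_2020 <- months[years == "2020"]

# Count protests per month (only months that occur appear in the table)
counts_2020 <- table(months_2020)

# Optional: include months with zero protests by fixing the levels "01".."12"
all_months <- str_pad(1:12, width = 2, pad = "0")
counts_all_months <- table(factor(months_2020, levels = all_months))

print(counts_2020)
print(counts_all_months)
```

The comparison years == "2020" is done on strings, so there is no need to convert to numbers unless you want to sort or do arithmetic on the components.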