Sorry, I just saw this. There is a mistake in the date format: it should have been "%Y-%m-%d", based on the format you wrote the dates in.
Here is the code you had with that adjustment:
covid <- read.csv(file = 'covid_au_state.csv')
dput(covid)
library(lubridate)
library(dplyr)
library(ggplot2)
covid$date <- as.Date(covid$date, format = "%Y-%m-%d")
filt1 <- covid %>%
  filter(between(date, as.Date("2020-03-17", format = "%Y-%m-%d"),
                 as.Date("2020-08-16", format = "%Y-%m-%d")))
View(filt1)
You also don't need to specify the format in this case, because "2020-03-17" is already in the default format that as.Date() expects, so you can drop the format argument, like so:
  filter(between(date, as.Date("2020-03-17"), as.Date("2020-08-16")))
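To see the date handling on its own, here is a minimal sketch in base R. The sample strings are made up for illustration and are not taken from your file, so treat it as a check of the idea rather than of your data.

```r
# Made-up sample strings (assumption: not taken from covid_au_state.csv)
iso_dates <- c("2020-03-16", "2020-03-17", "2020-08-16", "2020-08-17")

# Year-month-day strings parse with an explicit format ...
parsed_explicit <- as.Date(iso_dates, format = "%Y-%m-%d")
# ... or with no format at all, since this is the layout as.Date() tries first
parsed_default <- as.Date(iso_dates)
all(parsed_explicit == parsed_default)

# Parsing the same strings with a day/month/year format fails and gives NA
mismatched <- as.Date(iso_dates, format = "%d/%m/%y")
all(is.na(mismatched))

# Flag the dates inside the window (both ends inclusive)
in_window <- parsed_default >= as.Date("2020-03-17") &
  parsed_default <= as.Date("2020-08-16")
parsed_default[in_window]
```

If the date column comes out as all NA after as.Date(), a format that does not match the strings is the most likely cause.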
If you want the daily growth factor for each of the eight states, here is how I would do it:
growth_factor <- filt1 %>%
  group_by(state_abbrev) %>%
  mutate(previous_day_confirmed = lag(confirmed, n = 1, default = NA, order_by = date)) %>%
  mutate(growth_factor = confirmed / previous_day_confirmed)
The first mutate step gets the confirmed cases from the previous day, and the second mutate step calculates the growth factor as confirmed/previous_day_confirmed.
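Here is a small sketch of the same two steps on a toy dataset, since I don't have your file. The numbers are invented, and I am assuming the column names state_abbrev, date and confirmed match yours.

```r
library(dplyr)

# Invented toy data (assumption: column names match the ones used above)
toy <- data.frame(
  state_abbrev = c("NSW", "NSW", "NSW", "VIC", "VIC", "VIC"),
  date = as.Date(c("2020-03-17", "2020-03-18", "2020-03-19",
                   "2020-03-17", "2020-03-18", "2020-03-19")),
  confirmed = c(10, 20, 30, 4, 6, 9)
)

toy_growth <- toy %>%
  group_by(state_abbrev) %>%
  # Previous day's value within each state, ordered by date
  mutate(previous_day_confirmed = lag(confirmed, n = 1, default = NA, order_by = date)) %>%
  # Ratio of today's value to the previous day's value
  mutate(growth_factor = confirmed / previous_day_confirmed) %>%
  ungroup()

toy_growth
```

The first day in each state has no previous day, so its growth factor is NA. A previous-day value of 0 would give Inf or NaN, so you may want to handle those rows separately.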
In the code above I used the variable confirmed as "new cases", so you'll have to adjust that if you meant a different variable instead of confirmed. I also can't test the code without an example dataset. If you could create a reproducible example using the reprex package, I could help more with the growth factor calculation (look at "Scenario 2"): https://reprex.tidyverse.org/articles/articles/datapasta-reprex.html
Sorry for the delay, didn't see this until now!