I have made a graph using an online data set to plot data. Currently the graph is created by the following script. The varible Cases is what I am using to plot the data by DateRep. I would like for my graph to start after the DateRep value reaches a value of 100. I know that I could just create a subset for after the data reaches 100, however I am looking to see if this can all be done inside 1 function.
data %>%
filter(`Countries and territories` %in% c("Canada")) %>%
ggplot(aes(DateRep, Cases, col = `Countries and territories`)) +
#geom_point() +
#geom_line() +
scale_y_log10() +
geom_smooth() +
theme(legend.position = "bottom") +
ggtitle("Plot of COVID-19 Cases In Canada") +
xlab("Date In Months") +
#ylab("Cases Per Day Count")
ylab("Cases Per Day Count - Log10 Applied")
I took a look at my dataset, I was thinking that I have this piece of code. This is what sets what countries I want to look at and plots Countries on one axis, and cases on another axis. As-well, in this dataset we have a variable called deaths. What I would like to do is tally up all the deaths values, and then start the building of my graph, (Plotting the cases by date) after the death toll for that country reaches a value of 100. that's when I would like my plotting to occur . I can upload a .csv file of the dataset.
This calculates the cumulative deaths by country and then filters cases with more than 100 accumulated deaths, that is what I understood from your explanation.
Hello, I keep trying to recreate your graph from the after 100 mark and it still does not work, may you try recreating it based off of this code where you are pulling the code from the URL just like in mine. In the second and third line I am implementing your code of how to bring it to >= 100.
#Read In The Files
install.packages("readxl")
#these libraries are necessary
library(readxl)
library(httr)
library(tidyverse)
#create the URL where the dataset is stored with automatic updates every day
url <- paste("https://www.ecdc.europa.eu/sites/default/files/documents/COVID-19-geographic-disbtribution-worldwide-",format(Sys.time(), "%Y-%m-%d"), ".xlsx", sep = "")
#download the dataset from the website to a local temporary file
GET(url, authenticate(":", ":", type="ntlm"), write_disk(tf <- tempfile(fileext = ".xlsx")))
#read the Dataset sheet into “R”
data <- read_excel(tf)
data %>%
filter(countriesAndTerritories %in% c("Canada")) %>%
mutate(totalDeathds = cumsum(deaths)) %>%
filter(totalDeathds >= 100) %>%
ggplot(aes(dateRep, cases, col = countriesAndTerritories)) +
geom_point() +
geom_line() +
scale_y_log10() +
geom_smooth() +
#abline(v = mean(Canada_Subset$Cases),col="red", lwd=3, lty=2) +
theme(legend.position = "bottom") +
ggtitle("Plot of COVID-19 Cases In Majority Countries") +
xlab("Date In Months") +
ylab("Cases Per Day Count") +
#ylab("Cases Per Day Count - Log10 Applied")