Too many dates in plot?

I am trying to plot the total number of cumulative cases for COVID-19 using this Canadian dataset.

I am trying to plot the rows where the province is "Canada", with date on the x-axis and number of cases on the y-axis. I wrote an R script, but when the plot is generated, either there are too many dates or the plot does not display correctly. How can I modify this to show the graph correctly?

I used the filter function to select the Canada rows. Would there be a better way to calculate the total cumulative number of cases?

data <- read.csv("2020-05-20.csv",header=T)

data2 <- filter(data, data$prname == "Canada")

data2 %>%
  ggplot(aes(x=date, y=numconf)) +
  geom_point() +
  geom_line() +
  geom_smooth()

I suspect your dates are being interpreted as characters. Try this:

library(dplyr)
library(ggplot2)
library(lubridate)
data <- read.csv("2020-05-20.csv", header = TRUE)

# dmy() parses day-month-year strings into Date objects
data2 <- filter(data, prname == "Canada") %>%
  mutate(date = dmy(date))

data2 %>%
  ggplot(aes(x = date, y = numconf)) +
  geom_point() +
  geom_line() +
  geom_smooth()
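
To check whether the conversion took effect, here is a minimal sketch using made-up strings. It assumes the CSV stores dates as day-month-year text, which is what dmy() expects. The vector names are only for illustration.

```r
library(lubridate)

# Toy values standing in for the CSV's date column (day-month-year format is an assumption)
raw_dates <- c("31-01-2020", "01-02-2020", "20-05-2020")
class(raw_dates)     # "character": ggplot2 treats each value as its own category

# After parsing, the values are real dates
parsed_dates <- dmy(raw_dates)
class(parsed_dates)  # "Date": ggplot2 draws a continuous date axis
```

Running class(data2$date) on the real data should tell you the same thing about your own column.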

@FJCC If I wanted to illustrate the dates like this on the x-axis, how could I modify this? I've updated the script, but the data points do not show. Also, many of the dates seem to be out of chronological order.

data2 %>%
  ggplot(aes(x=date, y=numconf)) +
  geom_point() +
  #geom_line() +
  geom_smooth() +
  ggtitle("Cumulative Cases of COVID-19 in Canada") +
  ylab("Number of Cases") +
  xlab("Date over Time") +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        panel.background = element_blank(), axis.line = element_line(colour = "black")) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

The dates are still stored as characters. You need to convert them first as FJCC showed.
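
That is also the likely reason the axis looks out of order: character values are sorted alphabetically, not chronologically. A small sketch with made-up day-month-year strings (the format is an assumption about the CSV):

```r
# Toy strings listed in true chronological order
raw_dates <- c("31-01-2020", "01-02-2020", "15-03-2020", "02-04-2020")

# As characters, sorting is alphabetical, so "01-02-2020" lands before "31-01-2020"
sorted_chr <- sort(raw_dates)
sorted_chr

# As Date objects, sorting is chronological
sorted_date <- sort(as.Date(raw_dates, format = "%d-%m-%Y"))
sorted_date
```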

Hi @martin.R, I did notice the mutate(date = dmy(date)) line in the script, but it did not show the plot the way I wanted. When I tried it, rather than showing all the dates at the bottom there were just 5 labels: Jan, Feb, Mar, April, May. I am looking to show all the dates.

library(lubridate)
data <- read.csv("2020-05-20.csv",header=T)

data2 <- filter(data, data$prname == "Canada") 

data2 <- mutate(data2, date = dmy(date))

data2 %>%
  ggplot(aes(x=date, y=numconf)) +
  geom_point() +
  #geom_line() +
  geom_smooth() +
  ggtitle("Cumulative Cases of COVID-19 in Canada") +
  ylab("Number of Cases") +
  xlab("Date over Time") +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        panel.background = element_blank(), axis.line = element_line(colour = "black")) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Here is an example of having a date on the x axis for every 7 days. You can decrease that to every day if you want by setting by = 1 in the seq.Date function, but I think that will be very hard to read.

library(dplyr, warn.conflicts = FALSE)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:dplyr':
#> 
#>     intersect, setdiff, union
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(ggplot2)
data <- read.csv("~/R/Play/covid19.csv",header=T)

data2 <- filter(data, data$prname == "Canada") %>%
  mutate(date = dmy(date))

data2 %>%
  ggplot(aes(x=date, y=numconf)) +
  geom_point() +
  geom_line() +
  geom_smooth() +
  scale_x_date(breaks = seq.Date(from = as.Date("2020-01-31"), 
                                     to = as.Date("2020-05-20"), by = 7)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0.5))
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Created on 2020-05-20 by the reprex package (v0.3.0)
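
As an alternative sketch, scale_x_date() also accepts date_breaks and date_labels, so you do not have to build the break sequence by hand. The data frame below is made up purely so the example runs on its own; with the real data you would keep data2 as it is and change only the scale_x_date() call.

```r
library(ggplot2)

# Toy data standing in for the filtered Canada rows; the numconf values are invented
toy <- data.frame(
  date = seq.Date(as.Date("2020-01-31"), as.Date("2020-05-20"), by = "day")
)
toy$numconf <- seq_len(nrow(toy))^2

p <- ggplot(toy, aes(x = date, y = numconf)) +
  geom_line() +
  # One labelled break per week, formatted as day and abbreviated month
  scale_x_date(date_breaks = "1 week", date_labels = "%d %b") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
p
```

Changing date_breaks to "1 day" would label every date, with the same readability caveat as before.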

Thank you @FJCC, this helps. I was really stuck.
