Frequency or proportion plot by date or week with ggplot2

Hello, I am an epidemiologist and I am quite new to R. I have a simple vaccination dataset in long format that looks like this:

data<-data.frame(id=c(1,1,1,1,2,2,2,3,3,3,3),
                 date=c("01/12/2020","02/12/2020","03/12/2020","04/12/2020",
                        "01/31/2020","03/12/2020","04/05/2020","02/12/2020",
                        "04/12/2020","05/12/2020","01/12/2020"),
                 vac_date=c("","02/02/2020","","04/02/2020","","","04/01/2020","",
                            "04/01/2020","05/01/2020",""),
                 dose=c('',1,'',2,'','',1,'',1,2,''))

id = patient identifier
date = survey date
vac_date = vaccination date
dose = indicating the vaccination dose

I am having trouble working out how to create the frequency line plot. I tried:

ggplot(data, aes(x = date, y = vac_date)) + geom_line()

but the result mixes up the dates and the vaccination counts. I would like to create two plots:

  1. frequency or proportion plot by date or week regardless of dose
  2. frequency or proportion plot by date or week, by dose (overlaid), as shown in the following picture:

https://imgur.com/KEoV4cR

Could someone please help me create the plots above? Thanks.

Does this produce the kinds of plots you want? I converted the character dates to Date objects with lubridate's mdy() and used functions from the dplyr package to count the number of vaccinations on each date.

data<-data.frame(id=c(1,1,1,1,2,2,2,3,3,3,3),
                 date=c("01/12/2020","02/12/2020","03/12/2020","04/12/2020",
                        "01/31/2020","03/12/2020","04/05/2020","02/12/2020",
                        "04/12/2020","05/12/2020","01/12/2020"),
                 vac_date=c("","02/02/2020","","04/02/2020","","","04/01/2020","",
                            "04/01/2020","05/01/2020",""),
                 dose=c('',1,'',2,'','',1,'',1,2,''))
library(lubridate)
library(dplyr)
library(ggplot2)
# Convert the month/day/year strings to Date; empty vac_date values become NA
data <- data |> mutate(date = mdy(date), 
               vac_date = mdy(vac_date))
data_Summary1 <- data |> group_by(vac_date) |> 
  filter(!is.na(vac_date)) |> 
  summarize(N = n())
data_Summary1
#> # A tibble: 4 × 2
#>   vac_date       N
#>   <date>     <int>
#> 1 2020-02-02     1
#> 2 2020-04-01     2
#> 3 2020-04-02     1
#> 4 2020-05-01     1
ggplot(data_Summary1, aes(vac_date, N)) + geom_line()
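The question also asks for weekly counts and proportions. Here is a minimal sketch of the first plot by week, assuming weeks start on Sunday (`week_start = 7` in lubridate's `floor_date()`; use `week_start = 1` for Monday-based weeks) and taking "proportion" to mean each week's share of all recorded vaccinations. Both choices are assumptions, and the names `weekly`, `prop` and `p_weekly` are illustrative only:

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Same example data as in the question
data <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  date = c("01/12/2020", "02/12/2020", "03/12/2020", "04/12/2020",
           "01/31/2020", "03/12/2020", "04/05/2020", "02/12/2020",
           "04/12/2020", "05/12/2020", "01/12/2020"),
  vac_date = c("", "02/02/2020", "", "04/02/2020", "", "", "04/01/2020", "",
               "04/01/2020", "05/01/2020", ""),
  dose = c("", 1, "", 2, "", "", 1, "", 1, 2, "")
)

weekly <- data |>
  # Parse m/d/y strings; empty strings become NA (quiet = TRUE silences parse warnings)
  mutate(vac_date = mdy(vac_date, quiet = TRUE)) |>
  # Keep only rows that record a vaccination
  filter(!is.na(vac_date)) |>
  # Round each vaccination date down to the start of its week (assumed: Sunday)
  mutate(week = floor_date(vac_date, unit = "week", week_start = 7)) |>
  # Frequency per week, regardless of dose
  count(week, name = "N") |>
  # Assumed definition of proportion: share of all vaccinations in the data
  mutate(prop = N / sum(N))

p_weekly <- ggplot(weekly, aes(week, prop)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Week starting", y = "Share of vaccinations")
```

With the example data this gives three weeks (starting 2020-02-02, 2020-03-29 and 2020-04-26) with 1, 3 and 1 vaccinations. Map `N` instead of `prop` to the y axis for raw weekly frequencies.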


data_Summary2 <- data |> group_by(vac_date, dose) |> 
  filter(!is.na(vac_date)) |> 
  summarize(N = n(), .groups = "drop")
ggplot(data_Summary2, aes(vac_date, N, color = dose)) + geom_line()

Created on 2023-10-20 with reprex v2.0.2
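For the second plot by week and dose, a similar sketch under the same Sunday-start assumption. Here "proportion" is assumed to mean each week's share of that dose's total, so each line sums to 100%. It also uses `tidyr::complete()` (tidyr is an extra package not used in the answer above) to add explicit zero counts, so a dose with no vaccinations in a given week drops to zero instead of the line skipping that week. The names `weekly_dose` and `p_dose` are illustrative:

```r
library(dplyr)
library(lubridate)
library(ggplot2)
library(tidyr)

# Same example data as in the question
data <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  date = c("01/12/2020", "02/12/2020", "03/12/2020", "04/12/2020",
           "01/31/2020", "03/12/2020", "04/05/2020", "02/12/2020",
           "04/12/2020", "05/12/2020", "01/12/2020"),
  vac_date = c("", "02/02/2020", "", "04/02/2020", "", "", "04/01/2020", "",
               "04/01/2020", "05/01/2020", ""),
  dose = c("", 1, "", 2, "", "", 1, "", 1, 2, "")
)

weekly_dose <- data |>
  mutate(vac_date = mdy(vac_date, quiet = TRUE)) |>
  filter(!is.na(vac_date)) |>
  # Start of the week (assumed: Sunday) for each vaccination
  mutate(week = floor_date(vac_date, unit = "week", week_start = 7)) |>
  # Frequency per week and dose
  count(week, dose, name = "N") |>
  # Add zero rows for week/dose combinations with no vaccinations
  complete(week, dose, fill = list(N = 0L)) |>
  # Assumed definition: share of each dose's total given in that week
  group_by(dose) |>
  mutate(prop = N / sum(N)) |>
  ungroup()

# One line per dose, overlaid on the same axes
p_dose <- ggplot(weekly_dose, aes(week, prop, colour = dose)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Week starting", y = "Share of dose", colour = "Dose")
```

Mapping `N` instead of `prop` to the y axis gives weekly frequencies by dose.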
