For fun, I would like to count the number of R packages on CRAN each month for the last 20 years or so (I believe CRAN originated in 1997).
I know that Microsoft has been taking daily snapshots of CRAN since September 2014. This allows me to do the following, for example:
library(tidyverse)
library(lubridate)
# Get a sequence of dates in steps of one month
# from Sept 17, 2014 to today
date_sequence <- seq(from = ymd("2014-09-17"),
                     to = today(),
                     by = 'months')
# function to count packages on cran
# on a given date
count_cran_packages <- function(snapshot_date){
  repo <- sprintf(
    'https://cran.microsoft.com/snapshot/%s/',
    snapshot_date
  )
  available.packages(repos = repo) %>%
    nrow()
}
# count packages on cran on each day in sequence
npackages_df <- tibble(date = date_sequence,
                       n = map_dbl(date, count_cran_packages)
) %>%
  # counts of 0 occur due to errors, so zap them
  filter(n > 0)
That's great, and just what I want. However, this only goes back as far as September 2014. I would like to go back as far as possible.
Does anyone know of any other way of getting the number of packages available on CRAN each month for the past 20 years, or at least further back than 2014?
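As a side note on the snippet above: the zero counts are filtered out after the fact, because a failed download yields an empty result rather than an error. A minimal sketch of a more defensive variant follows, assuming the same snapshot URL pattern as above. The names `snapshot_url`, `count_cran_packages_safe`, and the `fetch` argument are hypothetical helpers introduced only for illustration; they are not part of any package.

```r
# Build the snapshot URL for a given date
# (same URL pattern as in the code above).
snapshot_url <- function(snapshot_date) {
  sprintf(
    "https://cran.microsoft.com/snapshot/%s/",
    format(as.Date(snapshot_date), "%Y-%m-%d")
  )
}

# Count packages in a snapshot, returning NA instead of 0 when the
# index cannot be read. `fetch` is a hypothetical injection point so
# the function can be tried without network access; by default it
# calls utils::available.packages() as in the original code.
count_cran_packages_safe <- function(
    snapshot_date,
    fetch = function(repo) utils::available.packages(repos = repo)) {
  n <- tryCatch(
    nrow(fetch(snapshot_url(snapshot_date))),
    warning = function(w) NA_integer_,  # e.g. index could not be accessed
    error = function(e) NA_integer_
  )
  # Treat an empty index as missing rather than as a real count of 0
  if (is.na(n) || n == 0L) NA_integer_ else n
}
```

With this variant, the counting step could use `map_int(date, count_cran_packages_safe)` followed by `filter(!is.na(n))`, which separates "no data for this date" from a genuine count.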