How can I sort/order a series of dates by month in R

I have been referring to this question [1]:R Optimal way to create time series from start and end dates for groups because I am trying to do the same thing with my data. I was not trying both methods at the same time, I tested both out separately. I get small errors when trying either solution. Any help would be greatly appreciated! My final dataset is going to be very large so I am more interested in solution 1.

Packages and working code

library(AER) 
library(sandwich) 
library(car) 
library(stargazer) 
library(ggplot2)
library(standardize)
library(tidyverse)
library(dplyr)
library(lubridate)
library(data.table)

#date format is mm/dd/yy
csuite <- read.csv("boardex_new.csv", header=TRUE, sep=",")

#I want to make the results of these lines into a new df
csuite %>%
 #selects the columns in a specific order
 select("CompanyID", "Ticker", "RoleName", "DateStartRole", "DateEndRole", "DirectorName", "DirectorID") %>%
 # Filters the RoleName column as specified below
 filter(RoleName == "VP/COO" | RoleName == "CEO" | RoleName == "COO" | RoleName == "VP/CFO")

Solution 1

range <- rbindlist(lapply(genDataRange(dat$startdate, dat$enddate, dat$id),as.data.frame))

Error in genDataRange(csuite$DateStartRole, csuite$DateEndRole) : 
  could not find function "genDataRange"

Solution 2

csuite <- data.table(csuite)
csuite[,`:=`(DateStartRole = as.Date(DateStartRole, DateEndRole   = as.Date(DateEndRole))]
csuite[,num_mons:= length(seq(from=040196, to=072121, by='month')),by=1:nrow(csuite)]

Error: unexpected ']' in "csuite[,`:=`(DateStartRole = as.Date(DateStartRole, DateEndRole   = as.Date(DateEndRole))]"
Error in del/by : non-numeric argument to binary operator

Hi @student2022,
Without a sample of your data it is hard to give you a reliable solution; however, can you achieve your desired output by simply converting the start and end dates, and then sorting using dplyr::arrange? Something like this:

csuite %>%
  select("CompanyID", "Ticker", "RoleName", "DateStartRole", 
         "DateEndRole", "DirectorName", "DirectorID") %>%
  filter(RoleName %in% c("VP/COO", "CEO", "COO", "VP/CFO")) %>%
  mutate(across(starts_with("Date"), ~as.Date(.))) %>% 
  arrange(DateStartRole) -> new_csuite

HTH

In Solution 2 try this:

csuite[,`:=`(DateStartRole = as.Date(DateStartRole), DateEndRole   = as.Date(DateEndRole)]
csuite<-structure(list(id = c(723654, 885618, 269861, 1383642, 250276, 
815511, 1506680, 1567855, 667345, 795731), startdate = c("20080629", 
"20081201", "20060927", "20100203", "20060831", "20080910", 
"20100411", "20100515", "20080412", "20080828"), enddate = c("20080813", 
"20090208", "20071012", "20100909", "20070630", "20100427", 
"20100413", "20100516", "20100420", "20100309")), .Names = c("id", 
"DateStartRole", "DateEndRole"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "6", "7", "8", "9", "10", "11"))

Hi @student2022,
Having some of your data makes it soooo much easier:

suppressPackageStartupMessages(library(tidyverse))
csuite <- structure(list(id = c(723654, 885618, 269861, 1383642, 250276,
815511, 1506680, 1567855, 667345, 795731), startdate = c("20080629",
"20081201", "20060927", "20100203", "20060831", "20080910",
"20100411", "20100515", "20080412", "20080828"), enddate = c("20080813",
"20090208", "20071012", "20100909", "20070630", "20100427",
"20100413", "20100516", "20100420", "20100309")), .Names = c("id",
"DateStartRole", "DateEndRole"), class = "data.frame", row.names = c("1",
"2", "3", "4", "6", "7", "8", "9", "10", "11"))

csuite
#>         id DateStartRole DateEndRole
#> 1   723654      20080629    20080813
#> 2   885618      20081201    20090208
#> 3   269861      20060927    20071012
#> 4  1383642      20100203    20100909
#> 6   250276      20060831    20070630
#> 7   815511      20080910    20100427
#> 8  1506680      20100411    20100413
#> 9  1567855      20100515    20100516
#> 10  667345      20080412    20100420
#> 11  795731      20080828    20100309

csuite %>%
  #select("CompanyID", "Ticker", "RoleName", "DateStartRole",
  #       "DateEndRole", "DirectorName", "DirectorID") %>%
  #filter(RoleName %in% c("VP/COO", "CEO", "COO", "VP/CFO")) %>%
  mutate(across(starts_with("Date"), ~as.Date(., format="%Y%m%d"))) %>%
  arrange(DateStartRole, DateEndRole) -> new_csuite

new_csuite
#>         id DateStartRole DateEndRole
#> 6   250276    2006-08-31  2007-06-30
#> 3   269861    2006-09-27  2007-10-12
#> 10  667345    2008-04-12  2010-04-20
#> 1   723654    2008-06-29  2008-08-13
#> 11  795731    2008-08-28  2010-03-09
#> 7   815511    2008-09-10  2010-04-27
#> 2   885618    2008-12-01  2009-02-08
#> 4  1383642    2010-02-03  2010-09-09
#> 8  1506680    2010-04-11  2010-04-13
#> 9  1567855    2010-05-15  2010-05-16

Created on 2021-07-24 by the reprex package (v2.0.0)

That really helped, thank you! However, the df new_csuite now puts all the start and end dates as .

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.