Selecting data in a for loop

Hi, I’m trying to loop through a monthly dataset and grab rolling 12-month periods to analyze. I’ll use AirPassengers as an example, but it’s not the data I want to analyze.

AirPassengers is monthly data from 1949 to 1960, 12 years, 12 months per year or 144 rows.

I’ve created a date variable to go with air passengers.

There are 133 groups I’d like to select as below (if I’ve done the math right).

As the loop progresses I’d like to append each 12-month group to the previous one. The final dataset should have 133 groups of 12 rows each, or 1596 rows (= 133 * 12).

For loop 1 I’d like to select the 12 rows Jan 1949 to Dec 1949 (rows 1 to 12).

For loop 2 I’d like to select the 12 rows Feb 1949 to Jan 1950 (rows 2 to 13).

And so on …

For loop 133 I’d like to select the 12 rows Jan 1960 to Dec 1960 (rows 133 to 144).

The code below goes haywire. Even though I want base[(i:11+i),] to select row i to row i+11 (12 rows), it’s not doing it. Will appreciate help.

#get the airpassenger data
base <- as.data.frame(AirPassengers)

colnames(base) <-"ap"

#create date variable
d0 = data.frame()
for (y in 1949:1960)
for (m in 1:12) {
ym = c(y,m,1)
d0 = rbind(d0,ym)
}

colnames(d0) <- c("y","m","d")

base$date <-as.Date(with(d0,paste(y,m,d,sep = "-")),"%Y-%m-%d")

#calculate end loop
n=nrow(base)-11

#d2 will contain rolling 12 month groups
d2 = data.frame()

#for loop to select rolling 12 month groups
for (i in 1:n) {
d1 <- as.data.frame(base[(i:11+i),])
d2=rbind(d2,d1)
}

I would probably do something like

d2 <- as.data.frame(matrix(base, ncol = 12, byrow = T))

byrow can be TRUE or FALSE, depending on your needs.
I hope that helps.

Thanks, I actually want to do analysis on each group of 12. Did you look at d2 from your code? It seems nonsensical. I want to loop through the data selecting rolling 12-month groups, analyze each group, then append it to the previous group. The matrix/ncol/byrow approach doesn't seem suited to that.

I think your last for loop is not doing what you intend. Compare these two versions of the code choosing the rows pulled from base.

i <- 1
i:11 + i
 [1]  2  3  4  5  6  7  8  9 10 11 12

i <- 1
i:(11 + i)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12

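You should use the second version. The : operator has higher precedence than +, so i:11 + i is evaluated as (i:11) + i rather than i:(11 + i).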

for (i in 1:n) {
  d1 <- as.data.frame(base[(i:(11+i)),])
  d2=rbind(d2,d1)
}

As an aside, the as.data.frame() in the for loop is unnecessary.
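
One way to avoid both the precedence trap and growing d2 with rbind() on every pass is to collect the windows in a list and bind them once. This is only a sketch: the names width, starts, windows, and the group column are assumptions added for illustration and are not part of the original code.

```r
# Rebuild the example data (column name "ap" follows the original post)
base <- data.frame(ap = as.numeric(AirPassengers))
base$date <- seq(as.Date("1949-01-01"), by = "month", length.out = nrow(base))

width  <- 12                                  # rows per window (assumed name)
starts <- seq_len(nrow(base) - width + 1)     # 1, 2, ..., 133

# seq(i, length.out = width) sidesteps the i:11+i precedence problem
windows <- lapply(starts, function(i) {
  w <- base[seq(i, length.out = width), ]
  w$group <- i                                # hypothetical label for the window
  w
})

d2 <- do.call(rbind, windows)                 # bind once instead of inside the loop
rownames(d2) <- NULL                          # reset row names to 1..nrow(d2)
```

With a group column in place, a per-window summary such as aggregate(ap ~ group, data = d2, FUN = mean) can be run on the stacked result.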

Ah yes, sorry for that.

For independent (non-overlapping) groups it should actually be (corrected version)
as.data.frame(matrix(base$ap, ncol = 12, byrow = FALSE))

and for overlapping groups (each sharing 11 of its 12 months with the next), it could be
unname(as.data.frame(lapply(seq_len(nrow(base) - 11), function(i) base[i:(i + 11), 1])))

but there may be other ways
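
Another base R option, offered here as a sketch rather than as the poster's method, is stats::embed(), which lays out every overlapping window as a row of a matrix. embed() returns each row newest-first, so the columns are reversed below; the names x, w, and window_means are made up for the example.

```r
x <- as.numeric(AirPassengers)

# Each row of embed(x, 12) is one 12-month window, most recent value first;
# reversing the columns puts each window in chronological order
w <- embed(x, 12)[, 12:1]

dim(w)                        # 133 windows by 12 months
window_means <- rowMeans(w)   # one summary value per rolling window
```

Because each window is a row, any per-group analysis can be applied with apply(w, 1, ...) without building the 1596-row stacked data frame.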

Perfect. Thank you. Gotta love programming, ofc a parenthesis .... And you are right, the as.data.frame() is unnecessary, so I dropped it. Just out of curiosity, any idea what's up with the row names? base has row names ordered 1 to 144, like you'd expect. d2 has inexplicable row names like 1711 and 13312, out of order and greater than 1596, the number of rows.

I don't know how the row names work. After a little investigation, I think the row names are constructed by adding numeric suffixes to the initial row names to avoid duplicates. Remember that row names are text.
Try inserting the following code just before your last for loop. Then run the loop and you can see how the row names change as you scroll down through d2.

Nms <- c(LETTERS[1:26], paste0("A",LETTERS[1:26]), paste0("B",LETTERS[1:26]), paste0("C",LETTERS[1:26]),
         paste0("D",LETTERS[1:26]), paste0("E",LETTERS[1:14]))
row.names(base) <- Nms
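
The suffix behaviour can also be seen on a small scale. The sketch below assumes rbind() de-duplicates row names in the same manner as make.unique() with an empty separator, which matches the pattern seen in d2; the objects small and stacked are invented for the demonstration.

```r
# A repeated name gets a numeric suffix appended with no separator
make.unique(c("17", "17", "17"), sep = "")

small   <- data.frame(ap = c(112, 118, 132))
stacked <- rbind(small[1:2, , drop = FALSE], small[2:3, , drop = FALSE])
rownames(stacked)             # row "2" appears twice, so the repeat is renamed

# Dropping the row names restores the default 1, 2, 3, ... numbering
rownames(stacked) <- NULL
rownames(stacked)
```

On this reading, a name like 1711 would be row 17 with the suffix 11, since row 17 falls into twelve overlapping windows.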

Thanks, I mostly was wondering how R works, why it names the way it does. Your explanation makes a lot of sense.