Storing "Chunks" of Rows in Lists

omario · March 17, 2023, 1:08pm

I have this data frame in R:

my_data = data.frame(id = 1:100, var = rnorm(100,100,100))

I want to make two lists (my_list_1, my_list_2) that store data from this frame like this:

my_list_1 = list()
my_list_2 = list()

my_list_1[[1]] = my_data[1:10,]
my_list_2[[1]] = my_data[11:15,]
my_list_1[[2]] = my_data[1:15,]
my_list_2[[2]] = my_data[16:20,]
my_list_1[[3]] = my_data[1:20,]
my_list_2[[3]] = my_data[21:25,]
# etc.

This continues until all rows of my_data have been used. As we can see:

my_list_1 always starts from the first row of my_data and grows by 5 rows each time
my_list_2 starts right after the last row of the matching my_list_1 entry and takes 5 rows at a time

I tried to solve the problem like this using the ceiling function in R:

# set the size of the chunks
chunk_size_1 = 10
chunk_size_2 = 5

# loop through the rows of my_data, adding chunks to the lists
for (i in 1:nrow(my_data)) {
  # calculate the current chunk numbers
  chunk_num_1 = ceiling(i / chunk_size_1)
  chunk_num_2 = ceiling((i + (chunk_size_2 - 1)) / chunk_size_2)
  
  # add the current row to the appropriate chunk
  if (!exists(paste0("my_list_1[", chunk_num_1, "]"))) {
    my_list_1[[chunk_num_1]] = my_data[1:i,]
  } else {
    my_list_1[[chunk_num_1]] = my_data[1:(chunk_num_1 * chunk_size_1),]
  }
  
  if (!exists(paste0("my_list_2[", chunk_num_2, "]"))) {
    my_list_2[[chunk_num_2]] = my_data[(i+1):(i+chunk_size_2),]
  } else {
    my_list_2[[chunk_num_2]] = my_data[((chunk_num_2-1) * chunk_size_2 + 1):(chunk_num_2 * chunk_size_2),]
  }
}

# remove NAs 
my_list_1 = lapply(my_list_1, function(x) x[!is.na(x)])
my_list_2 = lapply(my_list_2, function(x) x[!is.na(x)])

The code seems to have run - but I am not sure if I am doing this correctly.

Can someone please show me how to do this?

Thanks!

FJCC · March 17, 2023, 3:09pm

I may be missing the point. Does this give you what you want?

my_data = data.frame(id = 1:100, var = rnorm(100,100,100))
BOUNDS <- seq(10, 95, by = 5)
my_list_1 = list()
my_list_2 = list()

for(i in seq_along(BOUNDS)) {
  my_list_1[[i]] = my_data[1:BOUNDS[i],]
  my_list_2[[i]] = my_data[(BOUNDS[i]+1):(BOUNDS[i]+5),]
}
my_list_1[[1]]
#>    id       var
#> 1   1  31.89407
#> 2   2 -29.92905
#> 3   3 194.91020
#> 4   4 106.23534
#> 5   5 114.52101
#> 6   6 109.36323
#> 7   7 162.30542
#> 8   8 149.97832
#> 9   9 -14.34110
#> 10 10 100.36528
my_list_2[[1]]
#>    id       var
#> 11 11 132.58429
#> 12 12  81.36846
#> 13 13 133.50496
#> 14 14  14.20145
#> 15 15 122.92964
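
The same bounds can also be used without an explicit loop. This is only a minimal sketch of the idea above, assuming the first chunk has 10 rows and every later step adds 5; the lowercase name `bounds` is mine, not from the original code.

```r
# Same data frame as in the question
my_data <- data.frame(id = 1:100, var = rnorm(100, 100, 100))

# Last row of each cumulative chunk: 10, 15, ..., 95
# (stops 5 rows before the end so the follow-on chunk still fits)
bounds <- seq(10, nrow(my_data) - 5, by = 5)

# Cumulative chunks: rows 1..b for each bound b
my_list_1 <- lapply(bounds, function(b) my_data[1:b, ])

# Follow-on chunks: the 5 rows right after each bound
my_list_2 <- lapply(bounds, function(b) my_data[(b + 1):(b + 5), ])

# Quick checks on chunk sizes and on the final chunk
sapply(my_list_1, nrow)                    # 10, 15, ..., 95
sapply(my_list_2, nrow)                    # 5 for every element
range(my_list_2[[length(my_list_2)]]$id)   # 96 100
```

Both lists end up with 18 elements, and the last element of my_list_2 covers rows 96 to 100, so every row of my_data is used.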

Created on 2023-03-17 with reprex v2.0.2
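
On the original attempt, two things are worth checking, as far as I can tell. exists() looks up a variable by its name, so a string like "my_list_1[1]" never matches a list element, and the !exists(...) branch runs on every iteration. Also, x[!is.na(x)] on a data frame indexes with a logical matrix and returns a plain vector, so the row/column structure is lost. The toy data below is made up just to show both effects:

```r
# Toy list holding one small data frame (illustrative data only)
my_list_1 <- list(data.frame(id = 1:3, var = c(1.5, NA, 2.5)))

# exists() searches for a variable literally named "my_list_1[1]";
# no such variable exists, even though the list has a first element
exists("my_list_1[1]")     # FALSE

# Checking for an element by position instead
length(my_list_1) >= 1     # TRUE

# Indexing a data frame with the logical matrix from is.na()
# flattens it into a vector of the non-missing values
x <- my_list_1[[1]]
flat <- x[!is.na(x)]
is.data.frame(flat)        # FALSE

# Dropping incomplete rows while keeping a data frame
kept <- x[complete.cases(x), ]
is.data.frame(kept)        # TRUE
```

Since the chunks here are taken from row ranges that stay inside my_data, there should be no NA rows to remove in the first place.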
