I have this data frame in R:
my_data = data.frame(id = 1:100, var = rnorm(100,100,100))
I want to make two lists (my_list_1, my_list_2) that store data from this frame like this:
my_list_1 = list()
my_list_2 = list()
my_list_1[[1]] = my_data[1:10,]
my_list_2[[1]] = my_data[11:15,]
my_list_1[[2]] = my_data[1:15,]
my_list_2[[2]] = my_data[16:20,]
my_list_1[[3]] = my_data[1:20,]
my_list_2[[3]] = my_data[21:25,]
# etc.
This continues until all rows from my_data have been exhausted. As we can see:
- my_list_1 always starts from the first row of my_data and keeps adding on 5 rows
- my_list_2 starts from where my_list_1 left off and takes the next 5 rows each time
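The rule can be sketched as follows, going by the row ranges in the example (1:10 and 11:15, then 1:15 and 16:20, and so on). This is only a minimal sketch: the names first_size, step and ends are hypothetical, the seed is added purely for reproducibility, and it assumes my_data has more than first_size rows.

```r
set.seed(1)  # assumption: not in the original, added so the sketch is reproducible
my_data <- data.frame(id = 1:100, var = rnorm(100, 100, 100))

first_size <- 10  # hypothetical name: rows in the first element of my_list_1
step <- 5         # hypothetical name: rows added on each iteration

n <- nrow(my_data)

# Last row of each my_list_1 element: 10, 15, 20, ...
# Stopping at n - 1 guarantees at least one row remains for my_list_2.
ends <- seq(first_size, n - 1, by = step)

# my_list_1 always starts at row 1 and grows by `step` rows
my_list_1 <- lapply(ends, function(e) my_data[1:e, ])

# my_list_2 takes the next `step` rows after each end point, capped at the last row
my_list_2 <- lapply(ends, function(e) my_data[(e + 1):min(e + step, n), ])
```

Using double brackets, my_list_1[[3]] then holds rows 1 to 20 and my_list_2[[3]] holds rows 21 to 25, matching the example. Computing the end points once avoids looping over every row.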
I tried to solve the problem like this using the ceiling function in R:
# set the size of the chunks
chunk_size_1 = 10
chunk_size_2 = 5
# loop through the rows of my_data, adding chunks to the lists
for (i in 1:nrow(my_data)) {
  # calculate the current chunk numbers
  chunk_num_1 = ceiling(i / chunk_size_1)
  chunk_num_2 = ceiling((i + (chunk_size_2 - 1)) / chunk_size_2)
  # add the current row to the appropriate chunk
  if (!exists(paste0("my_list_1[", chunk_num_1, "]"))) {
    my_list_1[[chunk_num_1]] = my_data[1:i,]
  } else {
    my_list_1[[chunk_num_1]] = my_data[1:(chunk_num_1 * chunk_size_1),]
  }
  if (!exists(paste0("my_list_2[", chunk_num_2, "]"))) {
    my_list_2[[chunk_num_2]] = my_data[(i+1):(i+chunk_size_2),]
  } else {
    my_list_2[[chunk_num_2]] = my_data[((chunk_num_2-1) * chunk_size_2 + 1):(chunk_num_2 * chunk_size_2),]
  }
}
# remove NAs
my_list_1 = lapply(my_list_1, function(x) x[!is.na(x)])
my_list_2 = lapply(my_list_2, function(x) x[!is.na(x)])
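Two calls in the attempt above may not behave as intended, and a small toy example makes that easier to see. This is a hedged sketch on made-up data (demo_list and the other names are hypothetical): exists() looks up an object by its literal name, so a string such as "my_list_1[1]" is never found, and indexing a data frame with the logical matrix from is.na() returns a plain vector rather than a data frame.

```r
# Hypothetical toy data: one list element holding a small data frame with an NA
demo_list <- list(data.frame(id = 1:3, var = c(1.5, NA, 3.5)))

# exists() searches for an object literally named "demo_list[1]";
# no such object is defined, so this is FALSE even though element 1 is filled
name_lookup <- exists("demo_list[1]")

# One alternative: compare the index against the current list length
has_first <- length(demo_list) >= 1

x <- demo_list[[1]]

# A logical-matrix index flattens the data frame into a vector of the non-NA cells
flattened <- x[!is.na(x)]

# complete.cases() drops rows containing NA and keeps the data frame structure
kept <- x[complete.cases(x), ]
```

Here flattened is no longer a data frame, while kept still has the id and var columns.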
The code seems to run, but I am not sure whether it produces the lists described above.
Can someone please show me how to do this?
Thanks!