Loop within loop to subset indexes. Each subset should have one more index than before

Tim_George · October 23, 2024, 1:10pm

My data looks like this:

| id | time | y    | x1      |
|----|------|------|---------|
| 1  | 1    | 2312 | 34345   |
| 1  | 2    | 2343 | 234566  |
| 1  | 3    | 5654 | 4532234 |
| 2  | 1    | 4234 | 453256  |
| 2  | 2    | 7647 | 8653    |
| 2  | 3    | 3457 | 123245  |
| 3  | 1    | 2453 | 235454  |
| 3  | 2    | 7654 | 3345675 |
| 3  | 3    | 7653 | 2542665 |

I want a loop that takes different combinations of id (the identifier for cities) and time (the identifier for years). For example: take two cities for two years and for three years, then take three cities for two years and for three years. I have a function that estimates an interaction matrix, and I want to apply it to each of these combinations of number of cities and number of years. How can I construct the loop?

mikecrobp · October 23, 2024, 1:27pm

Two possibilities, I think:

1. Create two levels of loop, one looping through the unique id values and the other through the unique values of time.
2. Use the function crossing() (from tidyr), e.g. crossing(unique(df$id), unique(df$time)), to generate a table of possibilities and then loop through that.
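
As a rough sketch of the second possibility, sized by how many cities and years to include (which is what the question asks for) rather than by individual id and time values: base R's expand.grid() is used here as a stand-in for crossing(), and df and summarise_subset() are made-up placeholders for the real data and the real estimation function.

```r
# Toy data with the same shape as the example in the question (3 cities x 3 years).
df <- data.frame(
  id   = rep(1:3, each = 3),
  time = rep(1:3, times = 3),
  y    = c(2312, 2343, 5654, 4234, 7647, 3457, 2453, 7654, 7653)
)

# Hypothetical placeholder for the real estimation function.
summarise_subset <- function(d) nrow(d)

ids   <- sort(unique(df$id))
times <- sort(unique(df$time))

# One row per (number of cities, number of years) combination, decided up front.
grid <- expand.grid(n_id = 2:length(ids), n_time = 2:length(times))

results <- vector("list", nrow(grid))
for (k in seq_len(nrow(grid))) {
  keep_ids   <- ids[seq_len(grid$n_id[k])]      # first n_id cities
  keep_times <- times[seq_len(grid$n_time[k])]  # first n_time years
  subset_k   <- df[df$id %in% keep_ids & df$time %in% keep_times, ]
  results[[k]] <- summarise_subset(subset_k)
}

# Attach the per-combination result so the whole plan can be inspected at once.
grid$n_rows <- unlist(results)
print(grid)
```

Because the grid is built before any estimation runs, nrow(grid) shows how many fits the loop will attempt.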

nirgrahamuk · October 23, 2024, 1:47pm

I like approach 2, which decides up front what to do, because that can be sense-checked, i.e. you can confirm that the total volume of possibilities is sensible.

nirgrahamuk · October 23, 2024, 4:13pm

As a tip: if you want every combination of ids (your example has 3 unique ids, so there are 2^3 - 1 = 7 possible non-empty combinations), you can find them like so:

library(tidyverse)

d1 <- read_delim(file = "|id|time|y|x1|
|1|1|2312|34345|
|1|2|2343|234566|
|1|3|5654|4532234|
|2|1|4234|453256|
|2|2|7647|8653|
|2|3|3457|123245|
|3|1|2453|235454|
|3|2|7654|3345675|
|3|3|7653|2542665|", delim = "|") |> select(where(is.numeric))

d1

(uid <- unique(d1$id))

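# For each subset size m = 1..length(uid), list every combination of m ids,
# then flatten one level to get a single list of id vectors.
# list_flatten() is the purrr >= 1.0.0 replacement for flatten().
(id_combinations <- map(
  seq_along(uid),
  \(x) combn(x = uid, m = x, simplify = FALSE)
) |> list_flatten())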
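
Each element of id_combinations is a vector of ids, so it can be used directly to filter the data. A minimal base R sketch of that step, rebuilding the combinations with lapply() so it runs without the tidyverse; the toy data frame d and the row count are stand-ins for the real data and the real estimation function:

```r
# Toy panel: 3 ids x 3 time points.
d <- data.frame(id = rep(1:3, each = 3), time = rep(1:3, times = 3))
uid <- unique(d$id)

# All non-empty subsets of the ids: 2^3 - 1 = 7 for three ids.
id_combinations <- unlist(
  lapply(seq_along(uid), function(m) combn(uid, m, simplify = FALSE)),
  recursive = FALSE
)
stopifnot(length(id_combinations) == 2^length(uid) - 1)

# Filter the data once per combination; nrow() stands in for the real function.
rows_per_combination <- vapply(
  id_combinations,
  function(ids) nrow(d[d$id %in% ids, ]),
  integer(1)
)

# Label each result with the ids it used, e.g. "1+3".
names(rows_per_combination) <- vapply(
  id_combinations, paste, character(1), collapse = "+"
)
rows_per_combination
```

The number of combinations doubles with every extra id, so with many cities it is worth checking length(id_combinations) before running anything expensive.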

Tim_George · October 24, 2024, 9:14am

unique_ids <- unique(main_data$id)

results <- list()

# Loop over increasing number of IDs (cities)
for (i in 5:length(unique_ids)) {

  # Subset the IDs for the current iteration
  selected_ids <- unique_ids[1:i]  # Take the first i IDs

  # Filter the main data based on the selected cities
  data <- main_data[main_data$id %in% selected_ids, c("id", "time", "y", "x1", "x2", "x3")]

  # Run the recoverNetwork function on the subset data
  rn <- recoverNetwork(data, lambda = c(0.10, 0.10, 0.10))

  # Extract the result
  W_matrix <- rn$unpenalisedgmm$W

  results[[i]] <- W_matrix
}

Thank you, but this is what I am trying to do: each iteration uses one more id than the previous iteration. However, I get this indexing error:

Error in ku_format_slice(key$row, nrow) :
Index is out of bounds for axis with size 10

main_data has 10 unique ids (length(unique(main_data$id)) is 10), and each id has data for 11 years.

Do you know why I am getting this error when subsetting? I guess there is a logical oversight.
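
One way to narrow down which iteration triggers an error like this is to wrap the call in tryCatch() and record the outcome per number of cities. The sketch below is only illustrative: fit_subset() is a hypothetical stand-in for recoverNetwork() that fails on purpose for one subset size, and the toy main_data just mimics 10 ids with 11 years each.

```r
# Toy panel: 10 ids x 11 years, mimicking the dimensions described above.
main_data <- data.frame(
  id   = rep(1:10, each = 11),
  time = rep(1:11, times = 10),
  y    = seq_len(110)
)

# Hypothetical stand-in for recoverNetwork(); fails deliberately for 8 cities.
fit_subset <- function(d) {
  n_id <- length(unique(d$id))
  if (n_id == 8) stop("simulated failure for 8 cities")
  n_id
}

unique_ids <- unique(main_data$id)
results <- list()
errors  <- list()

for (i in 5:length(unique_ids)) {
  selected_ids <- unique_ids[seq_len(i)]
  data_i <- main_data[main_data$id %in% selected_ids, ]
  key <- as.character(i)  # name by number of cities, so there are no empty slots 1..4
  out <- tryCatch(fit_subset(data_i), error = function(e) e)
  if (inherits(out, "error")) {
    errors[[key]] <- conditionMessage(out)  # keep the message, carry on looping
  } else {
    results[[key]] <- out
  }
}

names(results)  # subset sizes that succeeded
errors          # subset sizes that failed, with the message
```

Storing results under as.character(i) also avoids the NULL padding that results[[i]] produces when the loop starts at 5.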

nirgrahamuk · October 24, 2024, 3:56pm

Your approach works on this example data, so can you provide an example of data where it fails?

library(recoverNetwork)

main_data <-  gendata(setting=1,1)
main_data <- main_data[main_data$id <= 7, ]  # keep ids up to 7 so the example runs faster

unique_ids <- unique(main_data$id)
results <- list()  

# Loop over increasing number of IDs (cities)
for (i in 5:length(unique_ids)) {
  
  # Subset the IDs for the current iteration
  selected_ids <- unique_ids[1:i]  # Take the first i IDs
  
  # Filter the main data based on the selected cities
  data <- main_data[main_data$id %in% selected_ids, c("id", "time", "y", "x1") ]
  
  # Run the recoverNetwork function on the subset data
  rn <- recoverNetwork(data, lambda = c(0.10, 0.10, 0.10))
  
  # Extract the result
  W_matrix <- rn$unpenalisedgmm$W

  results[[i]] <- W_matrix
  print(paste0("Done i=", i))
}

Tim_George · October 28, 2024, 10:00am

It doesn't seem to work for this data:
data <- gendata(setting=15,seed=1)

I get the error below. Do you know why it worked for main_data <- gendata(setting=1,1) but not for the data above? Both have only one x variable (x1).

Initial conditions
Elastic Net
Error in glmnet::glmnet(ZX, ZY, lambda = lambda, alpha = alpha, penalty.factor = pen, :
x should be a matrix with 2 or more columns
Error in optimx.check(par, optcfg$ufn, optcfg$ugr, optcfg$uhess, lower, :
Cannot evaluate function at initial parameters
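
The glmnet message says the matrix it received had fewer than two columns, so a reasonable first step is to compare the shape of the two generated data sets. A sketch of such a comparison, with a hypothetical helper panel_shape() and two toy data frames standing in for the gendata() outputs (the real outputs may look different):

```r
# Hypothetical helper: summarise the dimensions of a panel data set.
panel_shape <- function(d) {
  c(
    n_rows  = nrow(d),
    n_ids   = length(unique(d$id)),
    n_times = length(unique(d$time)),
    n_cols  = ncol(d)
  )
}

# Toy stand-ins; replace with the outputs of gendata(setting=1,1) and
# gendata(setting=15,seed=1) to compare the real data sets.
data_a <- data.frame(id = rep(1:4, each = 5), time = rep(1:5, times = 4), y = 0, x1 = 0)
data_b <- data.frame(id = rep(1:2, each = 3), time = rep(1:3, times = 2), y = 0, x1 = 0)

# One row per data set, one column per dimension.
shapes <- rbind(setting_a = panel_shape(data_a), setting_b = panel_shape(data_b))
shapes
```

Differences in the number of ids or time periods between the two settings would be the first thing to look at, although this alone does not establish the cause of the error.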
