Create a for loop in R

laplanca · March 25, 2022, 2:24pm

I want to write a for loop with an if condition, but I don't know how to do it in R. Something like this:

for i in 1:len(base)
     for j in 1:len(base)
           if id[i]==id[j] and diffdate>12
           then class[j] == "verif"
           else if id[i]==id[j] and diffdate<=12
           then class[j] == " "

I don't know if it's possible. Thank you for your time.

FJCC · March 25, 2022, 2:33pm

I hope I did not make any mistakes.

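for (i in seq_len(nrow(base))) {   # assuming base is a data frame; R has no len()
  for (j in seq_len(nrow(base))) {
    if (id[i] == id[j] && diffdate > 12) {
      class[j] <- "verif"          # assign with <-; == only compares
    } else if (id[i] == id[j] && diffdate <= 12) {
      class[j] <- " "
    }
  }
}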

laplanca · March 25, 2022, 2:38pm

It seems good! But I have this error :

Error in id[i]: object of type 'closure' not indexable

FJCC · March 25, 2022, 2:43pm

How have you defined id? If you run

class(id)

what do you get? There is a function named id in the dplyr package, and if you have not defined a variable with the same name, then id[i] tries to subset that function and you get this error.

nirgrahamuk · March 25, 2022, 2:46pm

I agree, and it's also true for class
: )
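
To see the name clash in isolation, here is a minimal sketch. The function `id` defined below is only a hypothetical stand-in for a package function of that name; the point is that subsetting any function with `[` fails, and that defining your own vector (or picking a non-clashing name such as `class_S`) avoids it.

```r
# Hypothetical stand-in for a package function called `id`.
id <- function(x) x

# `class` is a base R function, so it exists even if you never defined it.
print(is.function(class))

# Subsetting a function fails; capture the message instead of stopping.
msg <- tryCatch(id[1], error = function(e) conditionMessage(e))
print(msg)

# Once `id` is a vector, the same expression works.
id <- c(12, 12, 32)
first_id <- id[1]

# A name that does not clash with any function is the safer choice.
class_S <- rep("", 3)
class_S[2] <- "verif"
print(class_S)
```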

laplanca · March 25, 2022, 2:56pm

In the end I have this code:

  for (i in 1:15000) {
    for (j in 1:15000) {
      if (I_ID[i] == I_ID[j] && (INCIDENCE_DATE[j] - INCIDENCE_DATE[i]) > 12) {
        class_S[j] <- "verif"
      } else if (I_ID[i] == I_ID[j] && (INCIDENCE_DATE[j] - INCIDENCE_DATE[i]) <= 12) {
        class_S[j] <- " "
      }
    }
  }

and this error

Error in if (I_ID[i] == I_ID[j] && INCIDENCE_DATE[j] - INCIDENCE_DATE[i] > :
  missing value where TRUE / FALSE is required

nirgrahamuk · March 25, 2022, 3:10pm

If one of your comparisons involves an NA (missing value), what effect do you want that to have? Should it make the comparison TRUE or FALSE?
Most often you would prefer it to be FALSE. If so, this is an option:

for (i in 1:15000) {
  for (j in 1:15000) {
    comp1 <- I_ID[i] == I_ID[j] && (INCIDENCE_DATE[j] - INCIDENCE_DATE[i]) > 12
    if (is.na(comp1)) {comp1 <- FALSE}   # treat a missing comparison as FALSE
    if (comp1) {
      class_S[j] <- "verif"
    } else {
      comp2 <- I_ID[i] == I_ID[j] && (INCIDENCE_DATE[j] - INCIDENCE_DATE[i]) <= 12
      if (is.na(comp2)) {comp2 <- FALSE}
      if (comp2) {
        class_S[j] <- " "
      }
    }
  }
}
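
A more compact variant of the same idea, as a sketch: `isTRUE()` returns `FALSE` for `NA`, so wrapping the condition in it treats a missing comparison as false without a separate `is.na()` check. The toy vectors below are made up for illustration (plain numbers stand in for dates), and only the "verif" branch is shown.

```r
# Toy data (hypothetical): the last date is missing.
I_ID <- c(12, 12, 45, 45)
INCIDENCE_DATE <- c(0, 24, 3, NA)   # stand-in numeric values, not real dates
class_S <- rep("", length(I_ID))

for (i in seq_along(I_ID)) {
  for (j in seq_along(I_ID)) {
    same_id <- I_ID[i] == I_ID[j]
    gap <- INCIDENCE_DATE[j] - INCIDENCE_DATE[i]
    # TRUE && NA evaluates to NA; isTRUE() turns that into FALSE
    # instead of letting if() stop with an error.
    if (isTRUE(same_id && gap > 12)) {
      class_S[j] <- "verif"
    }
  }
}
print(class_S)
```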

laplanca · March 28, 2022, 8:44am

Thank you so much, it works! But I think it doesn't do what I need.

I have a database of people who made two purchases on two different dates.
There is one row per purchase per client. The purchase date is INCIDENCE_DATE (format %Y%m%d). For people who bought twice, I want to classify them according to the difference in months between the two dates. The classification should go in a new column, Class_S, and only on the row of the most recent order.

For example, with this data:
id/name/incidence_date
12/patrick/20161211
12/patrick/20181211
32/madison/20150211
45/james/20140215
45/james/20140515

I want to get this (diffDateMonths is only indicative):
id/name/incidence_date/diffDateMonths/Class_S
12/patrick/20161211/24/
12/patrick/20181211/24/VERIF
32/madison/20150211/
45/james/20140215/3/
45/james/20140515/3/

Since the dates I need to compare are in different rows but in the same column, I thought of using a for loop. Maybe this is not the best idea?
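
For reference, the table above can also be produced without a loop. This is only a sketch in base R under some assumptions: the data frame is called `base`, the month difference is counted in calendar months (the day of the month is ignored), and "VERIF" is set when that difference is greater than 12. `ave()` computes a per-id value and repeats it on every row of that id.

```r
# Example data from the post above.
base <- data.frame(
  id = c(12, 12, 32, 45, 45),
  name = c("patrick", "patrick", "madison", "james", "james"),
  incidence_date = c("20161211", "20181211", "20150211",
                     "20140215", "20140515"),
  stringsAsFactors = FALSE
)

# Parse the %Y%m%d strings into Date objects.
d <- as.Date(base$incidence_date, format = "%Y%m%d")

# Calendar month counter: year * 12 + month (assumption: days are ignored).
month_index <- as.integer(format(d, "%Y")) * 12L + as.integer(format(d, "%m"))

# Per id: months between first and last purchase, row count, latest date.
span   <- ave(month_index, base$id, FUN = function(m) max(m) - min(m))
n_rows <- ave(month_index, base$id, FUN = length)
latest <- ave(as.numeric(d), base$id, FUN = max)

# Clients with a single purchase get NA as their difference.
base$diffDateMonths <- ifelse(n_rows > 1, span, NA)

# Flag only the most recent row of clients whose purchases are > 12 months apart.
is_latest <- as.numeric(d) == latest
base$Class_S <- ifelse(n_rows > 1 & is_latest & span > 12, "VERIF", "")
print(base)
```

Because every step works on whole columns, this avoids the 15000 x 15000 pairwise comparisons of the nested loop.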