The code you posted will not run because nrow_i is not defined. I set it to zero at the top of the for loop and found that the while loop only executes once for every value of i. I showed this by adding a counter variable j that increments with each iteration of the while loop. In the following reprex, you can see that j is always 1, ctotal_i is always > 15 and nrow_i is always > 5.
From your description of the task, it seems the test in the while loop should be c1_i + c2_i + c3_i > 15 || nrow_i > five, so that the loop keeps running until ctotal_i is at most 15 and nrow_i is at most five. However, it might take a long time to meet both conditions. In 100 iterations, the minimum ctotal_i was 16.9 and the minimum nrow_i was 8, so the chance of getting both below the thresholds in the same iteration seems very low.
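Here is a minimal sketch of a loop using that || condition. The original code is not shown here, so this assumes each iteration draws three samples of 30 rows from iris, that ctotal_i is the sum of their mean Sepal.Length values, and that nrow_i counts the rows drawn in more than one sample. draw_once(), max_iter and the starting values are hypothetical; max_iter is only a safety cap so the demo cannot run forever.

```r
# Hypothetical helper: draw three samples of k rows (without replacement)
# and return the two quantities the loop tests
draw_once <- function(k = 30, data = iris) {
  n <- nrow(data)
  s1 <- sample(n, k); s2 <- sample(n, k); s3 <- sample(n, k)
  counts <- tabulate(c(s1, s2, s3), nbins = n)  # times each row was drawn
  list(
    ctotal_i = mean(data$Sepal.Length[s1]) + mean(data$Sepal.Length[s2]) +
      mean(data$Sepal.Length[s3]),
    nrow_i = sum(counts > 1)  # rows drawn in more than one sample
  )
}

set.seed(1)
five <- 5          # threshold, named like the variable in the original code
max_iter <- 1000   # hypothetical safety cap
ctotal_i <- Inf    # start above both thresholds so the body runs at least once
nrow_i <- Inf
j <- 0             # counts iterations of the while loop

# Keep going while EITHER quantity is still above its threshold
while ((ctotal_i > 15 || nrow_i > five) && j < max_iter) {
  j <- j + 1
  res <- draw_once(30)
  ctotal_i <- res$ctotal_i
  nrow_i <- res$nrow_i
}
c(j = j, ctotal_i = ctotal_i, nrow_i = nrow_i)
```

With 30-row samples the cap is almost certainly what ends the loop, which fits the observation that ctotal_i never got close to 15.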
Thank you so much for your answer! I have two questions:
Question 1: When I ran your code, I noticed that nrow_i is almost always larger than 5 in the results. Why is this happening? I would have thought that this LOOP would take a very long time to run in order to satisfy both conditions within the WHILE LOOP. Is it possible to write the conditions so that nrow_i becomes less than 5?
Question 2: What is the purpose of sapply(list_results, function(DF) DF$j)? I don't see a "DF" object defined anywhere in the code, yet this code still runs. What are you trying to accomplish with sapply and DF?
1. I do not know enough about the mathematics of combinations to explain why the values of nrow_i are what they are, and I do not have a good intuition about this kind of problem. You are sampling 20% of the population three times and asking that fewer than 3.3% (5/150) of the total population be sampled more than once. I could not have told you beforehand even roughly how likely that is, but running the loop many times and looking at nrow_i will show you that typical values are in the mid teens.
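For what it's worth, the mid-teens value can be checked with a short calculation, assuming each of the three samples takes 30 of the 150 rows without replacement, independently of the others. A given row is then in any one sample with probability 30/150 = 0.2, so the number of samples containing it is Binomial(3, 0.2), and the expected nrow_i is 150 times the chance that this is at least 2. expected_multi() is a hypothetical helper; the simulation repeats the same sampling directly.

```r
# Expected number of rows drawn in two or more of `draws` samples,
# each sample taking k of n rows without replacement
expected_multi <- function(k, n = 150, draws = 3) {
  p <- k / n                                         # P(row is in one sample)
  p_multi <- 1 - pbinom(1, size = draws, prob = p)   # P(row is in >= 2 samples)
  n * p_multi
}
expected_multi(30)   # 150 * 0.104, about 15.6

# Simulation check of the same quantity
set.seed(42)
sims <- replicate(10000, {
  counts <- tabulate(c(sample(150, 30), sample(150, 30), sample(150, 30)),
                     nbins = 150)
  sum(counts > 1)    # rows drawn more than once
})
mean(sims)           # should land close to expected_multi(30)
summary(sims)
```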
2. The code sapply(list_results, function(DF) DF$j) simply displays all the values of j over the 100 iterations of the for loop. The object list_results is a list of data frames. sapply() iterates over that list and passes each data frame to the small anonymous function I wrote, function(DF) DF$j. DF is just the name I gave to that function's argument, which is why no DF object needs to exist anywhere else: it receives each element of list_results, that is, each data frame, in turn. The function returns the j column of that data frame, and sapply() builds a vector from those values. The purpose of those lines with sapply was to show that j is always 1, ctotal_i is never < 15 and nrow_i is never < 5.
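A small self-contained illustration of the same pattern. list_results here is a hypothetical stand-in with three one-row data frames filled with placeholder numbers (not real results), and get_j() and all_results are illustrative names.

```r
# Hypothetical stand-in for list_results: one small data frame per iteration
list_results <- lapply(1:3, function(i) {
  data.frame(i = i, j = 1, ctotal_i = 17 + i / 10, nrow_i = 10 + i)
})

# DF is only the argument name of the anonymous function; sapply() binds it
# to each data frame in turn and simplifies the results into a vector
js <- sapply(list_results, function(DF) DF$j)
js
#> [1] 1 1 1

# Same thing with a named function, showing DF is not a global object
get_j <- function(DF) DF$j
sapply(list_results, get_j)

# Stacking the list into one data frame shows every column at once
all_results <- do.call(rbind, list_results)
all_results
```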
I would have thought that while(c1_i + c2_i + c3_i < 15 && nrow_i > five) would keep running (even if it runs forever) until nrow_i < 5. Is this correct?
In short: suppose I didn't care how long the R code takes to run. What kind of condition would I have to write to ENSURE that this WHILE LOOP ONLY outputs results where nrow_i < 5?
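If you change the condition of the while loop to while(c1_i + c2_i + c3_i < 15 && nrow_i > five), the sub-condition c1_i + c2_i + c3_i < 15 will almost certainly be FALSE after the first iteration, and the loop will only iterate once: a while loop keeps running only while its whole condition is TRUE, and && makes the condition FALSE as soon as either part is FALSE. The cx_i variables are each the average Sepal.Length of 30 sampled rows. Since Sepal.Length over the whole iris data set averages about 5.8, the average of 30 samples will also be very close to 5.8, and the sum of three of those (about 17.5) will essentially never be less than 15. To guarantee that the loop only stops once nrow_i < 5, the condition has to describe when to keep going, which is the opposite of the goal: while(nrow_i >= 5). If you remove the condition c1_i + c2_i + c3_i < 15 entirely in this way, you can test how long it takes to get samples with nrow_i < 5.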
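A minimal sketch of a loop whose only exit is nrow_i < 5, written as while (nrow_i >= threshold). It assumes the same sampling scheme as above (three samples of k rows from iris, nrow_i = rows drawn more than once); sample_until(), threshold and max_iter are hypothetical names, and max_iter defaults to Inf so that, as asked, nothing but the nrow_i condition stops the loop.

```r
# Hypothetical: keep drawing until fewer than `threshold` rows were drawn
# more than once, then return the state of the final iteration
sample_until <- function(k = 30, threshold = 5, max_iter = Inf) {
  n <- nrow(iris)
  nrow_i <- Inf          # start above the threshold so the body runs at least once
  ctotal_i <- NA_real_
  j <- 0                 # iteration counter
  # The condition says when to KEEP going, i.e. while the goal is not yet met
  while (nrow_i >= threshold && j < max_iter) {
    j <- j + 1
    s1 <- sample(n, k); s2 <- sample(n, k); s3 <- sample(n, k)
    nrow_i <- sum(tabulate(c(s1, s2, s3), nbins = n) > 1)
    ctotal_i <- mean(iris$Sepal.Length[s1]) + mean(iris$Sepal.Length[s2]) +
      mean(iris$Sepal.Length[s3])
  }
  data.frame(j = j, ctotal_i = ctotal_i, nrow_i = nrow_i)
}

set.seed(123)
res <- sample_until(k = 20)   # smaller samples finish quickly
res
```

With k = 30 the same call should still terminate eventually, but it may need many more iterations; passing a finite max_iter is a simple way to bound the run time while experimenting.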
If you lower mean_target from 18 to 15, as in your initial statement, this code will run for a very long time, possibly never finishing; I set it to 18 so that it completes in a reasonable time. Similarly, raising sample_size_at_each_step from 20 to 30 will make it run much longer.
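The effect of the sample size can be estimated directly. Assuming three independent samples of k rows out of 150, the expected nrow_i is 150 * P(Binomial(3, k/150) >= 2), which is about 15.6 for k = 30 but only about 7.3 for k = 20. The simulation below estimates how often a single iteration gives nrow_i < 5 for each size; draw_nrow() and p_success are hypothetical names, and the numbers it prints are simulation estimates, not exact values.

```r
# Hypothetical: nrow_i for one iteration with three samples of k rows
draw_nrow <- function(k, n = 150) {
  sum(tabulate(c(sample(n, k), sample(n, k), sample(n, k)), nbins = n) > 1)
}

set.seed(2024)
# Share of simulated iterations with nrow_i < 5, for each sample size
p_success <- sapply(c(20, 30), function(k) mean(replicate(20000, draw_nrow(k)) < 5))
names(p_success) <- c("k = 20", "k = 30")
p_success

# Rough average number of while-loop iterations needed (1 / probability);
# Inf means no success was seen among the simulated draws
1 / p_success
```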