I am relatively new to R. I have written code that scans a data frame, identifies rows where the setpoint changes shortly after an alarm, and marks those rows with a 1 in the column "check". The code works exactly as intended on the test data below. My problem is that the real dataset has more than 1 million rows, and while the code works, it is far too slow. I would appreciate help improving its efficiency.
#create test data
alarm <- c(0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0)
setpoint <- c(10,10,10,10,10,10,10,10,8,8,9,8,8,10,10,10,10,10,10,10,10,10,10,10,8,10,10,8,10,10,10)
temp <- data.frame(alarm, setpoint)
#create a new column to capture whether there are any changes to setpoint after any alarm
temp$check <- ""
#review every row in the dataframe
for(i in 1:nrow(temp)){
  cat(round(i/nrow(temp)*100,2),"% \r") # prints the percentage complete in realtime.
  if(temp$alarm[i]==1 && temp$setpoint[i] >= 10){
    #for when alarm has occurred and the setpoint is 10 or above review the next 5 rows
    for(j in 0:5){
      if(temp$setpoint[i] != temp$setpoint[i+j]){
        #for when there has been a change in the setpoint
        for(q in 0:10){
          if(temp$setpoint[i] != temp$setpoint[i+q]){
            temp$check[i+q]<-'1'
            if(temp$setpoint[i+q] != (temp$setpoint[i+q+1])){break}
          }
        }
      }
    }
  }
}
> print(temp)
   alarm setpoint check
1      0       10
2      0       10
3      0       10
4      0       10
5      0       10
6      0       10
7      1       10
8      1       10
9      0        8     1
10     0        8     1
11     0        9
12     0        8
13     0        8
14     0       10
15     0       10
16     0       10
17     1       10
18     0       10
19     0       10
20     0       10
21     0       10
22     1       10
23     0       10
24     0       10
25     0        8     1
26     0       10
27     0       10
28     0        8
29     0       10
30     0       10
31     0       10
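
The loop above only ever marks rows that follow an alarm row with a setpoint of 10 or above, so one possible direction is to visit just those rows via `which()` and to work on plain vectors instead of assigning into the data frame one cell at a time. The following is a minimal sketch of that idea, not a benchmarked solution. The function name `mark_changes` and its arguments `look_ahead` and `max_span` are hypothetical names introduced here for illustration. It assumes neither column contains `NA`, and it clamps the look-ahead windows at the last row (the original loop would index past the end of the data in that case).

```r
# Same test data as in the question
alarm <- c(0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0)
setpoint <- c(10,10,10,10,10,10,10,10,8,8,9,8,8,10,10,10,10,10,10,10,10,10,10,10,8,10,10,8,10,10,10)
temp <- data.frame(alarm, setpoint)

# Hypothetical helper: returns the "check" column as a character vector.
# Assumes alarm and setpoint contain no NA values.
mark_changes <- function(alarm, setpoint, look_ahead = 5, max_span = 10) {
  n <- length(setpoint)
  check <- character(n)  # "" everywhere, like temp$check <- ""

  # Visit only the rows that can trigger a mark
  for (i in which(alarm == 1 & setpoint >= 10)) {
    base <- setpoint[i]

    # Rows i .. i + look_ahead, clamped to the end of the data
    ahead <- setpoint[i:min(i + look_ahead, n)]
    changed <- which(ahead != base)
    if (length(changed) == 0) next  # no setpoint change soon after the alarm

    # First row whose setpoint differs from the alarm row
    first <- i + changed[1] - 1

    # From that row up to row i + max_span, keep the leading stretch
    # of rows that share the same (changed) setpoint value
    run <- setpoint[first:min(i + max_span, n)]
    stop_at <- which(run != run[1])[1]
    run_len <- if (is.na(stop_at)) length(run) else stop_at - 1

    check[first:(first + run_len - 1)] <- "1"
  }
  check
}

temp$check <- mark_changes(temp$alarm, temp$setpoint)
which(temp$check == "1")  # 9 10 25
```

On the test data this marks rows 9, 10 and 25, the same rows as the original loop. Dropping the per-row `cat()` progress output and the repeated `temp$check[i+q] <- '1'` assignments is likely to account for much of any speed-up, though that has not been measured here.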