Combine outputs of a for loop in R

noobie · November 16, 2020, 12:09pm

I have created a function which completes a calculation based on data from two data frames for an individual. I want to complete that function for each individual and combine all the outputs in a data frame and export to .csv
Currently the output .csv only has data for person 34, none of the others. Why is that?

library(dplyr)
library(lubridate)
library(date)

screen_function = function(x){
 # Select each person and get necessary inputs
   data = data%>%filter(ID == x)
  demogs = demogs %>% filter (P_ID == x)
  data$Age =  demogs$Age
 
   data$result = data$test * data$Age
   
   data$Date = as.Date(data$Date,format='%d/%m/%Y')
 # only include tests from most recent 12-24 months and only proceed if test in last 12 months  
   Recent_12m  = data %>% filter(between(Date, today() - years(1), today()))   
   Recent_24m  = data %>% filter(between(Date, today() - years(2), today()))   
   
   if ((nrow(Recent_12m)) > 0) {           
   
     data = rbind(Recent_12m,Recent_24m) 
     
     
     Recent_12m$min_date = min(Recent_12m$Date) 
     Recent_12m$Date = as.Date(Recent_12m$Date,format='%d/%m/%Y') 
     Recent_12m$min_date = as.Date(Recent_12m$min_date,format='%d/%m/%Y') 
     
     Recent_24m$min_date = min(Recent_24m$Date)
     Recent_24m$Date = as.Date(Recent_24m$Date,format='%d/%m/%Y') 
     Recent_24m$min_date = as.Date(Recent_24m$min_date,format='%d/%m/%Y') 
     
    
     Recent_12m$Period = interval(Recent_12m$min_date, Recent_12m$Date) 
     Recent_12m$Years = as.numeric(Recent_12m$Period, unit = "years")   
     Recent_24m$Period = interval(Recent_24m$min_date, Recent_24m$Date)  
     Recent_24m$Years = as.numeric(Recent_24m$Period, unit = "years")    
     
     # Latest result
     Last = filter(Recent_12m, (Recent_12m$Date == max(Date)))
     
     # linear regression model
     Reg_12month <- lm(result ~ Years, data=Recent_12m) 
     Reg_24month <- lm(result ~ Years, data=Recent_24m) 
     
     
     info = c(x, round(Last$result, digits=1), round(Reg_12month$coefficients["Years"], digits = 1), round(Reg_24month$coefficients["Years"], digits = 1)) 
     newdf = data.frame(matrix(0, ncol = 4))
      colnames(newdf) = c("ID", "Latest result",    "Trend 12month",    "Trend 24 month")
      newdf= rbind(newdf, info)
      write.csv(newdf, "filepath.csv")
     }
}


Date= sample(seq(as.Date('2019/11/01'), as.Date('2020/11/01'), by="day"), 12)
ids= c(12,12,12,450,450,450,1,1,1,34,34,34)
test= rnorm(12, mean=150, sd=60)

data= data.frame(ids, Date, test)
colnames(data)= c("ID", "Date", "test")

P_ID = c(1,12,34,450)
Age = c(50,45,60,72)
demogs =  data.frame(P_ID, Age)


persons = unique(data$ID)

for(ID in persons){
  
  screen_function(paste("", ID,"", sep=""))
}

^{Created on 2020-11-16 by the reprex package (v0.3.0)}

nirgrahamuk · November 16, 2020, 12:17pm

the default behaviour is to overwrite any existing file.
you need to set append=TRUE as a parameter.
But then you will need to add some additional logic to stop endlessly extending the csv for every time you run the script. maybe before you call your function multiple times, check if the file exists and remove it if it does

noobie · November 16, 2020, 12:25pm

Thanks for the reply.

I've tried adding append= TRUE.
I get 4 warnings (one for each unique ID, I guess):

1: In write.csv(newdf, "filepath.csv", append = TRUE) :
  attempt to set 'append' ignored

nirgrahamuk · November 16, 2020, 12:36pm

ok, I was a bit too fast to respond.
write.csv doesnt support appending, you would need to use write.table instead.

then you also have to think about how you handle the fact that only the first write should put the headers down, and subsequent writes should not

mytf <- tempfile()
# when you make the first write to the file put out column headers 
write.table(iris[0,],file = mytf,sep = ",")

#subsequent appends without headers/column names
write.table(iris[1,],file = mytf,append=TRUE,sep = ",",col.names = FALSE)
write.table(iris[150,],file = mytf,append=TRUE,sep = ",",col.names = FALSE)

#read it back two ways
cat(paste0(readLines(con=mytf),"\n"))
(read.csv(file = mytf))

noobie · November 18, 2020, 10:07am

Thanks again.
Eventually I decided to use a pre-made .csv instead of creating a new dataframe. It works, I just need to make sure the .csv is empty aside from a row of the correct number of coulmns

newdf= read.csv(file = "filepath.csv")
info =  c(x, round(Last$result, digits=1), round(Reg_12month$coefficients["Years"], digits = 1), 
                                        round(Reg_24month$coefficients["Years"], digits = 1)) 
newdf= rbind(Summary, patient_info)
colnames(newdf) = c("ID", "Latest result",    "Trend 12month",    "Trend 24 month")
newdf= distinct(newdf, ID, .keep_all = TRUE)
write.csv(Summary, "filepath.csv", row.names = FALSE)}}

system · December 9, 2020, 10:07am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.