I am working on a data mining project and have run into a problem. For multivariate survival analysis, I have been using this call:
library(survival)  # provides coxph() and Surv()
res.cox <- coxph(Surv(OS, Status) ~ Sex + RANDOMGENE, data = mydata)
summary(res.cox)
The variables are the column headers from my Excel sheet. In place of RANDOMGENE, I put the specific gene mutation whose effect on survival I want to analyze. This call works perfectly. However, since I have many genes to analyze, I would like to automate the process. I tried a for loop, but I always get the error "variable lengths differ". Below is the example code I still need to fix. Basically, I need the for loop to "feed" each gene into the coxph call, but I cannot figure out why it does not work.
Here is some dummy data:
mydata <- read.table(header=TRUE, text="ATRX IDH1 VEGF Sex survival vital.status
2 1 1 2 1.419178082 2
2 1 1 1 5 1
2 1 1 2 1.082191781 2
1 1 1 1 0.038356164 1
2 1 2 2 0.77260274 2
1 1 2 2 2.336986301 1
2 1 2 1 1.271232877 1")
Here is the code that I still need to work on:
genes <- names(mydata[1:3])
for (i in 1:length(genes))
{
print(coxph(Surv(survival, vital.status) ~ Sex + genes[i], mydata))
}
This is the error that I get:
Error in model.frame.default(formula = Surv(survival, vital.status) ~ :
variable lengths differ (found for 'genes[i]')
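The error most likely comes from the fact that genes[i] is evaluated as a single character string (length 1) rather than as the column it names (length 7), so model.frame sees two variables of different lengths. One possible way around it, as a rough sketch only, is to build the formula as text for each gene and convert it with as.formula before passing it to coxph. The names gene, f and fits below are illustrative and not part of the original code. With a data set this small, coxph may emit convergence warnings, and IDH1 is constant in the dummy data, so its coefficient is not estimable.

```r
library(survival)  # provides coxph() and Surv()

# Dummy data from the question.
mydata <- read.table(header = TRUE, text = "ATRX IDH1 VEGF Sex survival vital.status
2 1 1 2 1.419178082 2
2 1 1 1 5 1
2 1 1 2 1.082191781 2
1 1 1 1 0.038356164 1
2 1 2 2 0.77260274 2
1 1 2 2 2.336986301 1
2 1 2 1 1.271232877 1")

genes <- names(mydata)[1:3]

# Fit one Cox model per gene and keep the results in a named list.
fits <- list()
for (gene in genes) {
  # Paste the gene name into the formula text, then turn it into a formula,
  # e.g. Surv(survival, vital.status) ~ Sex + ATRX
  f <- as.formula(paste("Surv(survival, vital.status) ~ Sex +", gene))
  fits[[gene]] <- coxph(f, data = mydata)
}

# Inspect a single model by gene name.
print(fits[["ATRX"]])
```

Storing the fitted models in a list, instead of printing inside the loop, makes it possible to call summary() on any of them afterwards, for example summary(fits[["VEGF"]]).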