I have a list of sample names, each repeated 3 times and identified by a replicate suffix: "_R1", "_R2" or "_R3".
How can I split them such that:
- 90% of the samples go to Group1 and the rest go to Group2, and
- there is no overlap of the 'original' (base) sample names between the two groups? In other words, I want to avoid a scenario where Group1 contains samplex_R1 and samplex_R2 while Group2 contains samplex_R3.
Minimal reprex:
#Make a data frame with a list of names in triplicate, identified by _R1/2/3
R1 <- paste0(rownames(mtcars),"_R1")
R2 <- paste0(rownames(mtcars),"_R2")
R3 <- paste0(rownames(mtcars),"_R3")
mydf <- data.frame("samples" = c(R1, R2, R3))
#Randomly shuffle the rows to simulate my 'real' data
mydf <- data.frame("samples"=mydf[sample(1:nrow(mydf)),])
#########################################################################
#----Separate 90% of the samples into group1, put the rest in group2----#
#########################################################################
#First get the numbers of samples going into each group
group1_number <- ceiling(0.9 * nrow(mydf)) #90%
group2_number <- (nrow(mydf)) - group1_number #the rest
#Get the names that will go into group1/2
group1_names <- mydf[1:group1_number,c("samples")]
group2_names <- mydf[(group1_number+1):nrow(mydf),c("samples")]
#Place samples in group1 and group2
group1 <- data.frame("samples"=mydf[mydf$samples %in% group1_names,])
group2 <- data.frame("samples"=mydf[mydf$samples %in% group2_names,])
#How do I avoid overlap of 'base' sample names in these groups?
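
One possible approach, sketched below, is to do the random split on the unique base names rather than on the rows, and then send every replicate of a base name to the same group. This is a minimal sketch, not a definitive answer: it assumes every sample name ends in "_R" followed by digits, and the `base` column, `base_names`, `n_group1` and `group1_bases` are names introduced here for illustration only. Note that the 90% then applies to base names, so the row counts are only approximately 90/10.

```r
# Assumption: fixed seed only so that this example is reproducible
set.seed(42)

# Rebuild the example data from the question, in shuffled order
samples <- c(paste0(rownames(mtcars), "_R1"),
             paste0(rownames(mtcars), "_R2"),
             paste0(rownames(mtcars), "_R3"))
mydf <- data.frame(samples = sample(samples))

# Strip the replicate suffix to recover the base sample name
# (assumes the suffix is always "_R" followed by digits)
mydf$base <- sub("_R[0-9]+$", "", mydf$samples)

# Randomly pick 90% of the unique base names for group1
base_names <- unique(mydf$base)
n_group1 <- ceiling(0.9 * length(base_names))
group1_bases <- sample(base_names, n_group1)

# All replicates of a base name follow that base name into its group
group1 <- mydf[mydf$base %in% group1_bases, , drop = FALSE]
group2 <- mydf[!mydf$base %in% group1_bases, , drop = FALSE]

# Check: no base name appears in both groups
length(intersect(group1$base, group2$base)) == 0
```

With the 32 base names from mtcars, 29 of them (87 rows) end up in group1 and 3 (9 rows) in group2, so the row split is close to 90/10 but not exact.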