In the second method, I use aggregate() to calculate the mean of every gene:
resume <- aggregate(data=dfX, .~status, mean)
resume$gene1
[1] 0.05448438 0.05161707   (groups 1 and 2, respectively)
As you can see, these averages differ from the ones given by the first method!
I checked, and the first method gives the correct result.
Could you explain why aggregate() does not give the correct results, or whether I made a mistake in how I used this function?
I remember something about how aggregate() handles NAs: it drops an entire row if any of the values in that row are NA. Of course, at 66 my memory is not perfect. What happens when you do
aggregate(data = dfX, gene1 ~ status, mean)
instead of including all variables with . ~ status and then selecting gene1?
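A minimal sketch of the effect described above, using a small hypothetical data frame (the values, and the name dfX reused for it, are made up for illustration and are not the poster's data). The formula method of aggregate() uses na.action = na.omit by default, so with . ~ status any row that has an NA in *any* gene is dropped before the means are computed, while gene1 ~ status only checks gene1 for NAs.

```r
# Hypothetical toy data: gene2 has NAs in rows 1 and 5, gene1 has none
dfX <- data.frame(
  status = c(1, 1, 1, 2, 2, 2),
  gene1  = c(1, 2, 3, 4, 5, 9),
  gene2  = c(NA, 1, 1, 1, NA, 1)
)

# . ~ status: na.omit removes rows 1 and 5 because of gene2's NAs
all_cols <- aggregate(. ~ status, data = dfX, FUN = mean)
all_cols$gene1   # 2.5 6.5

# gene1 ~ status: only gene1 is checked for NAs, so all rows are kept
one_col <- aggregate(gene1 ~ status, data = dfX, FUN = mean)
one_col$gene1    # 2 6

# tapply() works on gene1 alone and never looks at gene2
by_tapply <- tapply(dfX$gene1, dfX$status, mean)
by_tapply        # group 1: 2, group 2: 6
```

With this toy data, gene1 has no missing values at all, yet its means still change in the . ~ status call, because rows are removed on account of gene2.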
Here is the file.
I reduced the table to 50 variables. Strangely, the aggregate() result now differs from the one I got with the complete table, and it is still different from the tapply() result...
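If the NA explanation in the reply is right, it would also account for the result changing when the table is cut down to 50 variables: fewer columns usually means fewer rows with at least one NA, so a different set of rows survives na.omit. The sketch below checks this on another hypothetical data frame (values invented for illustration), then shows one common way to make aggregate() agree with tapply(): na.action = na.pass so that no rows are dropped, plus na.rm = TRUE, which aggregate() forwards to mean() so NAs are skipped gene by gene.

```r
# Hypothetical toy data: the NAs sit in gene2 (row 1) and gene3 (row 5)
dfX <- data.frame(
  status = c(1, 1, 1, 2, 2, 2),
  gene1  = c(1, 2, 3, 4, 5, 9),
  gene2  = c(NA, 1, 1, 1, 1, 1),
  gene3  = c(1, 1, 1, 1, NA, 1)
)

# Number of rows that na.omit would keep for each set of columns
sum(complete.cases(dfX[, c("status", "gene1", "gene2")]))  # 5
sum(complete.cases(dfX))                                  # 4

# Same call, different column sets, different means for gene1
reduced <- aggregate(. ~ status, data = dfX[, c("status", "gene1", "gene2")],
                     FUN = mean)
full    <- aggregate(. ~ status, data = dfX, FUN = mean)
reduced$gene1  # 2.5 6.0
full$gene1     # 2.5 6.5

# Keep every row and let mean() drop NAs within each column instead
fixed <- aggregate(. ~ status, data = dfX, FUN = mean,
                   na.rm = TRUE, na.action = na.pass)
fixed$gene1    # 2 6, same as tapply(dfX$gene1, dfX$status, mean)
```

Whether na.pass with na.rm = TRUE is the right choice depends on the analysis; it reproduces the per-gene behaviour of tapply() with na.rm = TRUE, not the complete-case behaviour of the default.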