Hi!
I am trying to do a PCA on a dataset with genotypic and environmental info. I have about 30 phenotypic descriptors and all of them have hundreds of NA values.
I tried to run my PCA with prcomp() (plotting the results with factoextra), but it does not accept missing values:
prcomp(colQ, scale = T)
Error in svd(x, nu = 0, nv = k) : infinite or missing values in 'x'
I tried to impute them with the missMDA package:
nb <- estim_ncpPCA(data, ncp.max = 5)
comp <- imputePCA(data,
                  ncp = nb$ncp,
                  scale = TRUE)
Problem: when I try to use the imputed values, I get an error message:
prcomp(comp, scale = T)
Error in prcomp(as.numeric(comp), scale = T) :
'list' object cannot be coerced to type 'double'
This is because comp is, for some reason, a list of two numeric matrices of 1025 x 9 values each
(my data is 1025 x 9).
One element of comp is "completeObs", the other is "fittedX".
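To make that structure concrete, here is a minimal base R sketch. It does not call missMDA: `fake_comp` is a hypothetical stand-in built by hand with the same shape as the object described above (a list of two numeric matrices), and its random values are placeholders. It only shows why the whole list cannot be coerced to numeric while a single element can be passed to prcomp().

```r
# Hypothetical stand-in for the object described above:
# a list holding two numeric matrices of the same size.
set.seed(1)
fake_comp <- list(
  completeObs = matrix(rnorm(1025 * 9), nrow = 1025, ncol = 9),
  fittedX     = matrix(rnorm(1025 * 9), nrow = 1025, ncol = 9)
)

# Inspect the top-level structure: element names and dimensions.
str(fake_comp, max.level = 1)

# Coercing the whole list to numeric fails; capture the error instead of stopping.
whole_list <- tryCatch(as.numeric(fake_comp), error = function(e) e)

# A single element is an ordinary numeric matrix that prcomp() accepts.
pca_one <- prcomp(fake_comp$completeObs, scale. = TRUE)
```

On the real object, `str(comp, max.level = 1)` should give the same kind of overview.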
I tried passing just one of these elements at a time to prcomp() and then running the usual PCA steps:
pca_fitted <- prcomp(comp$fittedX, scale = T)
pca_comp <- prcomp(comp$completeObs, scale = T)
summary(pca_fitted)
summary(pca_comp)
fviz_eig(pca_fitted)
fviz_eig(pca_comp)
But the results are completely different!
> summary(pca_fitted)
Importance of components:
                          PC1    PC2       PC3       PC4      PC5       PC6       PC7
Standard deviation     2.5236 1.6221 1.197e-14 1.789e-15 1.39e-15 6.524e-16 4.264e-16
Proportion of Variance 0.7076 0.2924 0.000e+00 0.000e+00 0.00e+00 0.000e+00 0.000e+00
Cumulative Proportion  0.7076 1.0000 1.000e+00 1.000e+00 1.00e+00 1.000e+00 1.000e+00
                             PC8       PC9
Standard deviation     3.133e-16 1.585e-16
Proportion of Variance 0.000e+00 0.000e+00
Cumulative Proportion  1.000e+00 1.000e+00
> summary(pca_comp)
Importance of components:
                         PC1    PC2    PC3     PC4     PC5     PC6    PC7     PC8     PC9
Standard deviation     1.518 1.2748 1.0158 0.94495 0.93199 0.86790 0.7672 0.69414 0.67380
Proportion of Variance 0.256 0.1806 0.1147 0.09921 0.09651 0.08369 0.0654 0.05354 0.05045
Cumulative Proportion  0.256 0.4365 0.5512 0.65041 0.74692 0.83062 0.8960 0.94955 1.00000
The resulting plots and downstream analyses are also very different from each other.
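One possible reading of the first summary: the missMDA documentation describes fittedX as the data reconstructed by the iterative PCA, and completeObs as the observed values with only the missing entries replaced by predictions. A reconstruction built from k components has rank at most k, so a PCA on it can show at most k non-zero standard deviations. The base R sketch below mimics this with a hand-made rank-2 reconstruction. It does not call missMDA, and the toy data, `k`, and all variable names are assumptions made for illustration.

```r
# Toy complete data set: 200 observations, 9 variables (no missing values).
set.seed(42)
X  <- matrix(rnorm(200 * 9), nrow = 200, ncol = 9)
Xs <- scale(X)  # center and scale each column

# Rank-k reconstruction from the first k singular vectors (assumed k = 2).
k <- 2
s <- svd(Xs)
fitted_like <- s$u[, 1:k] %*% diag(s$d[1:k], nrow = k) %*% t(s$v[, 1:k])

# PCA on the low-rank reconstruction versus PCA on the full data.
pca_fitted_like <- prcomp(fitted_like, scale. = TRUE)
pca_full        <- prcomp(X, scale. = TRUE)

# Count components whose standard deviation is not numerically zero.
n_nonzero_fitted <- sum(pca_fitted_like$sdev > 1e-8)
n_nonzero_full   <- sum(pca_full$sdev > 1e-8)
```

In this sketch only k components of the reconstruction carry variance, while the full data keep all 9. That matches the pattern in the two summaries above, although it does not by itself prove that nb$ncp was 2 here.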
Do you know whether it is okay to use fittedX or completeObs for this analysis?
And do you know the difference between them?
Thank you very much for your help!
Del