I have an issue when trying to reconstruct my original observations from (weighted) rotated PC scores. The issue is that I weighted the rotated PC's by their variance before using the predict.PCAmix()
function and I am getting the following error:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "c('matrix', 'array', 'double', 'numeric')"
I know from the documentation that the function takes as argument an object of class PCAmix
obtained with the function PCAmix
or PCArot
. The issue is that I weight the PCArot object by its variance before the prediction. Any idea on how can I use the predict
function to reconstruct my original observations using the weighted rotated PC's?
The general steps I am doing are:
- perform an initial PCA with all the columns.
- Find the PC's with eigenvalue greater than 1.
- re-run the PCA and use only the number of PC's found in step 2.
- Perform varimax rotation to the PC's which have eigenvalue greater than 1
- Weight the rotated PC's from step 4 by their variance.
- Use the weighted PC's from step 5 to reconstruct the original observations.
Here is the code:
library(PCAmixdata)
# 1 splitting the data into quantitative and qualitative variables
split_data <- splitmix(df_sens_pca)
X_quanti <- split_data$X.quanti
X_quali <- split_data$X.quali
# 2 initial PCA
pca_res <- PCAmix(X.quanti = X_quanti, X.quali = X_quali, ndim = ncol(df_sens_pca), graph = FALSE)
# 3 finding PCs with eigenvalue greater than 1
eigenvalues <- pca_res$eig[, 1]
pc_greater_than_1 <- which(eigenvalues > 1)
num_pc <- length(pc_greater_than_1)
# 4 re-running PCA with the number of PCs identified in step 3
pca_res_reduced <- PCAmix(X.quanti = X_quanti, X.quali = X_quali, ndim = num_pc, graph = FALSE)
# 5 performing varimax rotation
pca_rot_res <- PCArot(pca_res_reduced, dim = num_pc)
# 6 weighting the rotated PCs by their variance (eigenvalues)
variance_explained <- pca_rot_res$eig[, 1]
rotated_scores <- pca_rot_res$ind$coord
weighted_rotated_scores <- rotated_scores * sqrt(variance_explained)
# 7 Reconstruction of the original observations
???????
And a small sample dataset:
df_sens_pca <- structure(list(pd = c(0.0230, 0.0588, 0.1775, 0.0502, 0.0726, 0.0812, 0.1395, 0.0651, 0.0603, 0.0165, 0.0749, 0.0394, 0.2585, 0.0336, 0.0442, 0.0589, 0.0195, 0.0635, 0.0740, 0.0113),
elderly = c(0.2117, 0.1773, 0.1864, 0.0491, 0.2615, 0.095, 0.0838, 0.0931, 0.1091, 0.1753, 0.1144, 0.0944, 0.1064, 0.2325, 0.0974, 0.1895, 0.1036, 0.1883, 0.2167, 0.0822),
poverty = c(0.4464, 0.4094, 0.4525, 0.5828, 0.4938, 0.6502, 0.6254, 0.6455, 0.4346, 0.4072, 0.3746, 0.5515, 0.5653, 0.5078, 0.6452, 0.544, 0.6096, 0.5034, 0.4842, 0.6556),
agbh = c(1.7241, 3.8453, 4.6492, 2.6597, 2.6960, 1.8035, 3.9032, 2.0821, 1.2062, 0.7712, 2.1968, 0.9012, 6.5590, 1.1397, 1.8335,2.7920, 0.9466, 2.5760, 3.0563, 1.2654),
disability = c(0.1309, 0.1422, 0.126, 0.112, 0.1611, 0.1467, 0.1479, 0.1634, 0.1325, 0.1177, 0.1123, 0.1221, 0.1441, 0.1444, 0.1578, 0.1605, 0.1594, 0.1238, 0.1503, 0.1683),
unemployment = c(0.0261, 0.0329, 0.0317, 0.068, 0.0267, 0.0566, 0.0725, 0.0913, 0.0447, 0.0326, 0.0168, 0.0584, 0.096, 0.0221, 0.0666, 0.0292, 0.0506, 0.0336, 0.0262, 0.0575),
imp = c(0.3833, 0.5673, 0.5632, 0.4853, 0.5923, 0.4189, 0.8105, 0.4935, 0.2671, 0.1713, 0.4942, 0.3173, 0.6797, 0.3319, 0.4103, 0.5661, 0.2314, 0.4528, 0.7786, 0.3013),
cs_dist = c(0.1864, 0.1132, 0.0268, 0.0393, 0.1405, 0.0541, 0.0132, 0.0149, 0.0861, 0.2019, 0.0125, 0.1933, 0.0632, 0.0494, 0.4077, 0.0307, 0.1628, 0.1418, 0.0315, 0.1689),
groupname = c("Transport", "Transport", "Education and Health", "Transport", "Retail", "Public Infrastructure", "Commercial Services", "Retail", "Commercial Services", "Commercial Services", "Accommodation, Eating and Drinking", "Public Infrastructure", "Public Infrastructure", "Transport", "Manufacturing and Production", "Public Infrastructure", "Commercial Services", "Transport", "Public Infrastructure", "Commercial Services")),
row.names = c(NA, -20L),
class = "data.frame",
na.action = structure(c(`2143` = 2143L, `2145` = 2145L, `2147` = 2147L, `2149` = 2149L, `2150` = 2150L, `2276` = 2276L, `2280` = 2280L, `2402` = 2402L, `2404` = 2404L, `4518` = 4518L, `4532` = 4532L, `4885` = 4885L, `4896` = 4896L, `4897` = 4897L, `4914` = 4914L),
class = "omit"))
I know it's a good practice to scale and center the data before PCA, but at this point all I want is to figure out how to use the predictPCAmix()
function.
R 4.4.1, PCAmixdata 3.1, Windows 11.