Factor analysis on a split data frame using lapply command

Hi everyone,

I am a relatively new R user, so I apologize if this question has been asked and answered before.

I am trying to run a factor analysis on a dataframe that I split into 5 groups using the "split" command in base R.

I am using the 'psych' package to conduct the factor analysis, and the "lapply" command to run the factor analysis on all 5 groups simultaneously.

My question is, is it possible to specify a weighting variable for the factor analysis?

I know that the factor analysis command in 'psych' (fa) accepts a weight variable, and I have run a weighted factor analysis successfully using that. However, the fact that my data are split into 5 different groups has complicated the factor analysis command. Here is an example of what I am trying to do:

output3_efa <- lapply(variable_list[3:15], function(x){fa(x, rotate = "promax", nfactors = 2, fm = "pa", weight = variable_list$WEIGHT)})

Of course, this command returns an error stating that the object WEIGHT cannot be found. The WEIGHT variable exists in each group and has no missing values.

I think I need to find a way to specify the WEIGHT variable doing something like this:

output3_efa <- lapply(variable_list[3:15], function(x){fa(x, rotate = "promax", nfactors = 2, fm = "pa", weight = variable_list$'ALL'$WEIGHT)})

where "variable_list$'ALL'$WEIGHT" would specify the WEIGHT variable in each of the 5 groups I'm simultaneously analyzing...this does not work, though. I would appreciate any solutions you might have!

Hi, and welcome!

Questions like this are difficult to answer in the abstract, but with data and code can attract useful responses. See the FAQ: What's a reproducible example (`reprex`) and how do I do one?

Data is critical. If the data set is small, the reprex can include the data itself as an assignment built from the output of

the_data <- dput(my_data)

If the data set is too large, include a subset created with, for example:

the_data <- head(my_data, 50)
the_data <- tail(my_data, 50)
the_data <- my_data[sample(nrow(my_data), 50), ]  # 50 randomly sampled rows

If my_data is confidential and can't be easily masked, try one of the many built-in data sets in R, especially any that come with the packages that you are using. Also, consider whether you can torture a universal data set such as mtcars into a representative structure that illustrates your question.
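For example, something along these lines (just a sketch) would turn mtcars into a stand-in with a grouping variable playing the role of YEAR:

# use cyl as a stand-in grouping variable and split on it, as with YEAR
split(mtcars, mtcars$cyl)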

And, of course, please include all package dependencies in the reprex.

Hello technocrat,

Thanks for your response! I am providing additional information and some data. The data are a random sample of 200 cases from the original data set. The original data set is too large to share. I have uploaded a .csv of the data to Google Drive - you can access the data here:

https://drive.google.com/drive/u/0/folders/1Uk7g0LtjwQpD8DjRn9yap6-TZATHEOLA

To summarize the issue again, I am trying to run a weighted factor analysis, using the psych and GPArotation packages, on a split data frame. I must nest the "fa" command within an "lapply" command to run the fa on each group in the split data frame.

My issue is that I cannot figure out how to properly specify a weight when nesting the "fa" command within "lapply". Here is an example of the specific code I have written:

#Opening data

mydata <- read.csv("PATH TO NeoliberalismData.csv") #Please specify the path on your local drive.

#Loading required libraries

library(psych)
library(GPArotation)

#Splitting by year

split_list1 <- split(mydata, mydata$YEAR)

#Removing YEAR from the split

variable_list1 <- lapply(split_list1, function(x){x[, -1]})

#Carrying out exploratory factor analysis on all groups in the split, using "lapply"

output3_efa <- lapply(variable_list1, function(x){fa(x[2:14], rotate = "promax", nfactors = 2, fm = "pa", weight = variable_list1$WTSSALL)})

#My issue with the above lapply (fa) is that I do not know how to specify a weight when dealing with a split dataframe. The weight, "WTSSALL", exists in each group of the split.
#However, the weight is not applied properly in the above (fa) statement. The results are not at all different in comparison to unweighted results.
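One thing I have considered, though I have not been able to verify that it applies the weights correctly, is referencing the weight through the function argument x rather than through variable_list1, roughly like this:

output3_efa <- lapply(variable_list1, function(x){
  # x is a single year's data frame, so x$WTSSALL is that year's weight vector
  fa(x[2:14], rotate = "promax", nfactors = 2, fm = "pa", weight = x$WTSSALL)
})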

Caught the data, thanks. So, we are splitting by year, rather than grouping by year? I'll look at the fa function as soon as I get back from my errand. (If you don't hear back in a reasonable time, check CNN; I live 5 miles from where the first COVID-19 patient died.)

OK, if this is the result you're after for a single year, I'll suggest a workflow for doing it for all years at the same time (do you want the results in a single object or separately?)

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(GPArotation)
library(psych)
library(readr)

# preprocess data from https://drive.google.com/drive/u/0/folders/1Uk7g0LtjwQpD8DjRn9yap6-TZATHEOLA

# saved to NeoliberalismData.csv (fully qualified path for reprex)
mydata <- read_csv("~/Desktop/NeoliberalismData.csv")
#> Warning: Missing column names filled in: 'X1' [1]
#> Parsed with column specification:
#> cols(
#>   X1 = col_double(),
#>   YEAR = col_double(),
#>   WTSSALL = col_double(),
#>   CUTGOVT1 = col_double(),
#>   LESSREG1 = col_double(),
#>   PRICECON = col_double(),
#>   AIDINDUS = col_double(),
#>   HLPHITEC = col_double(),
#>   MAKEJOBS = col_double(),
#>   SAVEJOBS = col_double(),
#>   JOBSALL = col_double(),
#>   CUTHOURS = col_double(),
#>   HLTHCARE = col_double(),
#>   AIDOLD = col_double(),
#>   AIDUNEMP = col_double(),
#>   EQUALIZE = col_double(),
#>   YEAR_METRIC = col_double()
#> )

# create test subset, discarding X1, YEAR and YEAR_METRIC (which is constant)
mydata %>% filter(YEAR == 2016) %>% select(-X1,-YEAR,-YEAR_METRIC) -> sub2016

# extract weights 
sub2016 %>% select(WTSSALL) %>% .$WTSSALL -> wts2016

# discard WTSSALL
sub2016 %>% select(-WTSSALL) -> sub2016

# create covariance matrix
sub2016 %>% cov() -> cov2016

# create an fa object
fa2016 <- fa(cov2016, fm = "wls", weights = wts2016)

# show the result
fa2016
#> Factor Analysis using method =  wls
#> Call: fa(r = cov2016, fm = "wls", weights = wts2016)
#> Standardized loadings (pattern matrix) based upon correlation matrix
#>           WLS1     h2   u2 com
#> CUTGOVT1  0.32 0.0995 0.90   1
#> LESSREG1  0.38 0.1453 0.85   1
#> PRICECON  0.52 0.2675 0.73   1
#> AIDINDUS  0.65 0.4215 0.58   1
#> HLPHITEC -0.02 0.0005 1.00   1
#> MAKEJOBS  0.41 0.1719 0.83   1
#> SAVEJOBS  0.37 0.1394 0.86   1
#> JOBSALL   0.62 0.3824 0.62   1
#> CUTHOURS  0.54 0.2889 0.71   1
#> HLTHCARE  0.46 0.2132 0.79   1
#> AIDOLD    0.27 0.0715 0.93   1
#> AIDUNEMP  0.57 0.3278 0.67   1
#> EQUALIZE  0.71 0.5048 0.50   1
#> 
#>                WLS1
#> SS loadings    3.03
#> Proportion Var 0.23
#> 
#> Mean item complexity =  1
#> Test of the hypothesis that 1 factor is sufficient.
#> 
#> The degrees of freedom for the null model are  78  and the objective function was  5.14
#> The degrees of freedom for the model are 65  and the objective function was  3.17 
#> 
#> The root mean square of the residuals (RMSR) is  0.14 
#> The df corrected root mean square of the residuals is  0.16 
#> 
#> Fit based upon off diagonal values = 0.72
#> Measures of factor score adequacy             
#>                                                   WLS1
#> Correlation of (regression) scores with factors   0.91
#> Multiple R square of scores with factors          0.82
#> Minimum correlation of possible factor scores     0.65

Created on 2020-02-29 by the reprex package (v0.3.0)

Hi technocrat, and thank you!! Yes, this is exactly the approach I would take for a single year. Ideally, this would be done for each year simultaneously, with the results reported in a single object. I look forward to your suggested workflow.

Basically, I would like to use this single object for Procrustes rotation. I will rotate and compare the factors from each year to factors based on pooled correlations across years. For the comparison, I will compute Tucker's phi. This should allow me to determine whether there is structure invariance across years.
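For the comparison itself, my rough plan is something along these lines (only a sketch; loadings_year and loadings_pooled are placeholders for the loading matrices I will extract from the fa objects, and I still need to settle on the exact Procrustes rotation function):

# Tucker's phi (congruence coefficients) between two loading matrices
phi <- psych::factor.congruence(loadings_year, loadings_pooled)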


Thanks for the confirm. I'll take the easy part first, a function to create the year objects. For the single-object part, I may need to call on your help with understanding the functions you plan on using for Procrustes and Tucker's phi.

suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(GPArotation))
suppressPackageStartupMessages(library(psych))
suppressPackageStartupMessages(library(purrr))
suppressPackageStartupMessages(library(readr))

# preprocess data from https://drive.google.com/drive/u/0/folders/1Uk7g0LtjwQpD8DjRn9yap6-TZATHEOLA

# saved to NeoliberalismData.csv (fully qualified path for reprex)
mydata <- read_csv("~/projects/demo/NeoliberalismData.csv")
#> Warning: Missing column names filled in: 'X1' [1]
#> Parsed with column specification:
#> cols(
#>   X1 = col_double(),
#>   YEAR = col_double(),
#>   WTSSALL = col_double(),
#>   CUTGOVT1 = col_double(),
#>   LESSREG1 = col_double(),
#>   PRICECON = col_double(),
#>   AIDINDUS = col_double(),
#>   HLPHITEC = col_double(),
#>   MAKEJOBS = col_double(),
#>   SAVEJOBS = col_double(),
#>   JOBSALL = col_double(),
#>   CUTHOURS = col_double(),
#>   HLTHCARE = col_double(),
#>   AIDOLD = col_double(),
#>   AIDUNEMP = col_double(),
#>   EQUALIZE = col_double(),
#>   YEAR_METRIC = col_double()
#> )


get_weights <- function(x) {
   x %>% select(WTSSALL) %>% .$WTSSALL -> wts
}

get_cov <- function(x){
    x %>% select(-X1, -YEAR, -WTSSALL, -YEAR_METRIC) %>% cov()
}

make_fa <- function(x){
    fa(x, fm = "wls", weights = wts2016)
}

# applies functions to select the weights within the YEAR subset,
# calculate the covariance matrix and create the fa objects

work_flow <- function(x){
    wts <- get_weights(x)
    covs <- get_cov(x)
    fas <- make_fa(covs)
}


# assigns results to list of fa objects corresponding to YEAR
fa_s <- mydata %>% 
  split(.$YEAR) %>% 
  map(~work_flow(.))

fa_s
#> $`1985`
#> Factor Analysis using method =  wls
#> Call: fa(r = x, fm = "wls", weights = wts2016)
#> Standardized loadings (pattern matrix) based upon correlation matrix
#>           WLS1      h2   u2 com
#> CUTGOVT1  0.01 6.7e-05 1.00   1
#> LESSREG1 -0.23 5.1e-02 0.95   1
#> PRICECON  0.52 2.7e-01 0.73   1
#> AIDINDUS  0.57 3.2e-01 0.68   1
#> HLPHITEC  0.70 4.8e-01 0.52   1
#> MAKEJOBS  0.66 4.3e-01 0.57   1
#> SAVEJOBS  0.50 2.5e-01 0.75   1
#> JOBSALL   0.65 4.3e-01 0.57   1
#> CUTHOURS  0.31 9.7e-02 0.90   1
#> HLTHCARE  0.60 3.6e-01 0.64   1
#> AIDOLD    0.32 1.0e-01 0.90   1
#> AIDUNEMP  0.42 1.7e-01 0.83   1
#> EQUALIZE  0.68 4.6e-01 0.54   1
#> 
#>                WLS1
#> SS loadings    3.42
#> Proportion Var 0.26
#> 
#> Mean item complexity =  1
#> Test of the hypothesis that 1 factor is sufficient.
#> 
#> The degrees of freedom for the null model are  78  and the objective function was  6.79
#> The degrees of freedom for the model are 65  and the objective function was  4.39 
#> 
#> The root mean square of the residuals (RMSR) is  0.16 
#> The df corrected root mean square of the residuals is  0.18 
#> 
#> Fit based upon off diagonal values = 0.72
#> Measures of factor score adequacy             
#>                                                   WLS1
#> Correlation of (regression) scores with factors   0.93
#> Multiple R square of scores with factors          0.86
#> Minimum correlation of possible factor scores     0.71
#> 
#> $`1990`
#> Factor Analysis using method =  wls
#> Call: fa(r = x, fm = "wls", weights = wts2016)
#> Standardized loadings (pattern matrix) based upon correlation matrix
#>           WLS1    h2   u2 com
#> CUTGOVT1 -0.19 0.036 0.96   1
#> LESSREG1  0.18 0.033 0.97   1
#> PRICECON  0.69 0.480 0.52   1
#> AIDINDUS  0.66 0.439 0.56   1
#> HLPHITEC  0.42 0.172 0.83   1
#> MAKEJOBS  0.72 0.513 0.49   1
#> SAVEJOBS  0.50 0.250 0.75   1
#> JOBSALL   0.76 0.576 0.42   1
#> CUTHOURS  0.49 0.241 0.76   1
#> HLTHCARE  0.70 0.496 0.50   1
#> AIDOLD    0.82 0.677 0.32   1
#> AIDUNEMP  0.62 0.390 0.61   1
#> EQUALIZE  0.63 0.399 0.60   1
#> 
#>                WLS1
#> SS loadings    4.70
#> Proportion Var 0.36
#> 
#> Mean item complexity =  1
#> Test of the hypothesis that 1 factor is sufficient.
#> 
#> The degrees of freedom for the null model are  78  and the objective function was  7.17
#> The degrees of freedom for the model are 65  and the objective function was  3.08 
#> 
#> The root mean square of the residuals (RMSR) is  0.12 
#> The df corrected root mean square of the residuals is  0.13 
#> 
#> Fit based upon off diagonal values = 0.9
#> Measures of factor score adequacy             
#>                                                   WLS1
#> Correlation of (regression) scores with factors   0.96
#> Multiple R square of scores with factors          0.91
#> Minimum correlation of possible factor scores     0.83
#> 
#> $`1996`
#> Factor Analysis using method =  wls
#> Call: fa(r = x, fm = "wls", weights = wts2016)
#> Standardized loadings (pattern matrix) based upon correlation matrix
#>          WLS1      h2   u2 com
#> CUTGOVT1 0.08 0.00656 0.99   1
#> LESSREG1 0.02 0.00058 1.00   1
#> PRICECON 0.76 0.58226 0.42   1
#> AIDINDUS 0.76 0.57614 0.42   1
#> HLPHITEC 0.40 0.15807 0.84   1
#> MAKEJOBS 0.73 0.53040 0.47   1
#> SAVEJOBS 0.66 0.43349 0.57   1
#> JOBSALL  0.83 0.69102 0.31   1
#> CUTHOURS 0.41 0.16786 0.83   1
#> HLTHCARE 0.78 0.60209 0.40   1
#> AIDOLD   0.55 0.29805 0.70   1
#> AIDUNEMP 0.69 0.48272 0.52   1
#> EQUALIZE 0.76 0.57483 0.43   1
#> 
#>                WLS1
#> SS loadings    5.10
#> Proportion Var 0.39
#> 
#> Mean item complexity =  1
#> Test of the hypothesis that 1 factor is sufficient.
#> 
#> The degrees of freedom for the null model are  78  and the objective function was  7.36
#> The degrees of freedom for the model are 65  and the objective function was  2.5 
#> 
#> The root mean square of the residuals (RMSR) is  0.11 
#> The df corrected root mean square of the residuals is  0.12 
#> 
#> Fit based upon off diagonal values = 0.93
#> Measures of factor score adequacy             
#>                                                   WLS1
#> Correlation of (regression) scores with factors   0.96
#> Multiple R square of scores with factors          0.92
#> Minimum correlation of possible factor scores     0.84
#> 
#> $`2006`
#> Factor Analysis using method =  wls
#> Call: fa(r = x, fm = "wls", weights = wts2016)
#> Standardized loadings (pattern matrix) based upon correlation matrix
#>           WLS1    h2   u2 com
#> CUTGOVT1  0.10 0.011 0.99   1
#> LESSREG1 -0.15 0.022 0.98   1
#> PRICECON  0.72 0.512 0.49   1
#> AIDINDUS  0.65 0.417 0.58   1
#> HLPHITEC  0.35 0.121 0.88   1
#> MAKEJOBS  0.46 0.210 0.79   1
#> SAVEJOBS  0.45 0.199 0.80   1
#> JOBSALL   0.52 0.272 0.73   1
#> CUTHOURS  0.28 0.077 0.92   1
#> HLTHCARE  0.82 0.670 0.33   1
#> AIDOLD    0.85 0.715 0.28   1
#> AIDUNEMP  0.59 0.353 0.65   1
#> EQUALIZE  0.61 0.369 0.63   1
#> 
#>                WLS1
#> SS loadings    3.95
#> Proportion Var 0.30
#> 
#> Mean item complexity =  1
#> Test of the hypothesis that 1 factor is sufficient.
#> 
#> The degrees of freedom for the null model are  78  and the objective function was  5.52
#> The degrees of freedom for the model are 65  and the objective function was  2.27 
#> 
#> The root mean square of the residuals (RMSR) is  0.11 
#> The df corrected root mean square of the residuals is  0.12 
#> 
#> Fit based upon off diagonal values = 0.88
#> Measures of factor score adequacy             
#>                                                   WLS1
#> Correlation of (regression) scores with factors   0.95
#> Multiple R square of scores with factors          0.90
#> Minimum correlation of possible factor scores     0.80
#> 
#> $`2016`
#> Factor Analysis using method =  wls
#> Call: fa(r = x, fm = "wls", weights = wts2016)
#> Standardized loadings (pattern matrix) based upon correlation matrix
#>           WLS1     h2   u2 com
#> CUTGOVT1  0.32 0.0995 0.90   1
#> LESSREG1  0.38 0.1453 0.85   1
#> PRICECON  0.52 0.2675 0.73   1
#> AIDINDUS  0.65 0.4215 0.58   1
#> HLPHITEC -0.02 0.0005 1.00   1
#> MAKEJOBS  0.41 0.1719 0.83   1
#> SAVEJOBS  0.37 0.1394 0.86   1
#> JOBSALL   0.62 0.3824 0.62   1
#> CUTHOURS  0.54 0.2889 0.71   1
#> HLTHCARE  0.46 0.2132 0.79   1
#> AIDOLD    0.27 0.0715 0.93   1
#> AIDUNEMP  0.57 0.3278 0.67   1
#> EQUALIZE  0.71 0.5048 0.50   1
#> 
#>                WLS1
#> SS loadings    3.03
#> Proportion Var 0.23
#> 
#> Mean item complexity =  1
#> Test of the hypothesis that 1 factor is sufficient.
#> 
#> The degrees of freedom for the null model are  78  and the objective function was  5.14
#> The degrees of freedom for the model are 65  and the objective function was  3.17 
#> 
#> The root mean square of the residuals (RMSR) is  0.14 
#> The df corrected root mean square of the residuals is  0.16 
#> 
#> Fit based upon off diagonal values = 0.72
#> Measures of factor score adequacy             
#>                                                   WLS1
#> Correlation of (regression) scores with factors   0.91
#> Multiple R square of scores with factors          0.82
#> Minimum correlation of possible factor scores     0.65

Created on 2020-03-03 by the reprex package (v0.3.0)
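If it helps for the single-object part, the loading matrices should be extractable from each element of fa_s into one named list with something like this (untested here):

# pull the loadings out of each year's fa object into a named list of matrices
loadings_by_year <- purrr::map(fa_s, ~ unclass(.x$loadings))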


Thanks again, Technocrat! I'll be away from the computer for a few days, but this weekend I will try this out and report back. I will also share the package and functions I plan to use for the Procrustes rotation and Tucker's Phi. In the meantime, here is an interesting article that pointed me toward the general method that you've contributed to:


Hi Technocrat,

I wanted to follow up on the workflow you created. I ran the workflow successfully on the full data set I am working with. I made some modifications to the workflow to facilitate principal axis factoring, and the extraction of 2 factors.

Just to verify the results of the workflow, I also ran the same fa function on a single year of data (1985). I then compared the results of that function to the results of the workflow. The results differ, and I am not sure why. Do you have an idea about the difference?

To provide an example of the difference, I ran the script on the test data I shared with you. Here is a comparison of results:

Script:

install.packages('psych',repos='http://personality-project.org/r', type='source')
library(psych)
install.packages("GPArotation")
library(GPArotation)
install.packages("remotes")
remotes::install_github("Jo-Karl/ccpsyc")
library(ccpsyc)
install.packages("dplyr")
library(dplyr)
install.packages("purrr")
library(purrr)
install.packages("readr")
library(readr)

mydata <- read.csv("~/Test Data/NeoliberalismData.csv") #loading the test dataset from local drive

get_weights <- function(x) {
  x %>% select(WTSSALL) %>% .$WTSSALL -> wts
}

get_cov <- function(x){
  x %>% select(-X, -YEAR, -WTSSALL, -YEAR_METRIC) %>% cov()
}

make_fa <- function(x){
  fa(x, fm = "pa", rotate = "promax", nfactors = 2,  weight = wts)
}

# applies functions to select the weights within the YEAR subset,
# calculate the covariance matrix and create the fa objects

work_flow <- function(x){
  wts <- get_weights(x)
  covs <- get_cov(x)
  fas <- make_fa(covs)
}

# assigns results to list of fa objects corresponding to YEAR
fa_s <- mydata %>% 
  split(.$YEAR) %>% 
  map(~work_flow(.))

Neoliberalismvars85 <- subset(mydata, YEAR==1985, select=WTSSALL:EQUALIZE)

fa85 <- fa(Neoliberalismvars85[2:14], nfactors = 2, rotate = "promax", fm = "pa", weight = Neoliberalismvars85$WTSSALL)

fa85
fa_s

Workflow results (1985 only):

> fa_s
$`1985`
Factor Analysis using method =  pa
Call: fa(r = x, nfactors = 2, rotate = "promax", fm = "pa", 
    weight = WTSSALL)
Standardized loadings (pattern matrix) based upon correlation matrix
           PA1   PA2    h2   u2 com
CUTGOVT1  0.25  0.68 0.450 0.55 1.3
LESSREG1 -0.03  0.59 0.357 0.64 1.0
PRICECON  0.62  0.24 0.383 0.62 1.3
AIDINDUS  0.48 -0.27 0.362 0.64 1.6
HLPHITEC  0.61 -0.26 0.506 0.49 1.4
MAKEJOBS  0.61 -0.15 0.431 0.57 1.1
SAVEJOBS  0.58  0.19 0.325 0.67 1.2
JOBSALL   0.68  0.05 0.445 0.56 1.0
CUTHOURS  0.29 -0.07 0.097 0.90 1.1
HLTHCARE  0.59 -0.02 0.355 0.65 1.0
AIDOLD    0.11 -0.69 0.520 0.48 1.0
AIDUNEMP  0.46  0.10 0.201 0.80 1.1
EQUALIZE  0.68 -0.03 0.470 0.53 1.0

                       PA1  PA2
SS loadings           3.34 1.56
Proportion Var        0.26 0.12
Cumulative Var        0.26 0.38
Proportion Explained  0.68 0.32
Cumulative Proportion 0.68 1.00

 With factor correlations of 
      PA1   PA2
PA1  1.00 -0.21
PA2 -0.21  1.00

Mean item complexity =  1.2
Test of the hypothesis that 2 factors are sufficient.

The degrees of freedom for the null model are  78  and the objective function was  6.79
The degrees of freedom for the model are 53  and the objective function was  3.65 

The root mean square of the residuals (RMSR) is  0.13 
The df corrected root mean square of the residuals is  0.15 

Fit based upon off diagonal values = 0.83
Measures of factor score adequacy             
                                                   PA1  PA2
Correlation of (regression) scores with factors   0.92 0.87
Multiple R square of scores with factors          0.85 0.75
Minimum correlation of possible factor scores     0.70 0.51

1985 Subset Results:

> fa85
Factor Analysis using method =  pa
Call: fa(r = Neoliberalismvars85[2:14], nfactors = 2, rotate = "promax", 
    fm = "pa", weight = Neoliberalismvars85$WTSSALL)
Standardized loadings (pattern matrix) based upon correlation matrix
           PA1   PA2    h2   u2 com
CUTGOVT1  0.29 -0.52 0.242 0.76 1.6
LESSREG1  0.04 -0.48 0.217 0.78 1.0
PRICECON  0.23  0.32 0.205 0.79 1.8
AIDINDUS  0.73 -0.09 0.494 0.51 1.0
HLPHITEC  0.43  0.45 0.538 0.46 2.0
MAKEJOBS  0.34  0.34 0.322 0.68 2.0
SAVEJOBS  0.15  0.22 0.096 0.90 1.7
JOBSALL   0.95 -0.43 0.783 0.22 1.4
CUTHOURS -0.14  0.76 0.522 0.48 1.1
HLTHCARE  0.64 -0.12 0.373 0.63 1.1
AIDOLD    0.43 -0.02 0.175 0.82 1.0
AIDUNEMP  0.45  0.17 0.283 0.72 1.3
EQUALIZE  0.57  0.15 0.408 0.59 1.1

                       PA1  PA2
SS loadings           2.95 1.71
Proportion Var        0.23 0.13
Cumulative Var        0.23 0.36
Proportion Explained  0.63 0.37
Cumulative Proportion 0.63 1.00

 With factor correlations of 
     PA1  PA2
PA1 1.00 0.37
PA2 0.37 1.00

Mean item complexity =  1.4
Test of the hypothesis that 2 factors are sufficient.

The degrees of freedom for the null model are  78  and the objective function was  6.36 with Chi Square of  138.97
The degrees of freedom for the model are 53  and the objective function was  3.43 

The root mean square of the residuals (RMSR) is  0.13 
The df corrected root mean square of the residuals is  0.16 

The harmonic number of observations is  28 with the empirical chi square  74.5  with prob <  0.027 
The total number of observations was  28  with Likelihood Chi Square =  70.28  with prob <  0.056 

Tucker Lewis Index of factoring reliability =  0.515
RMSEA index =  0.102  and the 90 % confidence intervals are  0 0.174
BIC =  -106.33
Fit based upon off diagonal values = 0.79
Measures of factor score adequacy             
                                                   PA1  PA2
Correlation of (regression) scores with factors   0.95 0.89
Multiple R square of scores with factors          0.90 0.79
Minimum correlation of possible factor scores     0.81 0.57

I'll take a look. Same call on same data should equal same result, but stranger things happen!


Okay, sounds good. Thanks again!

Here is what I get from extracting 1985 from the fa_s object of all years

> fa_s[1]
$`1985`
Factor Analysis using method =  wls
Call: fa(r = x, fm = "wls", weights = wts2016)
Standardized loadings (pattern matrix) based upon correlation matrix
          WLS1      h2   u2 com
CUTGOVT1  0.01 6.7e-05 1.00   1
LESSREG1 -0.23 5.1e-02 0.95   1
PRICECON  0.52 2.7e-01 0.73   1
AIDINDUS  0.57 3.2e-01 0.68   1
HLPHITEC  0.70 4.8e-01 0.52   1
MAKEJOBS  0.66 4.3e-01 0.57   1
SAVEJOBS  0.50 2.5e-01 0.75   1
JOBSALL   0.65 4.3e-01 0.57   1
CUTHOURS  0.31 9.7e-02 0.90   1
HLTHCARE  0.60 3.6e-01 0.64   1
AIDOLD    0.32 1.0e-01 0.90   1
AIDUNEMP  0.42 1.7e-01 0.83   1
EQUALIZE  0.68 4.6e-01 0.54   1

               WLS1
SS loadings    3.42
Proportion Var 0.26

Mean item complexity =  1
Test of the hypothesis that 1 factor is sufficient.

The degrees of freedom for the null model are  78  and the objective function was  6.79
The degrees of freedom for the model are 65  and the objective function was  4.39 

The root mean square of the residuals (RMSR) is  0.16 
The df corrected root mean square of the residuals is  0.18 

Fit based upon off diagonal values = 0.72
Measures of factor score adequacy             
                                                  WLS1
Correlation of (regression) scores with factors   0.93
Multiple R square of scores with factors          0.86
Minimum correlation of possible factor scores     0.71

and by subsetting for 1985 first

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(GPArotation)
library(psych)
library(readr)

# preprocess data from https://drive.google.com/drive/u/0/folders/1Uk7g0LtjwQpD8DjRn9yap6-TZATHEOLA

# saved to NeoliberalismData.csv (fully qualified path for reprex)
mydata <- read_csv("~/projects/demo/NeoliberalismData.csv")
#> Warning: Missing column names filled in: 'X1' [1]
#> Parsed with column specification:
#> cols(
#>   X1 = col_double(),
#>   YEAR = col_double(),
#>   WTSSALL = col_double(),
#>   CUTGOVT1 = col_double(),
#>   LESSREG1 = col_double(),
#>   PRICECON = col_double(),
#>   AIDINDUS = col_double(),
#>   HLPHITEC = col_double(),
#>   MAKEJOBS = col_double(),
#>   SAVEJOBS = col_double(),
#>   JOBSALL = col_double(),
#>   CUTHOURS = col_double(),
#>   HLTHCARE = col_double(),
#>   AIDOLD = col_double(),
#>   AIDUNEMP = col_double(),
#>   EQUALIZE = col_double(),
#>   YEAR_METRIC = col_double()
#> )

# create test subset, discarding X1, YEAR and YEAR_METRIC (which is constant)
mydata %>% filter(YEAR == 1985) %>% select(-X1,-YEAR,-YEAR_METRIC) -> sub1985

# extract weights 
sub1985 %>% select(WTSSALL) %>% .$WTSSALL -> wts1985

# discard WTSSALL
sub1985 %>% select(-WTSSALL) -> sub1985

# create covariance matrix
sub1985 %>% cov() -> cov1985

# create an fa object
fa1985 <- fa(cov1985, fm = "wls", weights = wts1985)

# show the result
fa1985
#> Factor Analysis using method =  wls
#> Call: fa(r = cov1985, fm = "wls", weights = wts1985)
#> Standardized loadings (pattern matrix) based upon correlation matrix
#>           WLS1      h2   u2 com
#> CUTGOVT1  0.01 6.7e-05 1.00   1
#> LESSREG1 -0.23 5.1e-02 0.95   1
#> PRICECON  0.52 2.7e-01 0.73   1
#> AIDINDUS  0.57 3.2e-01 0.68   1
#> HLPHITEC  0.70 4.8e-01 0.52   1
#> MAKEJOBS  0.66 4.3e-01 0.57   1
#> SAVEJOBS  0.50 2.5e-01 0.75   1
#> JOBSALL   0.65 4.3e-01 0.57   1
#> CUTHOURS  0.31 9.7e-02 0.90   1
#> HLTHCARE  0.60 3.6e-01 0.64   1
#> AIDOLD    0.32 1.0e-01 0.90   1
#> AIDUNEMP  0.42 1.7e-01 0.83   1
#> EQUALIZE  0.68 4.6e-01 0.54   1
#> 
#>                WLS1
#> SS loadings    3.42
#> Proportion Var 0.26
#> 
#> Mean item complexity =  1
#> Test of the hypothesis that 1 factor is sufficient.
#> 
#> The degrees of freedom for the null model are  78  and the objective function was  6.79
#> The degrees of freedom for the model are 65  and the objective function was  4.39 
#> 
#> The root mean square of the residuals (RMSR) is  0.16 
#> The df corrected root mean square of the residuals is  0.18 
#> 
#> Fit based upon off diagonal values = 0.72
#> Measures of factor score adequacy             
#>                                                   WLS1
#> Correlation of (regression) scores with factors   0.93
#> Multiple R square of scores with factors          0.86
#> Minimum correlation of possible factor scores     0.71

Created on 2020-03-14 by the reprex package (v0.3.0)

So, I'm not sure where your difference comes from. Take a look at your revised function and come back?

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.