Matching example data structure for mvgls in mvMORPH

Makaylee.crone22 · January 18, 2024, 10:45pm

Hi,

I'm using the mvMORPH package to fit a model with a phylogenetic generalized least squares. The package authors give code to simulate data but I'm having trouble making my data match the simulation data structure. Note that a phylogenetic tree is also required here but I'm omitting to make this post less messy.

#Making the simulation mvMORPH data
set.seed(1)
n <- 32 # number of species
p <- 50 # number of traits (p>n)
tree <- pbtree(n=n, scale=1) # phylogenetic tree
R <- crossprod(matrix(runif(p*p), ncol=p)) # a random symmetric matrix (covariance)
# simulate a BM dataset
Y <- mvSIM(tree, model="BM1", nsim=1, param=list(sigma=R, theta=rep(0,p)))
data=list(Y=Y)

#Example of my data

df1 <- data.frame(
                  var1 = c(1,16,0,0,0),
                  var2 = c(17,24,0,0,0),
                  var3 = c(2,1,0,39,3),
                  var4 = c(0,0,36,0,5),
                  var5 = c(3,2,0,0,16))

#Fits the desired model
data=list(df1=df1)
fit1 <- mvgls(df1~1, data=data, tre, model="BM", penalty="RidgeArch")

When I attempt to run the code to fit the model I get the error message "Error in svd(B, nu = 0) : infinite or missing values in 'x'".

A quick google suggested that this is because my data should not be a data frame, but rather a matrix. The issue is that after transforming my data to a matrix I get different outputs when using str() to see if the simulation data and my data match.

#Simulation data
 num [1:32, 1:50] 2.93 3.2 5.77 2.65 2.5 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:32] "t17" "t18" "t5" "t11" ...
  ..$ : NULL 


#My data
 num [1:5, 1:5] 1 16 0 0 0 17 24 0 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:5] "var1" "var2" "var3" "var4" ...

I'm assuming that my code won't run because the NULL and character lines should be switched, but not really sure how to do this.

Any suggestions? Thanks for the help!

AlexisW · January 19, 2024, 4:11am

What is df6? It doesn't seem to be defined in your code.

You seem to have several code snippets and it's unclear how they are related. It would be easier to help you if you gave a single, straightforward, reproducible example.

Makaylee.crone22 · January 19, 2024, 5:09pm

Apologies, this should be df1! I edited the post and tried to cut down significantly and consolidate code. Please let me know if this is still confusing.

AlexisW · January 22, 2024, 11:25pm

I'm still confused, here are reproducible examples based on your code in the first post, the first one runs without problem, the second one complains about the lack of rownames but runs anyway:

#Making the simulation mvMORPH data
set.seed(1)
n <- 32 # number of species
p <- 50 # number of traits (p>n)
tree <- phytools::pbtree(n=n, scale=1) # phylogenetic tree
R <- crossprod(matrix(runif(p*p), ncol=p)) # a random symmetric matrix (covariance)
# simulate a BM dataset
Y <- mvMORPH::mvSIM(tree, model="BM1", nsim=1, param=list(sigma=R, theta=rep(0,p)))
data=list(Y=Y)
fit_y <- mvMORPH::mvgls(Y~1, data=data, tree, model="BM", penalty="RidgeArch")
fit_y
#> 
#> Call:
#> mvMORPH::mvgls(formula = Y ~ 1, data = data, tree = tree, model = "BM", 
#>     penalty = "RidgeArch")
#> 
#> 
#> Generalized least squares fit by penalized REML 
#> LOOCV of the log-restricted-likelihood: -2305.528 
#> 
#> 
#> Parameter estimate(s):
#> Regularization parameter (gamma): 0.193 
#> 
#> 
#> Covariance matrix of size: 50 by 50 
#> for 32 observations 
#> 
#> Coefficients (truncated):
#>              [,1]  [,2]  [,3]  [,4]  [,5]  [,6]   [,7]  [,8]    [,9] [,10]
#> (Intercept) 1.706 1.093 3.537 1.059 2.169 1.645 -1.059 1.876 0.09703 5.584
#> Use "coef" to display all the coefficients

#Example of my data

df1 <- as.matrix(data.frame(
  var1 = c(1,16,0,0,0),
  var2 = c(17,24,0,0,0),
  var3 = c(2,1,0,39,3),
  var4 = c(0,0,36,0,5),
  var5 = c(3,2,0,0,16)))

#Fits the desired model
tre <- phytools::pbtree(n=nrow(df1), scale=1) # phylogenetic tree

data=list(df1=df1)
fit1 <- mvMORPH::mvgls(df1~1, data=data, tre, model="BM", penalty="RidgeArch")
#> Warning in mvMORPH::mvgls(df1 ~ 1, data = data, tre, model = "BM", penalty = "RidgeArch"): the data has no names, order assumed to be the same as tip labels in the tree.
fit1
#> 
#> Call:
#> mvMORPH::mvgls(formula = df1 ~ 1, data = data, tree = tre, model = "BM", 
#>     penalty = "RidgeArch")
#> 
#> 
#> Generalized least squares fit by penalized REML 
#> LOOCV of the log-restricted-likelihood: -85.5726 
#> 
#> 
#> Parameter estimate(s):
#> Regularization parameter (gamma): 1 
#> 
#> 
#> Covariance matrix of size: 5 by 5 
#> for 5 observations 
#> 
#> Coefficients:
#>              var1    var2    var3    var4    var5  
#> (Intercept)   2.582   6.227  10.600   8.881   4.624

^{Created on 2024-01-22 with reprex v2.0.2}

So the error you are getting isn't due to the lack of rownames, it appears to be due to something in the structure of the data (the fact that there are infinite or missing values in some of the intermediate steps). Can you run your code to get the error, then run traceback(), so we can see where exactly the error happened?

Makaylee.crone22 · January 22, 2024, 11:43pm

Thanks for responding.

Running fit1 <- mvgls(df6~1, data=data, tree2, model="BM", penalty="RidgeArch") and then traceback() gives:

12: stop("infinite or missing values in 'x'")
11: svd(B, nu = 0)
10: psmall.svd(m, tol)
9: fast.svd(m, tol)
8: pseudoinverse(mod_par$X)
7: FUN(newX[, i], ...)
6: apply(brute_force, 1, .loocvPhylo, cvmethod = cvmethod, targM = target,
corrStr = corrModel, penalty = penalty, error = mserr, nobs = corrModel$nobs)
5: which.min(apply(brute_force, 1, .loocvPhylo, cvmethod = cvmethod,
targM = target, corrStr = corrModel, penalty = penalty, error = mserr,
nobs = corrModel$nobs))
4: [.data.frame(brute_force, which.min(apply(brute_force, 1, .loocvPhylo,
cvmethod = cvmethod, targM = target, corrStr = corrModel,
penalty = penalty, error = mserr, nobs = corrModel$nobs)),
)
3: brute_force[which.min(apply(brute_force, 1, .loocvPhylo, cvmethod = cvmethod,
targM = target, corrStr = corrModel, penalty = penalty, error = mserr,
nobs = corrModel$nobs)), ]
2: .startGuess(corrModel, cvmethod = method, mserr = mserr, target = target,
penalty = penalty, echo = echo, penalized)
1: mvgls(df6 ~ 1, data = data, tree2, model = "BM", penalty = "RidgeArch")

AlexisW · January 23, 2024, 12:08am

Sorry, I don't have the answer, I imagine there is something in the structure of the data that's a problem, but understanding what would take an understanding of the function that I don't have.

You should first make sure you visualize your data with different approaches to see if you can find anything "weird" (a classic is constant rows or columns); you could try varying the penalty, target, and method arguments to see if it's better with a slightly different approach; you could try asking the authors directly.

system · March 5, 2024, 12:08am

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.