new to RStudio, several error codes (e.g. CI variance-covariance; undefined columns) with linear mixed model analysis - I would appreciate your help!

Eule · April 29, 2024, 7:48pm

hello dear community,

I'm a medical student depending on some R analyses for her thesis without experience in programming or complex statistics.
Content of my analytical work is to correlate hundreds of enzymes with dozens of longitudinal (3 timepoints) outcomes.

Due to the huge amount of necessary analyses I cannot do them one by one via SPSS, but have to use R. However, I never learned about that and our university doesn't offer any courses.

first analyses worked out,
then I started to only receive error codes, the ones I received so far were e.g.:
• “Error in intervals.gls(.Fit, level = 0.95) :
cannot get confidence intervals on var-cov components: Non-positive definite approximate variance-covariance”
• “Error in [.data.frame(.Daten.temp, , .Basis.modell.Cazyme) :
undefined columns selected” (double-checked correct spelling of columns)
(• Fehler: Objekt 'path.z.data.import' nicht gefunden)
• Error in setwd(path.z.data.import) : cannot change working directory
(regarding the last one (working path not found), I followed the instructions in How to Fix in R: cannot change working directory - Statology but it still didn't work, unfortunately (also proved that working path name was correct using „#get current working directory getwd()“)

Sometimes reading and analysis works without any error code, but no results appear even though I havn't changed anything about the result listing (and it worked before).

Sometimes only a third or half of the enzymes (independent variables) is analysed and the rest is just missing. If I only use this rest for analysis, usually the first error code above ("Error in intervals.gls(.Fit, level = 0.95)..." appears.

I would be thankful for some ideas/support/thoughts and can of course go into detail, however I don't want to load this message with excessive information.

All data and test subjects are of course freely invented:

ID	Time_(1-3)	Age	Sex	IFNa_(pg/ml)	LPS_(pg/ml)	I-FABP_(pg/ml)	IP-10_(pg/ml)	Galectin-9_(pg/ml)	Zonulin_(ng/ml)	RS	RS_cut_off	AA0	AA1	AA10	AA1_2	AA2	AA3_2	AA3_3	AA4	AA5
1	1	50	F	20,00	5,00	5,00	20,00	20,00	20,00	2,6	1	0,00000	0,00002	0,00000	0,00000	0,00000	0,00000	0,00000	0,00000	0,00000
1	2	50	F	30,00	10,00		30,00	30,00	30,00	9,7	1	0,00002	0,00000	0,00000	0,00000	0,00000	0,00002	0,00000	0,00000	0,00001
1	3	50	F	40,00	15,00	15,00	40,00	40,00	40,00		1	0,00000	0,00001	0,00000	0,00000	0,00000	0,00000	0,00000	0,00000	0,00001
2	1	40	F		10,00	10,00	30,00	30,00	30,00	3,41	1	0,00000	0,00000	0,00000	0,00000	0,00000	0,00000	0,00000	0,00000	0,00000
2	2	40	F	40,00	15,00	15,00	40,00	40,00		6,5	1	0,00000	0,00004	0,00000	0,00000	0,00000	0,00000	0,00000	0,00000	0,00001
2	3	40	F	50,00	35,00	35,00	50,00	50,00	50,00	3,85	1	0,00000	0,00004	0,00000	0,00000	0,00000	0,00000	0,00000	0,00000	0,00001
3	1	30	M	40,00		15,00		40,00	40,00	1,6	0	0,00000	0,00000	0,00000	0,00000	0,00000	0,00000	0,00000	0,00000	0,00001
3	2	30	M	30,00	10,00	10,00	30,00	30,00	30,00	1,2	0
3	3	30	M	20,00	7,00	7,00	20,00	20,00	20,00	3,41	1	0,00000	0,00000	0,00000	0,00000	0,00000	0,00000	0,00000	0,00000	0,00000

column	explanation
ID	fictional test subjects
Time	timepoint 1-3
IFNa_(pg/ml)	dependent variables
LPS_(pg/ml)	dependent variables
I-FABP_(pg/ml)	dependent variables
IP-10_(pg/ml)	dependent variables
Galectin-9_(pg/ml)	dependent variables
Zonulin_(ng/ml)	dependent variables
RS	dependent variables
RS_cut_off	dependent variables: RS < 2,5 = 0; RS ≥ 2,5 = 1
AA0 - AA5 (few examples)	independent variables (enzymes)

example code (in reality the list of "CAZymes" (independent variables for analysis) is a few hundred more):
´´´

#Umlaute, um ggf. Encoding-Probleme / UTF-8-Probleme zu erkennen: ä ö ü Ä Ö Ü ß

#.trPaths <- paste(paste(Sys.getenv('APPDATA'), '\Tinn-R\tmp\', sep=''), c('', 'search.txt', 'objects.txt', 'file.r', 'selection.r', 'block.r', 'lines.r'), sep='')

%+% <- function(x1, x2)paste0(x1, x2)
#"Hallo" %+% " " %+% 1 %+% 2

#wiss. Notation deaktivieren
options(scipen=15,"digits"=20)

r auf Englisch umstellen:
Sys.setenv(LANG = "en")

#laengere Ausgabe von data.frames:
options( max.print = 1000000 )

#ToDo: Pfade anpassen (es wird der Pfad ausgesucht, der existiert, wobei unterster Pfad hoechste Prioritaet, wenn mehrere existieren) :
homepath.list <- list() ; qr <- 1
homepath.list[[ qr ]] <- r"(C:\yyy)" ; qr <- qr + 1
homepath.list[[ qr ]] <- r"(E:\B\Beratung.P\Mueller.Carolin\2.x.Linear.mixed.model.LMM.variance.component)" ; qr <- qr + 1
homepath.list[[ qr ]] <- r"(C:\Users\ccmue\Documents\Promotion Rheumatologie PC\2.x.Linear.mixed.model.LMM.variance.component)" ; qr <- qr + 1 # hier in der letzten Zeile an der Stelle ' r"(C:\zzz)" ' geben Sie am Besten Ihren Arbeitsordner an
homepath.list[[ qr ]] <- r"(C:\Users\ccmue\Documents\Promotion Rheumatologie PC\2.x.Linear.mixed.model.LMM.variance.component)" ; qr <- qr + 1 # hier in der letzten Zeile an der Stelle ' r"(C:\zzz)" ' geben Sie am Besten Ihren Arbeitsordner an
homepath.vek <- do.call( "c" , homepath.list )
for( i in 1:length( homepath.vek ) ){
temp <- try( setwd( homepath.vek[i] ) , silent = TRUE )
if( class( temp ) != "try-error" ){ homepath <- homepath.vek[i] ; stop }
}
rm( temp )
homepath

path.z.data.import <- paste0( homepath , "/z.data.import/2024-02-17" )
path.z.data.import <- paste0( homepath , "/z.data.import/2024-04-18" )
path.z.child.scripts.analyses <- paste0( homepath , "/z.child.scripts.analyses" )
path.z.results <- paste0( homepath , "/z.results" )

if( TRUE ){
install packages in loop :
packages.to.install.list <- list() ; p <- 1
packages.to.install.list[[ p ]] <- "foreign" ; p <- p+1
packages.to.install.list[[ p ]] <- "survival" ; p <- p+1
packages.to.install.list[[ p ]] <- "psych" ; p <- p+1
packages.to.install.list[[ p ]] <- "nlme" ; p <- p+1
packages.to.install.list[[ p ]] <- "foreign" ; p <- p+1
packages.to.install.list[[ p ]] <- "lme4" ; p <- p+1
packages.to.install.list[[ p ]] <- "plyr" ; p <- p+1
packages.to.install.list[[ p ]] <- "haven" ; p <- p+1
packages.to.install.list[[ p ]] <- "openxlsx" ; p <- p+1
packages.to.install.list[[ p ]] <- "test" ; p <- p+1
packages.to.install <- do.call( "c" , packages.to.install.list )

if( length( packages.to.install ) > 0 ){
for( i in 1:length( packages.to.install ) ){
temp <- packages.to.install[ i ]
if( !require( temp , character.only = TRUE , quietly = TRUE , warn.conflicts = FALSE ) )
{ install.packages( temp ) }
library( temp , character.only = TRUE )
}

for( i in 1:length( packages.to.install ) ){
  temp <- packages.to.install[ i ]
  if( !require( temp , character.only = TRUE , quietly = TRUE , warn.conflicts = FALSE ) )
    { install.packages( temp , repos = "https://packagemanager.rstudio.com/cran/latest" ) }
  library( temp , character.only = TRUE )
 }

}
}

#Funktion zur Bererchnung von Schiefe:
spssSkewKurtosis=function(x) {
w=length(x)
m1=mean(x)
m2=sum((x-m1)^2)
m3=sum((x-m1)^3)
m4=sum((x-m1)^4)
s1=sd(x)
skew=wm3/(w-1)/(w-2)/s1^3
sdskew=sqrt( 6w*(w-1) / ((w-2)(w+1)(w+3)) )
kurtosis=(w*(w+1)m4 - 3m2^2*(w-1)) / ((w-1)(w-2)(w-3)s1^4)
sdkurtosis=sqrt( 4(w^2-1) * sdskew^2 / ((w-3)*(w+5)) )
mat=matrix(c(skew,kurtosis, sdskew,sdkurtosis), 2,
dimnames=list(c("skew","kurtosis"), c("estimate","se")))
return(mat)
}

#Function that formats dates:
Datumsformatierung <- function(DF){
Date.names <- rep(NA,dim(DF)[2])
for(i in 1:dim(DF)[2]){
if(class(DF[,i]) != "numeric") {next}
if( !is.element(FALSE, is.na(DF[,i])) ) {next}
if(min(DF[,i], na.rm = TRUE ) >= 10000000000 & min(DF[,i], na.rm = TRUE )!=Inf & max(DF[,i], na.rm = TRUE ) <= 85000000000 & max(DF[,i], na.rm = TRUE ) !=-Inf)
{DF[,i] <- as.Date(DF[[i]] + ISOdate(1582,10,14), origin = "1970-01-01") ; Date.names[i] <- names(DF[i])}
}
return(DF)
}

################################################################################
################################################################################
################################################################################

#####################

Daten einlesen:

#####################

Die Dateien ansehen im Ordner '...\z.data.import' (das sind die Dateien, die wir potentiell einlesen wollen):

setwd( path.z.data.import ) ; Names.files <- list.files(

pattern='*.xlsx'

)
Names.files

#Datennamen anpassen ( hier "SPSS.Datensatz fuer Datensaetze mit Endung .sav" )
setwd( path.z.data.import ) ; Daten.master <- foreign::read.spss( "fictional dataset_(not_real)_enzymes.markers_SPSS.sav", to.data.frame = TRUE ) #Dateinamen anpassen

Daten.master[ , "Time_13.factor" ] <- factor( Daten.master[ , "Time_13" ] , levels = c( 1 , 2 , 3 ) )

.Daten.temp <- Daten.master
#.Daten.temp[, "y"] <- .Daten.temp[, "Wert"]
#.Daten.temp[, "y"] <- pnorm(Daten[, "Wert"])
.Daten.temp[ 1:20 , ]
#utils::View( .Daten.temp[ 1:20 , ] )

#data of the first 3 patients:
split( .Daten.temp[ , 1:10 ] , .Daten.temp[ , "ID" ] )[ 1:4 ]

###############################

Analyse (linear mixed model):

##############################

Cazyme <- c( "AA0"
, "AA1"
, "AA10"
, "AA3_2"
, "AA4"
, "AA5" )

.Basis.modell <- c( "ID" , "IFNA_pgml" , "Time_13" , "Time_13.factor" , "Age" )

setwd( path.z.results )

.Liste <- list()
for( i in 1:length(Cazyme) ){
#i <- 1

.Basis.modell.Cazyme <- c( .Basis.modell , Cazyme[ i ] )

"Hallo" %+% " " %+% 1 %+% 2

.Model <- "IFNa_pgml ~ Age + Time_13.factor" %+% " + " %+% Cazyme[ i ]

#Faelle mit missing values bei den involvierten Variablen entfernen.
#Zwar nicht immer notwendig, aber dennoch sinnvoll (vor allem bei Regressionsanalysen)
#und sicherer:
.Daten.temp <- Daten.master
.Index <- complete.cases( .Daten.temp[ , .Basis.modell.Cazyme ] )
.Daten.temp <- .Daten.temp[ .Index , ]

if( FALSE ){
#Endpunkt enthaelt extreme Ausreisser :
as.numeric( .Daten.temp[ , "IFNa_pgml" ] )
spssSkewKurtosis( as.numeric( .Daten.temp[ , "IFNa_pgml" ] ) )
}

#Model <- "IFNa_pgml ~ Age + Time_13.factor + AA0"
#Model <- "IFNa_pgml ~ AA0.new + Time_13.factor"
#Model <- "IFNa_pgml ~ AA0.new2 + Time_13.factor"

aa <- try( .Fit <- gls( as.formula( .Model )
, data = .Daten.temp , na.action = na.omit
, method = "ML"
, correlation = corCAR1( form = ~ Time_13 | ID )
)
)
if ( is.element( "try-error" , class(aa) ) ){ next }

.Result.pre <- summary( .Fit )
#.Result.pre["tTable"][[ 1 ]]

CI <- intervals( .Fit, level = 0.95 )
.temp1 <- as.data.frame(.Result.pre$tTable)
.temp2 <- CI$coef
.Result <- cbind( .temp1["Value"] , .temp2[,c("lower","upper")] , .temp1["p-value"] )
.Result

.Liste[[ i ]] <- data.frame( Model = .Model, p.value = .Result[ dim(.Result)[1] , "p-value" ] )

write.csv2( .Result , file = "IFNa_pgml" %+% " " %+% Cazyme[ i ] %+% " " %+% "Results.csv" , row.names = TRUE , na = "" )

#assign( Cazyme[i] , .Result)

}

.Result.listing <- do.call( "rbind" , .Liste )
write.csv2( .Result.listing , file = "IFNa_pgml" %+% " " %+% "Result.csv" , row.names = TRUE , na = "" )

.Liste

Cazyme[4]
'''

jrkrideau · April 30, 2024, 12:21am

Welcome to the forum.

It sounds like you have a bit of a mess on your hands but it probably is not all that bad.

What we need is to see your code and some sample data.

Just copy your all your code and paste it here between
```

```

A handy way to supply some sample data is the dput() function. In the case of a large dataset something like dput(head(mydata, 100)) should supply the data we need. Just do dput(mydata) where mydata is your data. Copy the output and paste it here between
```

```

See FAQ Asking Questions for some general information on asking questions.

Eule · April 30, 2024, 7:17am

@jrkrideau thank you, I now added some data and the code (as a new user I cannot upload attachments so I added them to the original message).

jrkrideau · April 30, 2024, 12:24pm

Your code did not arrive in good shape.

We need the data between
```

and you have it between
´´´
which does nothing. It gives us a very messy code that is almost impossible to read.

I think it is the same key on a German keyboard but with the shift key held down.

```

Please send the data again in a new message.

The data seems okay but it ins better to use the the dput() function. In the case of a large dataset something like dput(head(mydata, 100)) should supply the data we need. Just do dput(mydata) where mydata is your data. Copy the output and paste it here between
```

```

This gives us an exact copy of your data. Other methods may not.