hello dear community,
I'm a medical student depending on some R analyses for her thesis without experience in programming or complex statistics.
Content of my analytical work is to correlate hundreds of enzymes with dozens of longitudinal (3 timepoints) outcomes.
Due to the huge amount of necessary analyses I cannot do them one by one via SPSS, but have to use R. However, I never learned about that and our university doesn't offer any courses.
first analyses worked out,
then I started to only receive error codes, the ones I received so far were e.g.:
• “Error in intervals.gls(.Fit, level = 0.95) :
cannot get confidence intervals on var-cov components: Non-positive definite approximate variance-covariance”
• “Error in [.data.frame
(.Daten.temp, , .Basis.modell.Cazyme) :
undefined columns selected” (double-checked correct spelling of columns)
(• Fehler: Objekt 'path.z.data.import' nicht gefunden)
• Error in setwd(path.z.data.import) : cannot change working directory
(regarding the last one (working path not found), I followed the instructions in How to Fix in R: cannot change working directory - Statology but it still didn't work, unfortunately (also proved that working path name was correct using „#get current working directory getwd()“)
Sometimes reading and analysis works without any error code, but no results appear even though I havn't changed anything about the result listing (and it worked before).
Sometimes only a third or half of the enzymes (independent variables) is analysed and the rest is just missing. If I only use this rest for analysis, usually the first error code above ("Error in intervals.gls(.Fit, level = 0.95)..." appears.
I would be thankful for some ideas/support/thoughts and can of course go into detail, however I don't want to load this message with excessive information.
All data and test subjects are of course freely invented:
ID | Time_(1-3) | Age | Sex | IFNa_(pg/ml) | LPS_(pg/ml) | I-FABP_(pg/ml) | IP-10_(pg/ml) | Galectin-9_(pg/ml) | Zonulin_(ng/ml) | RS | RS_cut_off | AA0 | AA1 | AA10 | AA1_2 | AA2 | AA3_2 | AA3_3 | AA4 | AA5 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 50 | F | 20,00 | 5,00 | 5,00 | 20,00 | 20,00 | 20,00 | 2,6 | 1 | 0,00000 | 0,00002 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 |
1 | 2 | 50 | F | 30,00 | 10,00 | 30,00 | 30,00 | 30,00 | 9,7 | 1 | 0,00002 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00002 | 0,00000 | 0,00000 | 0,00001 | |
1 | 3 | 50 | F | 40,00 | 15,00 | 15,00 | 40,00 | 40,00 | 40,00 | 1 | 0,00000 | 0,00001 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00001 | |
2 | 1 | 40 | F | 10,00 | 10,00 | 30,00 | 30,00 | 30,00 | 3,41 | 1 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | |
2 | 2 | 40 | F | 40,00 | 15,00 | 15,00 | 40,00 | 40,00 | 6,5 | 1 | 0,00000 | 0,00004 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00001 | |
2 | 3 | 40 | F | 50,00 | 35,00 | 35,00 | 50,00 | 50,00 | 50,00 | 3,85 | 1 | 0,00000 | 0,00004 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00001 |
3 | 1 | 30 | M | 40,00 | 15,00 | 40,00 | 40,00 | 1,6 | 0 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00001 | ||
3 | 2 | 30 | M | 30,00 | 10,00 | 10,00 | 30,00 | 30,00 | 30,00 | 1,2 | 0 | |||||||||
3 | 3 | 30 | M | 20,00 | 7,00 | 7,00 | 20,00 | 20,00 | 20,00 | 3,41 | 1 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 | 0,00000 |
column | explanation |
---|---|
ID | fictional test subjects |
Time | timepoint 1-3 |
IFNa_(pg/ml) | dependent variables |
LPS_(pg/ml) | dependent variables |
I-FABP_(pg/ml) | dependent variables |
IP-10_(pg/ml) | dependent variables |
Galectin-9_(pg/ml) | dependent variables |
Zonulin_(ng/ml) | dependent variables |
RS | dependent variables |
RS_cut_off | dependent variables: RS < 2,5 = 0; RS ≥ 2,5 = 1 |
AA0 - AA5 (few examples) | independent variables (enzymes) |
example code (in reality the list of "CAZymes" (independent variables for analysis) is a few hundred more):
´´´
#Umlaute, um ggf. Encoding-Probleme / UTF-8-Probleme zu erkennen: ä ö ü Ä Ö Ü ß
#.trPaths <- paste(paste(Sys.getenv('APPDATA'), '\Tinn-R\tmp\', sep=''), c('', 'search.txt', 'objects.txt', 'file.r', 'selection.r', 'block.r', 'lines.r'), sep='')
%+%
<- function(x1, x2)paste0(x1, x2)
#"Hallo" %+% " " %+% 1 %+% 2
#wiss. Notation deaktivieren
options(scipen=15,"digits"=20)
r auf Englisch umstellen:
Sys.setenv(LANG = "en")
#laengere Ausgabe von data.frames:
options( max.print = 1000000 )
#ToDo: Pfade anpassen (es wird der Pfad ausgesucht, der existiert, wobei unterster Pfad hoechste Prioritaet, wenn mehrere existieren) :
homepath.list <- list() ; qr <- 1
homepath.list[[ qr ]] <- r"(C:\yyy)" ; qr <- qr + 1
homepath.list[[ qr ]] <- r"(E:\B\Beratung.P\Mueller.Carolin\2.x.Linear.mixed.model.LMM.variance.component)" ; qr <- qr + 1
homepath.list[[ qr ]] <- r"(C:\Users\ccmue\Documents\Promotion Rheumatologie PC\2.x.Linear.mixed.model.LMM.variance.component)" ; qr <- qr + 1 # hier in der letzten Zeile an der Stelle ' r"(C:\zzz)" ' geben Sie am Besten Ihren Arbeitsordner an
homepath.list[[ qr ]] <- r"(C:\Users\ccmue\Documents\Promotion Rheumatologie PC\2.x.Linear.mixed.model.LMM.variance.component)" ; qr <- qr + 1 # hier in der letzten Zeile an der Stelle ' r"(C:\zzz)" ' geben Sie am Besten Ihren Arbeitsordner an
homepath.vek <- do.call( "c" , homepath.list )
for( i in 1:length( homepath.vek ) ){
temp <- try( setwd( homepath.vek[i] ) , silent = TRUE )
if( class( temp ) != "try-error" ){ homepath <- homepath.vek[i] ; stop }
}
rm( temp )
homepath
path.z.data.import <- paste0( homepath , "/z.data.import/2024-02-17" )
path.z.data.import <- paste0( homepath , "/z.data.import/2024-04-18" )
path.z.child.scripts.analyses <- paste0( homepath , "/z.child.scripts.analyses" )
path.z.results <- paste0( homepath , "/z.results" )
if( TRUE ){
install packages in loop :
packages.to.install.list <- list() ; p <- 1
packages.to.install.list[[ p ]] <- "foreign" ; p <- p+1
packages.to.install.list[[ p ]] <- "survival" ; p <- p+1
packages.to.install.list[[ p ]] <- "psych" ; p <- p+1
packages.to.install.list[[ p ]] <- "nlme" ; p <- p+1
packages.to.install.list[[ p ]] <- "foreign" ; p <- p+1
packages.to.install.list[[ p ]] <- "lme4" ; p <- p+1
packages.to.install.list[[ p ]] <- "plyr" ; p <- p+1
packages.to.install.list[[ p ]] <- "haven" ; p <- p+1
packages.to.install.list[[ p ]] <- "openxlsx" ; p <- p+1
packages.to.install.list[[ p ]] <- "test" ; p <- p+1
packages.to.install <- do.call( "c" , packages.to.install.list )
if( length( packages.to.install ) > 0 ){
for( i in 1:length( packages.to.install ) ){
temp <- packages.to.install[ i ]
if( !require( temp , character.only = TRUE , quietly = TRUE , warn.conflicts = FALSE ) )
{ install.packages( temp ) }
library( temp , character.only = TRUE )
}
for( i in 1:length( packages.to.install ) ){
temp <- packages.to.install[ i ]
if( !require( temp , character.only = TRUE , quietly = TRUE , warn.conflicts = FALSE ) )
{ install.packages( temp , repos = "https://packagemanager.rstudio.com/cran/latest" ) }
library( temp , character.only = TRUE )
}
}
}
#Funktion zur Bererchnung von Schiefe:
spssSkewKurtosis=function(x) {
w=length(x)
m1=mean(x)
m2=sum((x-m1)^2)
m3=sum((x-m1)^3)
m4=sum((x-m1)^4)
s1=sd(x)
skew=wm3/(w-1)/(w-2)/s1^3
sdskew=sqrt( 6w*(w-1) / ((w-2)(w+1)(w+3)) )
kurtosis=(w*(w+1)m4 - 3m2^2*(w-1)) / ((w-1)(w-2)(w-3)s1^4)
sdkurtosis=sqrt( 4(w^2-1) * sdskew^2 / ((w-3)*(w+5)) )
mat=matrix(c(skew,kurtosis, sdskew,sdkurtosis), 2,
dimnames=list(c("skew","kurtosis"), c("estimate","se")))
return(mat)
}
#Function that formats dates:
Datumsformatierung <- function(DF){
Date.names <- rep(NA,dim(DF)[2])
for(i in 1:dim(DF)[2]){
if(class(DF[,i]) != "numeric") {next}
if( !is.element(FALSE, is.na(DF[,i])) ) {next}
if(min(DF[,i], na.rm = TRUE ) >= 10000000000 & min(DF[,i], na.rm = TRUE )!=Inf & max(DF[,i], na.rm = TRUE ) <= 85000000000 & max(DF[,i], na.rm = TRUE ) !=-Inf)
{DF[,i] <- as.Date(DF[[i]] + ISOdate(1582,10,14), origin = "1970-01-01") ; Date.names[i] <- names(DF[i])}
}
return(DF)
}
################################################################################
################################################################################
################################################################################
#####################
Daten einlesen:
#####################
Die Dateien ansehen im Ordner '...\z.data.import' (das sind die Dateien, die wir potentiell einlesen wollen):
setwd( path.z.data.import ) ; Names.files <- list.files(
pattern='*.xlsx'
)
Names.files
#Datennamen anpassen ( hier "SPSS.Datensatz fuer Datensaetze mit Endung .sav" )
setwd( path.z.data.import ) ; Daten.master <- foreign::read.spss( "fictional dataset_(not_real)_enzymes.markers_SPSS.sav", to.data.frame = TRUE ) #Dateinamen anpassen
Daten.master[ , "Time_13.factor" ] <- factor( Daten.master[ , "Time_13" ] , levels = c( 1 , 2 , 3 ) )
.Daten.temp <- Daten.master
#.Daten.temp[, "y"] <- .Daten.temp[, "Wert"]
#.Daten.temp[, "y"] <- pnorm(Daten[, "Wert"])
.Daten.temp[ 1:20 , ]
#utils::View( .Daten.temp[ 1:20 , ] )
#data of the first 3 patients:
split( .Daten.temp[ , 1:10 ] , .Daten.temp[ , "ID" ] )[ 1:4 ]
###############################
Analyse (linear mixed model):
##############################
Cazyme <- c( "AA0"
, "AA1"
, "AA10"
, "AA3_2"
, "AA4"
, "AA5" )
.Basis.modell <- c( "ID" , "IFNA_pgml" , "Time_13" , "Time_13.factor" , "Age" )
setwd( path.z.results )
.Liste <- list()
for( i in 1:length(Cazyme) ){
#i <- 1
.Basis.modell.Cazyme <- c( .Basis.modell , Cazyme[ i ] )
"Hallo" %+% " " %+% 1 %+% 2
.Model <- "IFNa_pgml ~ Age + Time_13.factor" %+% " + " %+% Cazyme[ i ]
#Faelle mit missing values bei den involvierten Variablen entfernen.
#Zwar nicht immer notwendig, aber dennoch sinnvoll (vor allem bei Regressionsanalysen)
#und sicherer:
.Daten.temp <- Daten.master
.Index <- complete.cases( .Daten.temp[ , .Basis.modell.Cazyme ] )
.Daten.temp <- .Daten.temp[ .Index , ]
if( FALSE ){
#Endpunkt enthaelt extreme Ausreisser :
as.numeric( .Daten.temp[ , "IFNa_pgml" ] )
spssSkewKurtosis( as.numeric( .Daten.temp[ , "IFNa_pgml" ] ) )
}
#Model <- "IFNa_pgml ~ Age + Time_13.factor + AA0"
#Model <- "IFNa_pgml ~ AA0.new + Time_13.factor"
#Model <- "IFNa_pgml ~ AA0.new2 + Time_13.factor"
aa <- try( .Fit <- gls( as.formula( .Model )
, data = .Daten.temp , na.action = na.omit
, method = "ML"
, correlation = corCAR1( form = ~ Time_13 | ID )
)
)
if ( is.element( "try-error" , class(aa) ) ){ next }
.Result.pre <- summary( .Fit )
#.Result.pre["tTable"][[ 1 ]]
CI <- intervals( .Fit, level = 0.95 )
.temp1 <- as.data.frame(.Result.pre$tTable)
.temp2 <- CI$coef
.Result <- cbind( .temp1["Value"] , .temp2[,c("lower","upper")] , .temp1["p-value"] )
.Result
.Liste[[ i ]] <- data.frame( Model = .Model, p.value = .Result[ dim(.Result)[1] , "p-value" ] )
write.csv2( .Result , file = "IFNa_pgml" %+% " " %+% Cazyme[ i ] %+% " " %+% "Results.csv" , row.names = TRUE , na = "" )
#assign( Cazyme[i] , .Result)
}
.Result.listing <- do.call( "rbind" , .Liste )
write.csv2( .Result.listing , file = "IFNa_pgml" %+% " " %+% "Result.csv" , row.names = TRUE , na = "" )
.Liste
Cazyme[4]
'''