Loop over a list & t.test (p.value)

Hugo_Cordeau · June 19, 2020, 6:01pm

Hi, I have data from two time series & I want to know if they are statistically different. I have about 120 possible combinations, plus other time series to analyze, so I need to write a loop for it.

I would like to:

1. Name the variables (this works, but it is bad coding).
2. Get the p.value for each observation. (I only put M (Month) here, but I would normally have January, May, etc.)

1. Naming (ok)

Names <- c("Wind","Expected","Real","Loss","CF_B","CF_N","CF_L")
Time <- c("Y","S", "M", "H","HS", "HM")

rm(Name)
Name <- "H"
for(T in Time) {
  for (N in Names) {
    Name[N[T]] <- paste(T,"_",N, sep="")
  }
}
Name <- as.data.frame(Name)
Name <- subset(Name, Name != "H")
Name$P_Value <- Name$Name

# This seems like bad coding, but I don't get why I can't just create the vector below without initializing it first.
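
For what it's worth, the "H" placeholder and the later subset() can probably be avoided, because paste() is vectorized and can build all the names in one call. A minimal sketch, assuming the same Names and Time vectors as above; the helper data frame Comb and its column names are illustrative, and P_Value is simply initialized to NA here rather than copied from Name:

```r
Names <- c("Wind", "Expected", "Real", "Loss", "CF_B", "CF_N", "CF_L")
Time  <- c("Y", "S", "M", "H", "HS", "HM")

# All Time x Names combinations; the first argument varies fastest,
# which matches the order of the nested loops (outer Time, inner Names)
Comb <- expand.grid(Column = Names, Period = Time, stringsAsFactors = FALSE)

# paste() works element-wise on the two columns, so no loop is needed
Name <- data.frame(
  Name    = paste(Comb$Period, Comb$Column, sep = "_"),
  P_Value = NA_real_,   # placeholder, to be filled in later
  stringsAsFactors = FALSE
)

head(Name, 3)
```

This gives one row per combination (6 periods x 7 columns), with nothing to remove afterwards.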

2. P.value

This works
Y_Wind <- t.test(Y_Hist$Wind,Y_Hor$Wind,na.rm=TRUE)$p.value

Y_Wind
0.44

But this doesn't:
for(T in Time) {
  for (N in Names) {
    x <- noquote(paste(T,"_Hist$",N, sep=""))
    y <- noquote(paste(T,"_Hor$",N, sep=""))
    Name$P_Value[Name$Name == paste(T,"_",N, sep="")] <- t.test(x,y,na.rm=TRUE)$p.value
  }
}

This is the error that I get (translated from French):
Error in t.test.default(y, x, na.rm = TRUE) :
not enough 'x' observations
In addition: Warning messages:
1: In mean.default(x) : argument is not numeric or logical: returning NA
2: In var(x) : NAs introduced by coercion

When I print x and y:

print(x) = Y_Hist$Wind
print(y) = Y_Hor$Wind

So in my mind it's the same thing.
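
They only look the same when printed. noquote() changes how a string is displayed, not what it is: x is still the text "Y_Hist$Wind", so t.test() receives a single character string instead of the numbers in the column. A small sketch with made-up data (this Y_Hist is a toy stand-in, not the real data frame):

```r
# Toy stand-in for the real data frame (values are made up)
Y_Hist <- data.frame(Wind = c(3.1, 4.2, 5.0, 4.4))

x <- noquote(paste("Y", "_Hist$", "Wind", sep = ""))
print(x)          # displayed without quotes, so it looks like code
is.character(x)   # TRUE: it is still a string of length 1
is.numeric(x)     # FALSE: not the column values

# Looking up the object by name, then the column by name, gives the numbers
col <- get("Y_Hist")[["Wind"]]
is.numeric(col)   # TRUE
```

That is consistent with the warning "argument is not numeric or logical" in the error output.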

I think my problem is the paste. I also know that lapply could be a better fit for this, but I didn't get better results with it. Is there a better way to paste? Or to loop (lapply) more efficiently?

Thank you !!

P.S. I am kind of new.

mfherman · June 19, 2020, 8:19pm

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

FAQ: How to do a minimal reproducible example (reprex) for beginners


HanOostdijk · June 19, 2020, 9:17pm

I agree with @mfherman that a reprex would have been nice.
But guessing what you might mean, maybe the following will help.

I think you have two sets of data: a '_Hist' and a '_Hor' set.
Each set contains a data.frame per time period (indicated by Time),
and each data.frame contains columns (indicated by Names).
You want to calculate, for each time period and each name, the p-value of the t-test between the matching columns of the two sets.

The code below should work for all time periods and names, but I will define only two columns, and only for the Y time period.
In Comb I generate all combinations of your periods and names and use them to build the names of the Hist and Hor columns.
Then I loop over the list of names (I do only the first two because I defined only two columns).
The only tricky part is converting the name of a column into the column data, which is done with
eval(parse(text = ...))
I hope this gives you some ideas.

# Example data: only the Y period, and only two of the columns
Y_Hist = data.frame(
  Wind = 1:5,
  Expected = c(1,2,3,2,1)
)

Y_Hor = data.frame(
  Wind = c(1,2,3,2,1),
  Expected = c(5,4,3,2,1)
)

Names <- c("Wind","Expected","Real","Loss","CF_B","CF_N","CF_L")
Time <- c("Y","S", "M", "H","HS", "HM")

# All combinations of period and column name (Names varies fastest)
Comb <- expand.grid(Names=Names,Time=Time,stringsAsFactors = FALSE)
# Text such as "Y_Hist$Wind" and "Y_Hor$Wind"
HistNames <- paste(Comb$Time,"_Hist$",Comb$Names,sep="")
HorNames <- paste(Comb$Time,"_Hor$",Comb$Names,sep="")
pvalues <- rep(NA,length(HistNames))

# for (i in seq_along(HistNames)) {   # use this line once all data exist
for (i in 1:2) {
  # Turn the text into the actual column data
  histdata = eval(parse(text=HistNames[i]))
  hordata = eval(parse(text=HorNames[i]))
  pvalues[i] = t.test(histdata,hordata,na.rm=TRUE)$p.value
}

Created on 2020-06-19 by the reprex package (v0.3.0)
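
As an alternative sketch, the same lookup can probably be done without eval(parse()), by fetching each data frame with get() and each column with [[ ]]. The data frames are named as in the original question (Y_Hist, Y_Hor); the helper p_for, the shortened Names and Time vectors, and the toy values are assumptions for illustration. Combinations whose data frame or column does not exist are left as NA.

```r
# Toy data: only period "Y", two columns (values are made up)
Y_Hist <- data.frame(Wind = 1:5, Expected = c(1, 2, 3, 2, 1))
Y_Hor  <- data.frame(Wind = c(1, 2, 3, 2, 1), Expected = c(5, 4, 3, 2, 1))

Names <- c("Wind", "Expected", "Real")
Time  <- c("Y", "S")

Comb <- expand.grid(Names = Names, Time = Time, stringsAsFactors = FALSE)

# Hypothetical helper: p-value for one period/column pair, NA if data are missing
p_for <- function(period, column) {
  hist_name <- paste0(period, "_Hist")
  hor_name  <- paste0(period, "_Hor")
  if (!exists(hist_name) || !exists(hor_name)) return(NA_real_)
  h <- get(hist_name)[[column]]   # NULL if the column does not exist
  r <- get(hor_name)[[column]]
  if (is.null(h) || is.null(r)) return(NA_real_)
  t.test(h, r)$p.value            # t.test() removes NA values on its own
}

# One call per row of Comb; the result lines up with the rows
Comb$P_Value <- mapply(p_for, Comb$Time, Comb$Names, USE.NAMES = FALSE)
Comb
```

Keeping the p-values next to Names and Time in one data frame avoids matching results back to a separate vector of names.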

Hugo_Cordeau · June 22, 2020, 7:47pm

Thanks for the advice, it will help me next time!

Hugo_Cordeau · June 22, 2020, 7:50pm

Exactly!

Furthermore, I had a problem with the split function: it automatically orders the data as a factor!

We just have to tidy our data with as.factor to ensure that the 3:00 DF is 3 & not 22:00.
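
A small sketch of that ordering issue, with made-up hour labels: split() converts the grouping variable to a factor, and for character values the default levels are sorted as text, so "22:00" comes before "3:00". Setting the levels explicitly with factor() is one way to keep chronological order (the data frame and column names here are hypothetical).

```r
df <- data.frame(
  hour  = c("1:00", "2:00", "3:00", "22:00"),
  value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Default: group labels are sorted as text, so "22:00" lands before "3:00"
default_order <- names(split(df, df$hour))
default_order

# Explicit levels, in the order they first appear (chronological in this toy data)
df$hour <- factor(df$hour, levels = unique(df$hour))
fixed_order <- names(split(df, df$hour))
fixed_order
```

With explicit levels, the third element of the split list is the one for 3:00.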

Thanks a lot !
