Loop over a list & t.test (p.value)

Hi, I have data for two time series and I want to know whether they are statistically different. I have about 120 possible combinations, plus other time series to analyze, so I need a loop for it.

I would like to:

  1. Name variables (this works, but it is bad coding)
  2. Have the p.value for each observation. (I only put M (Month), but I would normally have January, May, etc.)

1 (ok)

Names <- c("Wind","Expected","Real","Loss","CF_B","CF_N","CF_L")
Time <- c("Y","S", "M", "H","HS", "HM")

rm(Name)
Name <- "H"
for(T in Time) {
for (N in Names) {
Name[N[T]] <- paste(T,"_",N, sep="")
}
}
Name <- as.data.frame(Name)
Name <- subset(Name, Name != "H")
Name$P_Value <- Name$Name

#This seems bad coding, but I don't get why I can't just spawn the vector below without setting it.
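
A possible reason the placeholder is needed: an assignment like Name[...] <- value modifies an existing object, so Name has to exist before the loop starts. A minimal sketch that avoids both the placeholder and the loop is shown below. It assumes the labels should look like "Y_Wind", and it initialises P_Value with NA rather than a copy of the name, which is a change from the code above.

```r
Names <- c("Wind", "Expected", "Real", "Loss", "CF_B", "CF_N", "CF_L")
Time  <- c("Y", "S", "M", "H", "HS", "HM")

# One row per (Time, Names) pair; Names varies fastest, like the inner loop.
Comb <- expand.grid(Names = Names, Time = Time, stringsAsFactors = FALSE)

# paste0() is vectorised, so every label is built in a single call.
Name <- data.frame(
  Name    = paste0(Comb$Time, "_", Comb$Names),
  P_Value = NA_real_,  # assumption: filled in later, one p-value per label
  stringsAsFactors = FALSE
)

head(Name$Name, 3)
```

Because nothing is assigned element by element, there is no "H" row to remove afterwards.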

2. P.value

This works
Y_Wind <- t.test(Y_Hist$Wind,Y_Hor$Wind,na.rm=TRUE)$p.value

Y_Wind
0.44

But this ain't
for(T in Time) {
for (N in Names) {
x <- noquote(paste(T,"_Hist$",N, sep=""))
y <- noquote(paste(T,"_Hor$",N, sep=""))
Name$P_Value[Name$Name == paste(T,"_",N, sep="")] <- t.test(x,y,na.rm=TRUE)$p.value
}
}

This is the error that I get:
Error in t.test.default(y, x, na.rm = TRUE) :
not enough 'x' observations
In addition: Warning messages:
1: In mean.default(x) : argument is not numeric or logical: returning NA
2: In var(x) : NAs introduced by coercion

When I print x and y, I get:

print(x) gives Y_Hist$Wind
print(y) gives Y_Hor$Wind

So in my mind it's the same thing.
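
A quick check suggests why the two are not the same: the printed text looks like a column reference, but the object is still a character string, which matches the "argument is not numeric or logical" warning. The sketch below uses a made-up Y_Hist data frame as a stand-in for the real data.

```r
# Toy stand-in for one of the data frames (values are made up).
Y_Hist <- data.frame(Wind = c(3.1, 4.2, 5.0, 4.4))

x <- noquote(paste("Y", "_Hist$", "Wind", sep = ""))

is.character(x)  # TRUE: x is only the text "Y_Hist$Wind"
is.numeric(x)    # FALSE: t.test() receives no numbers

# Looking up the data frame by name, then the column by name, returns the data.
wind <- get("Y_Hist")[["Wind"]]
is.numeric(wind) # TRUE
```

noquote() only changes how the string is printed; it does not evaluate it.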

I think my problem is the paste. I also know that lapply could be a better fit, but I didn't get better results with it. Is there a better way to paste? Or a more efficient way to loop (lapply)?

Thank you !!

P.S. I am kind of new.

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:


I agree with @mfherman that a reprex would have been nice.
But guessing what you might mean, maybe the following will help.

I think you have two sets of data: a '_Hist' set and a '_Hor' set.
Each set contains one data.frame per time period (indicated by Time),
and each data.frame contains columns (indicated by Names).
For each time period and each name, you want the p-value of the t-test comparing that column across the two sets.

The code below should work for all time periods and names, but I only define two columns, and only for the Y time period.
In Comb I generate all combinations of your periods and names and use them to build the names of the Hist and Hor columns.
Then I loop over those names (only the first two, because I only defined two columns).
The only tricky part is converting the name of a column into the column data, which is done with
eval(parse(text = ...))
I hope this gives you some ideas.

Y_Hist = data.frame(
  Wind = 1:5,
  Expected = c(1,2,3,2,1)
)

Y_Hor = data.frame(
  Wind = c(1,2,3,2,1),
  Expected = c(5,4,3,2,1)
)

Names <- c("Wind","Expected","Real","Loss","CF_B","CF_N","CF_L")
Time <- c("Y","S", "M", "H","HS", "HM")

Comb <- expand.grid(Names=Names,Time=Time,stringsAsFactors = F)
HistNames <- paste(Comb$Time,"_Hist$",Comb$Names,sep="")
HorNames <- paste(Comb$Time,"_Hor$",Comb$Names,sep="")
pvalues <- rep(NA,length(HistNames))

# for (i in 1:length(HistNames)) {
for (i in 1:2) {
  histdata = eval(parse(text=HistNames[i]))
  hordata = eval(parse(text=HorNames[i]))
  pvalues[i] = t.test(histdata,hordata,na.rm=TRUE)$p.value
}

Created on 2020-06-19 by the reprex package (v0.3.0)
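
As an alternative to eval(parse(text = ...)), the lookup can be done with get() and [[ ]], and the loop replaced by mapply(), since the question asked about the apply family. This is only a sketch under assumptions: the data frames are assumed to be named like Y_Hist and Y_Hor, as in the question, and p_for is a hypothetical helper name.

```r
# Toy data mirroring the example above (only the Y period, two columns).
Y_Hist <- data.frame(Wind = 1:5, Expected = c(1, 2, 3, 2, 1))
Y_Hor  <- data.frame(Wind = c(1, 2, 3, 2, 1), Expected = c(5, 4, 3, 2, 1))

Names <- c("Wind", "Expected")
Time  <- "Y"

Comb <- expand.grid(Names = Names, Time = Time, stringsAsFactors = FALSE)

# p_for() is a hypothetical helper: look up both data frames by name,
# pull the same column from each, and return the t-test p-value.
p_for <- function(time, name) {
  hist_df <- get(paste0(time, "_Hist"))
  hor_df  <- get(paste0(time, "_Hor"))
  t.test(hist_df[[name]], hor_df[[name]])$p.value
}

# mapply() calls p_for() once per row of Comb.
Comb$P_Value <- mapply(p_for, Comb$Time, Comb$Names, USE.NAMES = FALSE)
Comb
```

Keeping the data frames in a named list and indexing it with [[ ]] would avoid get() as well, at the cost of restructuring how the data is stored.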


Thanks for the advice, it will help me next time!

Exactly !

Furthermore, I had a problem with the split function: it automatically orders the data as a factor!

We just have to tidy our data with as.factor beforehand, to ensure that the data frame for 3:00 is number 3 and not the one for 22:00.
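
The ordering issue with split() can be illustrated with a small sketch. The data frame, the Hour column, and the "H:00" label format are all assumptions made up for the example; the idea is to hand split() a factor whose levels are already in clock order.

```r
# Made-up hourly data; Hour is stored as text, as it might be after import.
df <- data.frame(
  Hour  = c("0:00", "3:00", "10:00", "22:00", "3:00", "22:00"),
  Value = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Default: split() converts Hour to a factor whose levels are sorted as text.
names(split(df, df$Hour))

# Explicit levels in clock order keep the pieces in the intended order;
# drop = TRUE removes hours that do not occur in the data.
hour_levels <- paste0(0:23, ":00")
by_hour <- split(df, factor(df$Hour, levels = hour_levels), drop = TRUE)
names(by_hour)
```

Indexing the result by name, for example by_hour[["3:00"]], is less fragile than relying on the position of each piece.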

Thanks a lot !
