Get a for loop to actually work, just once!

Cherson · May 7, 2023, 9:13pm

Hi!

So here is the deal. I have never been able to get a for loop to run. I have tried a couple of times but given up, mostly because it hasnt been necessary. But this time I really need to understand how to make it work, as otherwise my script will become too long. I have gone through loads of pages that shows examples, but no example that solves my case.

A classical example would be:

for (x in 1:5) {
    print(x)
}

And this is exactly what I am trying to achieve, but instead for print, I want the following:

for (i in 1:15) {
    grangertest(COL ~ ALL[, i],  order = 3,  data = ALL)
}

That is, I have a time series object called ALL, which has 15 time series representing 15 columns, and I want to perform a grangertest on all series. Simple as that. But I dont understand why my loop doesn't work. I get the following error message:

"Error in waldtest.lm(fm, 2, ...) :
there are aliased coefficients in the model"

Isn't this a multicollinearity thing? Why doesn't it just loop over each column one at a time?

If I do it for one column at a time, it works properly:

    grangertest(COL ~ ALL[, i],  order = 3,  data = ALL)

Any help appreciated!

technocrat · May 7, 2023, 11:21pm

There are two issues here. I'll come back after looking into the there are aliased coefficients in the model after seeing if my guess is right.*

The more common, and perplexing, mystery is Escape from Alcatraz.

After all, it works in Python. It works in C. It works all over the place. The print() seems to work, so what's the deal with loops that seem to do nothing or only spit out the last iteration.

The answer lies in R environments and their private namespaces. There's a two-way communication process that has to happen to get the namespaces on the same page.

Let's say that we have a function f(x) that is supposed to return y as a value. Just like school algebra—f(x)=y. Maybe it's f(x) x^2 returning 4 so that we'd do

y <- f(2)

If we'd just said f() all we'd get is an error

``` r
f <- function(x) x^2
f()
#> Error in f(): argument "x" is missing, with no default

^{Created on 2023-05-07 with reprex v2.0.2}

The parameter for f() is missing—it has nothing from the .Global (outer) environment.

Usually, we don't do that, at least on purpose. What happens, instead is often

f(2)
# 4

which is all very well and good, but evanescent. Where do we find it? It's not in the .Global environment because it hasn't been saved. Leading to

y <- f(2)

Now we can find the square of 2 by invoking y. Easy.

Next, let's tinker by giving the return a name within the function's environment, .Local because that's what we often do in juggling intermediate results.

f <- function(x) y = x^2

but we'd get the same thing with function(x) z = y^2

Whatever we assign to capture the return is going to have that name, not the internal name.

Now, put f in a for() loop. And throw in a print()just to make sure it's doing what is should.

y = for(i in seq_along(1:3)) print(f(i))
#[1] 1
#[1] 4
#[1] 9
y
# NULL

Well, the numbers are right, but what the NULL? Blame .Local. print()has its own, and the only thing that escapes, and intof() .Local` and the screen.

There are more ways to get it wrong, but let's get it right.

f <- function(x) y = x^2
some_object <- c(1,2,3)
receiver<- vector(length = length(some_object))
for (i in seq_along(some_object)) {
  receiver[i] <- f(some_object[i])
}
receiver
#> [1] 1 4 9

The secret sauce is to create an object in .Global and capture the return of f() on each trip through the loop.

technocrat · May 8, 2023, 12:19am

From help(grangertest) in {lmtest}

library(lmtest)
#> Loading required package: zoo
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
## Which came first: the chicken or the egg?
data(ChickEgg)
ChickEgg
#> Time Series:
#> Start = 1930 
#> End = 1983 
#> Frequency = 1 
#>      chicken  egg
#> 1930  468491 3581
#> 1931  449743 3532
#> 1932  436815 3327
#> 1933  444523 3255
#> 1934  433937 3156
#> 1935  389958 3081
#> 1936  403446 3166
#> 1937  423921 3443
#> 1938  389624 3424
#> 1939  418591 3561
#> 1940  438288 3640
#> 1941  422841 3840
#> 1942  476935 4456
#> 1943  542047 5000
#> 1944  582197 5366
#> 1945  516497 5154
#> 1946  523227 5130
#> 1947  467217 5077
#> 1948  499644 5032
#> 1949  430876 5148
#> 1950  456549 5404
#> 1951  430988 5322
#> 1952  426555 5323
#> 1953  398156 5307
#> 1954  396776 5402
#> 1955  390708 5407
#> 1956  383690 5500
#> 1957  391363 5442
#> 1958  374281 5442
#> 1959  387002 5542
#> 1960  369484 5339
#> 1961  366082 5358
#> 1962  377392 5403
#> 1963  375575 5345
#> 1964  382262 5435
#> 1965  394118 5474
#> 1966  393019 5540
#> 1967  428746 5836
#> 1968  425158 5777
#> 1969  422096 5629
#> 1970  433280 5704
#> 1971  421763 5806
#> 1972  404191 5742
#> 1973  408769 5502
#> 1974  394101 5461
#> 1975  379754 5382
#> 1976  378361 5377
#> 1977  386518 5408
#> 1978  396933 5608
#> 1979  400585 5777
#> 1980  392110 5825
#> 1981  384838 5625
#> 1982  378609 5800
#> 1983  364584 5656
r <- grangertest(egg ~ chicken, order = 3, data = ChickEgg)
r
#> Granger causality test
#> 
#> Model 1: egg ~ Lags(egg, 1:3) + Lags(chicken, 1:3)
#> Model 2: egg ~ Lags(egg, 1:3)
#>   Res.Df Df      F Pr(>F)
#> 1     44                 
#> 2     47 -3 0.5916 0.6238

This is how grangertest is supposed to work—it takes a time series and columns within it as an argument and returns a value of class() anova data frame.

You can either use a bivariant series like ChickEggs as its principal argument or two univariant observations. In the example above, let's assume we are doing that in a for loop from 1:15. Looking at egg[,i] what happens. We just need to look at egg[,i]

First, i = 1 , and second, i = 2 work, but i =3 is straight out.

library(lmtest)
#> Loading required package: zoo
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
data(ChickEgg)
ChickEgg[,1]
#> Time Series:
#> Start = 1930 
#> End = 1983 
#> Frequency = 1 
#>  [1] 468491 449743 436815 444523 433937 389958 403446 423921 389624 418591
#> [11] 438288 422841 476935 542047 582197 516497 523227 467217 499644 430876
#> [21] 456549 430988 426555 398156 396776 390708 383690 391363 374281 387002
#> [31] 369484 366082 377392 375575 382262 394118 393019 428746 425158 422096
#> [41] 433280 421763 404191 408769 394101 379754 378361 386518 396933 400585
#> [51] 392110 384838 378609 364584
ChickEgg[,2]
#> Time Series:
#> Start = 1930 
#> End = 1983 
#> Frequency = 1 
#>  [1] 3581 3532 3327 3255 3156 3081 3166 3443 3424 3561 3640 3840 4456 5000 5366
#> [16] 5154 5130 5077 5032 5148 5404 5322 5323 5307 5402 5407 5500 5442 5442 5542
#> [31] 5339 5358 5403 5345 5435 5474 5540 5836 5777 5629 5704 5806 5742 5502 5461
#> [46] 5382 5377 5408 5608 5777 5825 5625 5800 5656
ChickEgg[,3]
#> Error in `[.default`(ChickEgg, , 3): subscript out of bounds

The OP example cannot have gone beyond $i = 2$, or that would have been the message, so the error message seen must have something to do with the relationship between COLandALL, which I can't comment on without a reprex. [See the FAQ: How to do a minimal reproducible example reprex` for beginners](FAQ: How to do a minimal reproducible example ( reprex ) for beginners)

Cherson · May 8, 2023, 11:19am

Hi technograt,

Thanks for taking of your time to answer my question!
And that is a nice expose you wrote. I understand about half of it. I was actually able to realise what I was doing wrong here, which is something I should have realised instantly but somehow didnt.

COL is one of the 15 columns in ALL, and that is of course why I got the "alised coefficients" error. It is actually the last one, so when I changed to 1:14 instead of 1:15 the issue disappeared.

However, I still dont get any output from my loop. I get nothing. Not even an error message. I figure it must have to do with what you write about a "receiver". So I tried to copy your work, but of course ended up with a some error. I tried:

Granger <- vector(length = 14) #For 14 tests.
...
Granger[i] <- grangertest(COL ~ ALL[, i], order = 3, data = ALL)

I also tried:

length = grangertest(COL ~ ALL[, 1], order = 3, data = ALL)

But here is the real issue. When you give something that doesn't exactly correspond to my problem, I will have to make some adjustment and that is where it breaks. I dont know what adjustment to make. I just dont understand whats going on in the background. A lot of thoughts arises:

Does this go wrong because the grangertest doesnt only output something with a "length" but also with a "width"? That is, do I have to specify the number of rows and columns for the receiver?
Does that have to match the total number of columns and rows the loop is spitting out? It shouldnt because it spits out one vector for each test.
How many brackets must I involve here? Ordinary brackets or squared ones too? Maybe a squared one inside of an ordinary one, just as when I specify the column for the test.
Where do I find information on how to solve the problem? Is there something in the structure of the test output, using str(), that can help me realise how to form the loop?

Obviously, I can read the manual for looping in R, but there will be so much jargon and other discrepancies in the examples that I still wont get it right. For example, I recently realised that in all help sections on any function, they often write out every alternative for some argument:

f(... type = c("EG", "BG", "AIC", "BIC") )

This led to a lot of frustration as I first thought every alternative should be stated every time. But ok, this is simply a way to show the available alternatives.

Back to my problem. How do I get forward?

Thanks again!

MarekGierlinski · May 9, 2023, 12:39pm

grangertest returns an object of class anova and data.frame. You can accumulate these results in a list:

granger <- list()
for(i in 1:14) {
  granger[[i]] <- grangertest(COL ~ ALL[, i], order = 3, data = ALL)
}

and access the results as

granger[[1]]
granger[[2]]
...

Alternatively you can combine the data frames:

library(tibble)
library(purrr)

map(1:14, function(i) {
  grangertest(COL ~ ALL[, i], order = 3, data = ALL) |>
  as_tibble() |>
  add_column(iter = i)
}) |>
  list_rbind()

This will create one tibble with all your results. I added a column iter to keep the value of index i.

Cherson · May 9, 2023, 1:18pm

Hi Marek!

Thank you very much for giving me a neat solution. I did find another one. However, your solution is better so I will use it instead.

kpss_results <- c()
adf_results <- c()

ts_names <- c("column1, "column2",...)


for (x in ts_names) {
  kpss_results <- c(kpss_results, 
                    ndiffs(eval(parse(text = x)), # Converts to machine
                           alpha = 0.05, # readable text, then evaluates
                           test = "kpss")) # x
  
  adf_results <- c(adf_results, 
                   ndiffs(eval(parse(text = x)), 
                          alpha = 0.05, 
                          test = "adf"))
  
}

system · May 16, 2023, 1:18pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.