Repeat Function on Multiple Variables

I think my question can be simplified: I think this is a loop question. Can anyone direct me to a good summary of this?

Thanks

Hi, I have a dataset with three columns: stations, Year, and avg.

There are 19 stations, and five to six years per Year, and a value for each year.

I am trying to run the Mann Kendall trend test on each of the 19 stations, using the rkt function.

I starting writing the code, copying, then changing the name of the station 19 times. I am sure there is a more efficient way to do this with loop, or lappy? Could someone help me on this? I am still learning this R!

Thank you.

My code: ("Year" has to be numeric for this to run)

library(rkt)

###### 
tn_trend1 <-rkt(date=alachua.tn.wide$Year, 
         y=alachua.tn.wide$BLUCOPK)
#> Error in integer(n): object 'alachua.tn.wide' not found
tn_trend1
#> Error in eval(expr, envir, enclos): object 'tn_trend1' not found

Created on 2021-05-03 by the reprex package (v1.0.0)

My data:--just the first two stations, of five years

A tibble: 105 x 3

Groups: station [18]

station Year avg

1 ALACHCHAN 2014 0.925
2 ALACHCHAN 2015 1.97
3 ALACHCHAN 2016 1.5
4 ALACHCHAN 2017 2.3
5 ALACHCHAN 2018 1.15
6 BLUCOPK 2014 1.5
7 BLUCOPK 2015 1.5
8 BLUCOPK 2016 1.7
9 BLUCOPK 2017 1.95
10 BLUCOPK 2018 1.33

Hi,

Here is one solution

library(tidyverse)
library(rkt)

#The data
myData = data.frame(
           station = c("ALACHCHAN","ALACHCHAN",
                       "ALACHCHAN","ALACHCHAN","ALACHCHAN","BLUCOPK","BLUCOPK",
                       "BLUCOPK","BLUCOPK","BLUCOPK"),
              year = c(2014L,2015L,2016L,2017L,
                       2018L,2014L,2015L,2016L,2017L,2018L),
             value = c(0.925, 1.97, 1.5, 2.3, 1.15, 1.5, 1.5, 1.7, 1.95, 1.33)
) 

#Run for ONE station
testData = myData %>% filter(station == "ALACHCHAN")
rkt(testData$year, testData$value)
#> 
#> Standard model
#> Tau = 0.2
#> Score =  2
#> var(Score) =  16.66667
#> 2-sided p-value =  0.8064959
#> Theil-Sen's (MK) or seasonal/regional Kendall (SKT/RKT) slope=  0.110625

#Run for ALL stations
result = map_df(unique(myData$station), function(myStation){
  
  #Filter data
  testData = myData %>% filter(station == myStation)
  #Run test
  testData = rkt(testData$year, testData$value)
  #Transform result into data frame
  unlist(testData) %>% t() %>%  as.data.frame() %>% 
    add_column(station = myStation, .before = T)
  
})

result
#>     station        sl S        B     varS sl.corrected varS.corrected partial.S
#> 1 ALACHCHAN 0.8064959 2 0.110625 16.66667           NA             NA        NA
#> 2   BLUCOPK 1.0000000 1 0.050000 15.66667           NA             NA        NA
#>   partial.sl partial.varS partial.sl.corrected partial.varS.corrected tau
#> 1         NA           NA                   NA                     NA 0.2
#> 2         NA           NA                   NA                     NA 0.1

Created on 2021-05-13 by the reprex package (v2.0.0)

Hope this helps,
PJ

@pieterjanvc Thank you so much! I have many of these analyses to perform, so you saved me huge amount of time!
Best,
Craig

1 Like

@pieterjanvc I having a bit of an issue. Perhaps it is my dataframe. I used your code, but changed:
"myData" to "alachua.tn2"
"year" to "Year"
"value" to "avg"

Why do you have "L" after your year data? I changed my Year to numeric, since rkt requires this.

I am getting the same value for each station--some of my results below:

station sl S B varS sl.corrected varS.corrected partial.S partial.sl partial.varS partial.sl.corrected partial.varS.corrected tau
ALACHCHAN 0.806495941 2 0.111458333 16.66666667 NA NA NA NA NA NA NA 0.2
BLUCOPK 0.806495941 2 0.111458333 16.66666667 NA NA NA NA NA NA NA 0.2
BLUOTTER 0.806495941 2 0.111458333 16.66666667 NA NA NA NA NA NA NA 0.2
BOULSP 0.806495941 2 0.111458333 16.66666667 NA NA NA NA NA NA NA 0.2
CEL441 0.806495941 2 0.111458333 16.66666667 NA NA NA NA NA NA NA 0.2
GLENSP 0.806495941 2 0.111458333 16.66666667 NA NA NA NA NA NA NA 0.2
HOG30US 0.806495941 2 0.111458333 16.66666667 NA NA NA NA NA NA NA 0.2

And my code below:

#Run for ALL stations
result = map_df(unique(alachua.tn2$station), function(myStation){
  
  #Filter data
  testData2 = alachua.tn2 %>% filter(station == myStation)
  #Run test
  testData2 = rkt(testData$Year, testData$avg)
  #Transform result into data frame
  unlist(testData2) %>% t() %>%  as.data.frame() %>% 
    add_column(station = myStation, .before = T)
  
})
#> Error in map_df(unique(alachua.tn2$station), function(myStation) {: could not find function "map_df"

Created on 2021-05-13 by the reprex package (v1.0.0)

Hi,

This is because you have not loaded Tidyverse. It contains the function map_df, plus the other dpyr functions I used. Just install it like this

install.packages("tidyverse") #run only once
library(tidyverse)

It's quite large package, but it contains a lot of amazing other packages that are extremely helpful. You can check it out here: https://tidyverse.tidyverse.org/

PJ

@pieterjanvc I actually use "tidyverse" quite alot and I had it loaded. I reloaded it, but got the same result of the same results for all stations. I am at a loss as to what happening. Could it be the data?

The L signifies typeof integer.

means that {purrr} is not in namespace.

library(purrr)

@technocrat Thanks for the "L" clarification. I followed up on this.

Also, I loaded purrr programing tools but got same result (same answer for all stations).

I must be doing something wrong here?

Same error message or is a data frame returned but only with the single station. If it's the latter

is the reason.

Hi again,

Do you have the issue when running my reprex example? If not, then the error lies in your conversion from my code to yours, if you do, there's something with your libraries that is messing up the code...

PJ

@pieterjanvc Your code worked great--output was different for each station. Any idea how I trouble shoot this (sorry, I am a relatively new R user, and struggling with this....) Thank you again for your help.

What do you mean by that? The whole point is it should be different right? Each station has different data so the result will be different too ...

PJ

@pieterjanvc Yes. I was not clear.

However, when I ran it again, using PURR, I am coming up with this error:

Error in filter(., station == myStation) : object 'station' not found >

result2
station sl S B varS sl.corrected varS.corrected partial.S partial.sl partial.varS partial.sl.corrected
1 ALACHCHAN NA NA NA NA NA NA NA NA NA NA
2 BLUCOPK NA NA NA NA NA NA NA NA NA NA
3 BLUOTTER NA NA NA NA NA NA NA NA NA NA
4 BOULSP NA NA NA NA NA NA NA NA NA NA
5 CEL441 NA NA NA NA NA NA NA NA NA NA
6 GLENSP NA NA NA NA NA NA NA NA NA NA
7 HOG30US NA NA NA NA

Hi,

You're really confusing me now :slight_smile:
Could you please create a real reprex following the guide so that at least I can work with the actual dataset. You are now just showing me random pieces of data and code which are not helping.

PJ

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.