Repeat Function on Multiple Variables

Craigdux · May 3, 2021, 7:41pm

I think my question can be simplified: I think this is a loop question. Can anyone direct me to a good summary of this?

Thanks

Hi, I have a dataset with three columns: stations, Year, and avg.

There are 19 stations, and five to six years per Year, and a value for each year.

I am trying to run the Mann Kendall trend test on each of the 19 stations, using the rkt function.

I starting writing the code, copying, then changing the name of the station 19 times. I am sure there is a more efficient way to do this with loop, or lappy? Could someone help me on this? I am still learning this R!

Thank you.

My code: ("Year" has to be numeric for this to run)

library(rkt)

###### 
tn_trend1 <-rkt(date=alachua.tn.wide$Year, 
         y=alachua.tn.wide$BLUCOPK)
#> Error in integer(n): object 'alachua.tn.wide' not found
tn_trend1
#> Error in eval(expr, envir, enclos): object 'tn_trend1' not found

^{Created on 2021-05-03 by the reprex package (v1.0.0)}

My data:--just the first two stations, of five years

A tibble: 105 x 3

Groups: station [18]

station Year avg

1 ALACHCHAN 2014 0.925
2 ALACHCHAN 2015 1.97
3 ALACHCHAN 2016 1.5
4 ALACHCHAN 2017 2.3
5 ALACHCHAN 2018 1.15
6 BLUCOPK 2014 1.5
7 BLUCOPK 2015 1.5
8 BLUCOPK 2016 1.7
9 BLUCOPK 2017 1.95
10 BLUCOPK 2018 1.33

pieterjanvc · May 13, 2021, 12:03pm

Hi,

Here is one solution

library(tidyverse)
library(rkt)

#The data
myData = data.frame(
           station = c("ALACHCHAN","ALACHCHAN",
                       "ALACHCHAN","ALACHCHAN","ALACHCHAN","BLUCOPK","BLUCOPK",
                       "BLUCOPK","BLUCOPK","BLUCOPK"),
              year = c(2014L,2015L,2016L,2017L,
                       2018L,2014L,2015L,2016L,2017L,2018L),
             value = c(0.925, 1.97, 1.5, 2.3, 1.15, 1.5, 1.5, 1.7, 1.95, 1.33)
) 

#Run for ONE station
testData = myData %>% filter(station == "ALACHCHAN")
rkt(testData$year, testData$value)
#> 
#> Standard model
#> Tau = 0.2
#> Score =  2
#> var(Score) =  16.66667
#> 2-sided p-value =  0.8064959
#> Theil-Sen's (MK) or seasonal/regional Kendall (SKT/RKT) slope=  0.110625

#Run for ALL stations
result = map_df(unique(myData$station), function(myStation){
  
  #Filter data
  testData = myData %>% filter(station == myStation)
  #Run test
  testData = rkt(testData$year, testData$value)
  #Transform result into data frame
  unlist(testData) %>% t() %>%  as.data.frame() %>% 
    add_column(station = myStation, .before = T)
  
})

result
#>     station        sl S        B     varS sl.corrected varS.corrected partial.S
#> 1 ALACHCHAN 0.8064959 2 0.110625 16.66667           NA             NA        NA
#> 2   BLUCOPK 1.0000000 1 0.050000 15.66667           NA             NA        NA
#>   partial.sl partial.varS partial.sl.corrected partial.varS.corrected tau
#> 1         NA           NA                   NA                     NA 0.2
#> 2         NA           NA                   NA                     NA 0.1

^{Created on 2021-05-13 by the reprex package (v2.0.0)}

Hope this helps,
PJ

Craigdux · May 13, 2021, 2:23pm

@pieterjanvc Thank you so much! I have many of these analyses to perform, so you saved me huge amount of time!
Best,
Craig

Craigdux · May 13, 2021, 3:54pm

@pieterjanvc I having a bit of an issue. Perhaps it is my dataframe. I used your code, but changed:
"myData" to "alachua.tn2"
"year" to "Year"
"value" to "avg"

Why do you have "L" after your year data? I changed my Year to numeric, since rkt requires this.

I am getting the same value for each station--some of my results below:

station	sl	S	B	varS	sl.corrected	varS.corrected	partial.S	partial.sl	partial.varS	partial.sl.corrected	partial.varS.corrected	tau
ALACHCHAN	0.806495941	2	0.111458333	16.66666667	NA	NA	NA	NA	NA	NA	NA	0.2
BLUCOPK	0.806495941	2	0.111458333	16.66666667	NA	NA	NA	NA	NA	NA	NA	0.2
BLUOTTER	0.806495941	2	0.111458333	16.66666667	NA	NA	NA	NA	NA	NA	NA	0.2
BOULSP	0.806495941	2	0.111458333	16.66666667	NA	NA	NA	NA	NA	NA	NA	0.2
CEL441	0.806495941	2	0.111458333	16.66666667	NA	NA	NA	NA	NA	NA	NA	0.2
GLENSP	0.806495941	2	0.111458333	16.66666667	NA	NA	NA	NA	NA	NA	NA	0.2
HOG30US	0.806495941	2	0.111458333	16.66666667	NA	NA	NA	NA	NA	NA	NA	0.2

And my code below:

#Run for ALL stations
result = map_df(unique(alachua.tn2$station), function(myStation){
  
  #Filter data
  testData2 = alachua.tn2 %>% filter(station == myStation)
  #Run test
  testData2 = rkt(testData$Year, testData$avg)
  #Transform result into data frame
  unlist(testData2) %>% t() %>%  as.data.frame() %>% 
    add_column(station = myStation, .before = T)
  
})
#> Error in map_df(unique(alachua.tn2$station), function(myStation) {: could not find function "map_df"

^{Created on 2021-05-13 by the reprex package (v1.0.0)}

pieterjanvc · May 13, 2021, 3:58pm

Hi,

This is because you have not loaded Tidyverse. It contains the function map_df, plus the other dpyr functions I used. Just install it like this

install.packages("tidyverse") #run only once
library(tidyverse)

It's quite large package, but it contains a lot of amazing other packages that are extremely helpful. You can check it out here: https://tidyverse.tidyverse.org/

PJ

Craigdux · May 13, 2021, 5:20pm

@pieterjanvc I actually use "tidyverse" quite alot and I had it loaded. I reloaded it, but got the same result of the same results for all stations. I am at a loss as to what happening. Could it be the data?

technocrat · May 13, 2021, 6:02pm

The L signifies typeof integer.

means that {purrr} is not in namespace.

library(purrr)

Craigdux · May 13, 2021, 6:19pm

@technocrat Thanks for the "L" clarification. I followed up on this.

Also, I loaded purrr programing tools but got same result (same answer for all stations).

I must be doing something wrong here?

technocrat · May 13, 2021, 6:23pm

Same error message or is a data frame returned but only with the single station. If it's the latter

is the reason.

pieterjanvc · May 13, 2021, 7:14pm

Hi again,

Do you have the issue when running my reprex example? If not, then the error lies in your conversion from my code to yours, if you do, there's something with your libraries that is messing up the code...

PJ

Craigdux · May 13, 2021, 8:55pm

@pieterjanvc Your code worked great--output was different for each station. Any idea how I trouble shoot this (sorry, I am a relatively new R user, and struggling with this....) Thank you again for your help.

pieterjanvc · May 14, 2021, 12:00pm

What do you mean by that? The whole point is it should be different right? Each station has different data so the result will be different too ...

PJ

Craigdux · May 14, 2021, 4:54pm

@pieterjanvc Yes. I was not clear.

However, when I ran it again, using PURR, I am coming up with this error:

Error in filter(., station == myStation) : object 'station' not found >

result2
station sl S B varS sl.corrected varS.corrected partial.S partial.sl partial.varS partial.sl.corrected
1 ALACHCHAN NA NA NA NA NA NA NA NA NA NA
2 BLUCOPK NA NA NA NA NA NA NA NA NA NA
3 BLUOTTER NA NA NA NA NA NA NA NA NA NA
4 BOULSP NA NA NA NA NA NA NA NA NA NA
5 CEL441 NA NA NA NA NA NA NA NA NA NA
6 GLENSP NA NA NA NA NA NA NA NA NA NA
7 HOG30US NA NA NA NA

pieterjanvc · May 14, 2021, 6:40pm

Hi,

You're really confusing me now
Could you please create a real reprex following the guide so that at least I can work with the actual dataset. You are now just showing me random pieces of data and code which are not helping.

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. Let's quickly go over each one of these with examples: Minimal Dataset (Sample Data) You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

PJ

system · June 4, 2021, 6:41pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.