Creating Loops for multiple indices

CKres · March 6, 2022, 8:19pm

I have code that calculates several spatial indices for my data set (see below). I want to calculate these indices for each year in the data set (1968-2017, see screenshot). I have heard that I can use loops for this, but I am unfamiliar with them and haven't had any luck trying them on my own. Any help is greatly appreciated.
I am particularly interested in calculating the indices from "expcatchnum" for each "Year" (shown in the code).

# Spatial Indices
library(maps)
library(RGeostats)
library(ks)

# Analyzing the spatial distribution of fluke

# calculate spatial indices for each year for expcatchnum
library(readr)
fluke <- read_csv("~/Desktop/fluke.csv")

years <- sort(unique(fluke$Year))
nyr <- length(years)
fyr <- min(years)
indices.tab <- matrix(nrow=nyr,ncol=7)
for (i in 1:nyr) {
  
  #1. Look up the calendar year (this also works if some years are missing from the data)
  
  year <- years[i]
  indices.tab[i,1] <- year
  fluke.sub <- fluke[fluke$Year==year & fluke$Lon<10,]
  
  #2. calculate lloyd's index of patchiness (including the zeros)
  
  nbar <- mean(fluke.sub$expcatchnum)
  ssq <- var(fluke.sub$expcatchnum)
  IoP <- 1 + ssq/(nbar^2) - 1/nbar
  indices.tab[i,2] <- IoP
  
  #3. Lorenz curve
  
  n.samp <- length(fluke.sub$expcatchnum)
  n.ord <- order(fluke.sub$expcatchnum)
  n.sort <- fluke.sub$expcatchnum[n.ord]
  n.cum <- cumsum(n.sort)
  
  
  # plot(1:n.samp,n.cum,xlab="Samples",ylab="Cumulative sum")
  # title(main=year)
  # lines(c(0,length(n.cum)),c(0,N.tot))  
  #polygon(c(1:n.samp,0),c(n.cum,0),col="gray")
  
  #4. Calculate the Gini Index
  # 4 in hw 
  i.vec <- c(1:(n.samp-1))
  N.tot <- sum(fluke.sub$expcatchnum)
  gini <- sum(i.vec*(n.samp-i.vec)*(n.sort[-1]-n.sort[1:(n.samp-1)]))/((n.samp-1)*N.tot)
  indices.tab[i,3] <- gini
  
  
  #5. Fit the ellipse to determine centroids 
  # 6 in hw 
  
  aaa <- db.create(fluke.sub[,c('Lon','Lat','expcatchnum')],flag.grid=FALSE,ndim=2,autoname=FALSE)
  
  res.cgi1 <- SI.cgi(aaa,flag.ellipse=TRUE,flag.inertia=TRUE,flag.plot=FALSE)
  indices.tab[i,4] <- res.cgi1$center[1]
  indices.tab[i,5] <- res.cgi1$center[2]
  indices.tab[i,6] <- res.cgi1$inertia
  indices.tab[i,7] <- res.cgi1$weight
  
}

indices.df <- data.frame(indices.tab)
names(indices.df) <- c("Year","Lloyds","Gini","cog.lon","cog.lat","Inertia","Weight")
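The same kind of per-year table can also be built by splitting the data on the year values themselves, which avoids any index arithmetic. A minimal base R sketch, assuming a made-up data frame `toy` with the columns `Year` and `expcatchnum` (the values are invented for illustration, `lloyd` is a hypothetical helper name, and only Lloyd's index is computed):

```r
# Hypothetical toy data: note that 1969 is deliberately absent
toy <- data.frame(
  Year = c(1968, 1968, 1968, 1970, 1970, 1970),
  expcatchnum = c(1, 2, 3, 2, 2, 8)
)

# Lloyd's index of patchiness for one vector of counts
lloyd <- function(x) 1 + var(x) / mean(x)^2 - 1 / mean(x)

# split() gives one vector of counts per year actually present in the data
by.year <- split(toy$expcatchnum, toy$Year)

# vapply() applies lloyd() to each year's counts and returns a numeric vector
toy.indices <- data.frame(
  Year = as.numeric(names(by.year)),
  Lloyds = unname(vapply(by.year, lloyd, numeric(1)))
)
toy.indices
```

Because the rows come from the years that occur in the data, a gap such as the missing 1969 simply produces no row rather than an empty subset.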

pieterjanvc · March 6, 2022, 9:48pm

Hi there,

I don't think you need a loop for this if you use some of the functions from the dplyr package.

Here is an example

library(dplyr)

set.seed(1) #Only needed for reproducibility

#Generate some data
myData = data.frame(
  year = rep(1968:1970, each = 3),
  val = runif(9) * 10
)
myData
#>   year      val
#> 1 1968 2.655087
#> 2 1968 3.721239
#> 3 1968 5.728534
#> 4 1969 9.082078
#> 5 1969 2.016819
#> 6 1969 8.983897
#> 7 1970 9.446753
#> 8 1970 6.607978
#> 9 1970 6.291140

#Function to calculate gini
gini = function(x){
  sum(abs(sapply(x, function(y) y - x))) / (2 * length(x)^2 * mean(x))
}

#Group the data by year and calculate stats
myData %>% 
  group_by(year) %>% #group per year
  summarise(
    gini = gini(val), #using function
    lloyds = 1 + var(val)/mean(val)^2 - 1/mean(val) #one line formula
  )
#> # A tibble: 3 x 3
#>    year   gini lloyds
#>   <int>  <dbl>  <dbl>
#> 1  1968 0.169   0.902
#> 2  1969 0.235   1.22 
#> 3  1970 0.0941  0.920

Created on 2022-03-06 by the reprex package (v2.0.1)
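Location statistics can go into the same kind of grouped summary. A rough sketch, assuming the centre of gravity is taken as the catch-weighted mean position; this is a hand-rolled stand-in and may not match what `SI.cgi` from RGeostats returns. The data frame `toy` and its values are made up for illustration:

```r
library(dplyr)

# Hypothetical toy data with positions and catches
toy <- data.frame(
  Year = c(1968, 1968, 1970, 1970),
  Lon = c(-74, -72, -75, -71),
  Lat = c(38, 40, 37, 41),
  expcatchnum = c(1, 3, 2, 2)
)

# One row per year, with the catch-weighted mean position (an assumption,
# not necessarily the RGeostats definition)
toy.cog <- toy %>%
  group_by(Year) %>%
  summarise(
    cog.lon = weighted.mean(Lon, expcatchnum), # catch-weighted mean longitude
    cog.lat = weighted.mean(Lat, expcatchnum)  # catch-weighted mean latitude
  )
toy.cog
```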

For more info on the dplyr functions you can look at the dplyr Tidyverse documentation online.
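One thing to keep in mind when comparing results: the `gini()` function in this example and the Gini formula in the original loop are normalised differently. The loop divides by `(n.samp-1)*N.tot`, whereas `gini()` effectively divides by `n*N.tot`, so the loop's value is larger by a factor of n/(n-1). A small sketch to check this on an arbitrary vector (`gini_loop` is a hypothetical wrapper around the loop's formula, not a function from any package):

```r
# Pairwise-difference Gini, as in the dplyr example
gini <- function(x) {
  sum(abs(sapply(x, function(y) y - x))) / (2 * length(x)^2 * mean(x))
}

# Hypothetical wrapper around the formula used inside the original loop
gini_loop <- function(x) {
  n <- length(x)
  x.sort <- sort(x)
  i.vec <- 1:(n - 1)
  # diff(x.sort) is the same as x.sort[-1] - x.sort[1:(n - 1)]
  sum(i.vec * (n - i.vec) * diff(x.sort)) / ((n - 1) * sum(x))
}

x <- c(1, 2, 3)
gini(x)       # 2/9
gini_loop(x)  # 1/3

# The two differ only by the factor n / (n - 1)
all.equal(gini_loop(x), gini(x) * length(x) / (length(x) - 1))
```

With only a handful of samples per year the difference is noticeable, so it is worth picking one definition and using it throughout.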

Hope this helps,
PJ
