count variables in multidimensional array

kimhjin33 · September 3, 2019, 11:24am

Hi. I have an array with dimensions (720,360,349) where each grid cell has a value. I like to think of it as having 349 'sheets' of 720 by 360 data.

I need to categorize each grid cell into one of 4 categories:
<-2 as apple
-1 to -2 as orange
-1 to 0 as pear
0 to 1 as peach

I would like to use an if else loop to get each grid cell value to be categorized as apple/orange/pear/peach but would the replace function be more suitable? I'm not sure.

After that, I need to count how many apples/oranges/pears/peaches are in each grid cell and have one array (720,360) for each fruit (4 arrays in total) telling me the count for each grid cell. Obviously, the total number of apples+oranges+pears+peaches should = 349 for each grid cell.

I've tried the count function but I think I'm screwing up the code:

cpeaches<-array(data=NA,dim=c(720,360))
cpeaches=count(extreme[,,1:349],vars="peaches")

Would anyone be kind enough to point me in the right direction?
Thank you!

raytong · September 3, 2019, 12:59pm

Hi kimhjin33. I suggest a script that uses three nested "for loops" for your problem.

xDim <- 720
yDim <- 360
zDim <- 349
categories <- c("apple", "orange", "pear", "peach")

x <- array(sample(-3:1, xDim*yDim*zDim, replace = TRUE), c(xDim, yDim, zDim))

x <- ifelse(x < -2, "apple", ifelse(x <= -1, "orange", ifelse(x <= 0, "pear", "peach")))

res <- array(dim = c(xDim, yDim, 4), dimnames = list(1:xDim, 1:yDim, categories))

for (i in seq_len(xDim)) {
  for (j in seq_len(yDim)) {
    for(k in categories) {
      res[i, j, k] <- sum(x[i, j, ] == k)
    }
  }
}

Hope the script can help.
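
As a quick sanity check on the loop approach above, here is a minimal sketch that runs the same logic on a much smaller array and confirms that the four counts add up to the number of sheets in every grid cell. The small dimensions and the seed are assumptions chosen only so that it runs instantly.

```r
# Hypothetical small dimensions so the loops finish instantly
xDim <- 4
yDim <- 3
zDim <- 10
categories <- c("apple", "orange", "pear", "peach")

set.seed(1)  # arbitrary seed, only for reproducibility
x <- array(sample(-3:1, xDim * yDim * zDim, replace = TRUE), c(xDim, yDim, zDim))
x <- ifelse(x < -2, "apple", ifelse(x <= -1, "orange", ifelse(x <= 0, "pear", "peach")))

# One count layer per category, addressed by name in the third dimension
res <- array(0L, dim = c(xDim, yDim, 4), dimnames = list(NULL, NULL, categories))
for (i in seq_len(xDim)) {
  for (j in seq_len(yDim)) {
    for (k in categories) {
      res[i, j, k] <- sum(x[i, j, ] == k)
    }
  }
}

# Every grid cell should account for all zDim sheets
totals <- rowSums(res, dims = 2)
all(totals == zDim)
```

If the last line returns TRUE, every value was assigned to exactly one category.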

Yarnabrina · September 3, 2019, 2:11pm

@raytong has already provided a solution, but I'm providing another one because I didn't quite like the nested ifelse:

set.seed(seed = 38899)

fake_data <- array(data = runif(n = (720 * 360 * 349),
                                min = (-3),
                                max = 1),
                   dim = c(720, 360, 349))

coded_data <- array(data = cut(x = fake_data,
                               breaks = c(-Inf, -2, -1, 0, 1),
                               labels = c("apple", "orange", "pear", "peach")),
                    dim = c(720, 360, 349))

count_data <- vapply(X = c("apple", "orange", "pear", "peach"),
                     FUN = function(fruit) apply(X = coded_data,
                                                 MARGIN = 1:2,
                                                 FUN = function(cell) sum(cell == fruit)),
                     FUN.VALUE = array(data = NA_integer_,
                                       dim = c(720, 360)),
                     USE.NAMES = FALSE) # not necessary

Edit

Using @valeri's idea from below, you can do the following to get a 720 * 360 * 4 array, if you don't want to get a list as the final output:

vapply(X = c("apple", "orange", "pear", "peach"),
       FUN = function(fruit) rowSums(x = (coded_data == fruit),
                                     dims = 2),
       FUN.VALUE = array(data = NA_real_,
                         dim = c(720, 360)))
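
To see what shape the vapply call above produces, it may help to run it on a tiny array first. This is only a sketch: the 2 * 3 * 5 dimensions, the seed, and the names small, coded and counts are made up for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
fruits <- c("apple", "orange", "pear", "peach")

# Tiny stand-in for the real data: 2 x 3 grid with 5 sheets
small <- array(runif(2 * 3 * 5, min = -3, max = 1), dim = c(2, 3, 5))
coded <- array(
  as.character(cut(small, breaks = c(-Inf, -2, -1, 0, 1), labels = fruits)),
  dim = dim(small)
)

# One 2 x 3 count matrix per fruit, bound together along a new third dimension
counts <- vapply(
  fruits,
  function(fruit) rowSums(coded == fruit, dims = 2),
  FUN.VALUE = array(NA_real_, dim = c(2, 3))
)

dim(counts)            # 2 3 4
counts[, , "peach"]    # peach counts for each grid cell
```

Because X is a character vector and USE.NAMES is left at its default, the third dimension is named after the fruits, so a single layer can be pulled out by name.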

valeri · September 3, 2019, 2:12pm

Or maybe something along these lines... (I modified the "peach" condition to make sure all cases are covered in my example)

set.seed(1)
x <- array(rnorm(n = 3*4*5, mean = 0, sd = 5), dim=c(3,4,5))
y <- array(NA_character_, dim=c(3,4,5))

cond_apple <- x < -2
cond_orange <- x < -1 & x >= -2
cond_pear <- x < 0 & x >= -1
cond_peach <- x >= 0


y[cond_apple] <- 'apple'
y[cond_orange] <- 'orange'
y[cond_pear] <- 'pear'
y[cond_peach] <- 'peach'

purrr::map(list("apple", "orange", "pear", "peach"), function(x) rowSums(y==x, dims = 2))
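
If purrr is not installed, the same result can be sketched with base R's lapply. The snippet below rebuilds the example above and also checks one cell by hand to show that rowSums(..., dims = 2) sums over the third dimension. The names fruits and counts are assumptions added for illustration.

```r
set.seed(1)
x <- array(rnorm(n = 3 * 4 * 5, mean = 0, sd = 5), dim = c(3, 4, 5))

# Label every value using logical masks, as in the post above
y <- array(NA_character_, dim = dim(x))
y[x < -2] <- "apple"
y[x >= -2 & x < -1] <- "orange"
y[x >= -1 & x < 0] <- "pear"
y[x >= 0] <- "peach"

fruits <- c("apple", "orange", "pear", "peach")
counts <- lapply(fruits, function(fruit) rowSums(y == fruit, dims = 2))
names(counts) <- fruits

# The count for one grid cell equals a plain sum along the third dimension
counts$apple[2, 3] == sum(y[2, 3, ] == "apple")

# Adding the four matrices gives 5 (the number of sheets) in every cell
Reduce(`+`, counts)
```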

kimhjin33 · September 4, 2019, 3:15am

Thanks for the reply @Yarnabrina, I'm new to R so am having difficulty understanding what each line of your code does.

What is fake_data? Is that my original data (720,360,349) I'm trying to reconstruct?

And coded_data? Is that a new empty array I need to make before using it?

The breaks that you included (-Inf, -2, -1, 0, 1) are 5 categories but I only have 4 (apple, orange, pear, peach). Does that not make it incompatible?

Finally, what is function fruit?

I also stupidly forgot to mention that my data values range from -Inf to Inf but I only need to assess negative values and categorize them into the 4 categories. Hmm.. which means apple+orange+pear+peach will not equal 349 at the end of the day, as some of the values will be positive and thus, will not be sorted into the 4 categories. Does that change the coding?

Thank you again for all your help.

kimhjin33 · September 4, 2019, 3:20am

Thank you for your reply @valeri

May I assume that the dim=c(3,4,5) you mentioned should be dim=c(720,360,349) for my case?

I also stupidly forgot to mention that my data values range from -Inf to Inf but I only need to assess negative values and categorize them into the 4 categories. Hmm.. which means apple+orange+pear+peach will not equal 349 at the end of the day, as some of the values will be positive and thus, will not be sorted into the 4 categories. Does that change the coding?

Anyhow, I tried running your code and ended up with 4 [720,360,349] arrays (cond_apple, cond_orange, cond_pear, cond_peach). I checked the first 'sheet' of data with a<-cond_apple[,,1] but it returned a full (720,360) array containing only TRUE / FALSE cells. Am I doing something wrong?

Thanks so much for the help.

Yarnabrina · September 4, 2019, 4:10am

I try to name my variables as explicitly as I can so that the code is readable, but it seems I have failed horribly. Here's my attempt to explain:

fake_data is a dataset that is fake, and certainly not real. I do not have access to your dataset, but to try my code, I had to have one, so I generated this. It creates a 3-d array of shape 720 * 360 * 349, same as yours. The observations are random draws from the continuous uniform distribution over (-3, 1). Please note that this range is not important, and the code that follows does not assume anything about it. It should work irrespective of the range of your observations, which you now mention to be (-Inf, Inf). Just substitute your data in place of fake_data.
coded_data is a 3-d array of the same shape as fake_data, which contains the categories. Depending on whether an observation in fake_data falls in the interval (-Inf, -2], (-2, -1], (-1, 0] or (0, 1], it is labelled "apple", "orange", "pear" or "peach". Any observation outside all of these intervals is labelled NA (not available). breaks = c(-Inf, -2, -1, 0, 1) does not mean that there are 5 categories; it means there are 5 - 1 = 4 categories. These are the boundaries of the latent continuous variable which define your categories. To define k consecutive intervals, you must have k + 1 breakpoints.
function(fruit) is not a named function. Together with the apply(...) call after it (or the rowSums(...) call, following what @valeri showed, which is faster), it defines an anonymous function FUN, which is applied to each element of the X argument. For each element, it returns a 2-d array of dimension 720 * 360, and after that has happened for all elements, those are stacked (I guess?) to form a 3-d array of dimension 720 * 360 * 4, as you want. Since you now say that there are also positive observations which you are ignoring, there will be some NA in the output of cut. So you'll have to add na.rm = TRUE to the function call, be it sum inside apply or rowSums directly, so that it ignores the non-available observations while summing; otherwise you'll get NA as the sum.
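
A small sketch of the points above, using a handful of made-up values rather than the real data: five breakpoints give four labelled intervals, values above the last breakpoint become NA, and sum needs na.rm = TRUE once NA is present.

```r
# Made-up values, including interval boundaries and one value above 1
values <- c(-5, -2, -1.6, -0.5, 0, 0.3, 1, 2.7)
breaks <- c(-Inf, -2, -1, 0, 1)                   # 5 breakpoints
labels <- c("apple", "orange", "pear", "peach")   # 4 intervals

coded <- cut(values, breaks = breaks, labels = labels)
as.character(coded)
# "apple" "apple" "orange" "pear" "pear" "peach" "peach" NA

sum(coded == "peach")                # NA, because 2.7 was coded as NA
sum(coded == "peach", na.rm = TRUE)  # 2
```

Note that the intervals are closed on the right by default, so -2 itself is an apple and 0 itself is a pear.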

Modified code with minor changes for observations over the entire real line

# for reproducibility
set.seed(seed = 38899)

# should be your original data
fake_data <- array(data = rnorm(n = (720 * 360 * 349)),
                   dim = c(720, 360, 349))

# observations less than or equal to 1 are labelled to one of the 4 fruits
# higher observations are labelled as NA
coded_data <- array(data = cut(x = fake_data,
                               breaks = c(-Inf, -2, -1, 0, 1),
                               labels = c("apple", "orange", "pear", "peach")),
                    dim = c(720, 360, 349))

# contains the number of each fruit in each cell of the 720 * 360 grid
# in the corresponding level in the 3rd dimension
final_result <- vapply(X = c("apple", "orange", "pear", "peach"),
                       FUN = function(fruit) rowSums(x = (coded_data == fruit),
                                                     na.rm = TRUE,
                                                     dims = 2),
                       FUN.VALUE = array(data = NA_real_,
                                         dim = c(720, 360)))

# print your result
final_result

Hope this explanation helps.

(If it doesn't, I'm sorry, but I've exhausted my English (and R) knowledge. I hope someone else will step in and provide an easier and better explanation. Good luck!)

kimhjin33 · September 4, 2019, 10:13am

Thanks for all the responses @raytong. Really appreciate your help.

Just a couple of questions for the solution you provided @raytong :

Is -3:1... referring to the range of my values (which are -Inf:Inf)?
And am I right to assume that res is just the name you gave to the final output array that I need? Or is it a function?

I tried fashioning my own code because I have the impulsive need to make things difficult for myself. Could you kindly tell me what I'm doing wrong?

mydata[is.na(mydata)] <- "NA" #Not every grid cell has a value. Some of them are NA that I'd like to ignore.

classified<-array(data=NA, dim=c(720,360,349)) #empty array to classify original data into the fruit categories

for (i in 1:720){
  for (j in 1:360){
    for (k in 1:349){
      if (mydata[i,j,k]>1){
        classified[i,j,k]=="irrelevant"
      }
      else{
        if(mydata[i,j,k]>0 & mydata[i,j,k]<=1)
        classified[i,j,k]=="peach"
      }
      else{
        if(mydata[i,j,k]>-1.0 & mydata[i,j,k]<=0)
        classified[i,j,k]=="pear"
      }
      else{
        if(mydata[i,j,k]>-2.0 & mydata[i,j,k]<=-1.0)
        classified[i,j,k]=="orange"
      }
      else{
        if(mydata[i,j,k]<=-2.0)
        classified[i,j,k]=="apple"
      }
      else{
        classified[i,j,k]==NA
      }
    }
  }
}

irrelevant<-array(data=NA,dim=c(720,360)) #empty arrays to count how many of each category there is in each grid cell for 349 sheets
peach<-array(data=NA,dim=c(720,360))
pear<-array(data=NA,dim=c(720,360))
orange<-array(data=NA,dim=c(720,360))
apple<-array(data=NA,dim=c(720,360))

for (i in 1:720){
  for (j in 1:360){
    irrelevant[i,j]<-length(which(classified[i,j,1:349]=="irrelevant"))
    peach[i,j]<-length(which(classified[i,j,1:349]=="peach"))
    pear[i,j]<-length(which(classified[i,j,1:349]=="pear"))
    orange[i,j]<-length(which(classified[i,j,1:349]=="orange"))
    apple[i,j]<-length(which(classified[i,j,1:349]=="apple"))
  }
}

Thanks again!

valeri · September 4, 2019, 11:10am

Hi @kimhjin33,

The answer to your first question would be yes. Whether your conditions exhaust all the values in the data, depends on your specific problem. If it is as you have described it, then you need to decide what you are going to do with the values above 1 - do you want to encode them as NA or Other or some other choice, depending again on what you are trying to do.

When you do something like a<-cond_apple[,,1] then indeed you will get a logical array, as cond_apple is a logical array. In my code, I use it to index into the right positions of y (so I only use it as an intermediate step); in the end it is the y variable (or summaries thereof) which you need to compute.

raytong · September 4, 2019, 12:14pm

Hi @kimhjin33. Yes. -3:1 is to mimic your data and res is the result array.

In your code, you cannot use mydata[is.na(mydata)] <- "NA" because it converts the array from numeric to character, which will affect the following steps. Just skip that line and leave the NA values as they are in the numeric array.

In the for loop, add an if statement using is.na to handle the NA values. Also, an if statement nested inside an else block is not connected to the else that follows it:
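
The coercion can be seen on a short made-up vector. This is only an illustration; broken is a hypothetical name and the values are invented.

```r
# Made-up numeric values with one missing entry
mydata <- c(-1.6, NA, 0.4, -2.5)
is.numeric(mydata)     # TRUE

# Assigning the string "NA" silently converts the whole vector to character
broken <- mydata
broken[is.na(broken)] <- "NA"
is.character(broken)   # TRUE
anyNA(broken)          # FALSE: "NA" is now ordinary text, not a missing value

# Leaving the data numeric keeps is.na() and numeric comparisons working
is.na(mydata)
mydata <= -2
```

Once the values are character, comparisons such as broken > 1 compare text rather than numbers, which can put values in the wrong category.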

else{
if(mydata[i,j,k]>0 & mydata[i,j,k]<=1)
classified[i,j,k]=="peach"
}
else{...

So, move the if statement to immediately after else, forming an else if statement:

# Note: <- assigns a value; == only compares and would leave classified unchanged
for (i in 1:720){
  for (j in 1:360){
    for (k in 1:349){
      if (is.na(mydata[i, j, k])) {
        classified[i, j, k] <- NA
      }
      else if (mydata[i, j, k] > 1) {
        classified[i, j, k] <- "irrelevant"
      }
      else if (mydata[i, j, k] > 0 & mydata[i, j, k] <= 1) {
        classified[i, j, k] <- "peach"
      }
      else if (mydata[i, j, k] > -1.0 & mydata[i, j, k] <= 0) {
        classified[i, j, k] <- "pear"
      }
      else if (mydata[i, j, k] > -2.0 & mydata[i, j, k] <= -1.0) {
        classified[i, j, k] <- "orange"
      }
      else {
        classified[i, j, k] <- "apple"
      }
    }
  }
}

The remaining code should be okay, though a little more complicated than it needs to be.
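
For comparison, here is a sketch of how the classification and counting could be done without explicit loops, combining the cut and rowSums ideas from earlier in the thread with the extra "irrelevant" class. The small random array stands in for mydata, and the seed and the names n_missing and counts are assumptions for illustration.

```r
# Hypothetical stand-in for mydata: small, numeric, with some NA cells
set.seed(42)
mydata <- array(rnorm(4 * 3 * 20, sd = 2), dim = c(4, 3, 20))
mydata[sample(length(mydata), 15)] <- NA

# Same boundaries as the loop: apple <= -2 < orange <= -1 < pear <= 0 < peach <= 1 < irrelevant
labels <- c("apple", "orange", "pear", "peach", "irrelevant")
classified <- array(
  as.character(cut(mydata, breaks = c(-Inf, -2, -1, 0, 1, Inf), labels = labels)),
  dim = dim(mydata)
)

# One count matrix per label, stacked along a third dimension
counts <- vapply(
  labels,
  function(label) rowSums(classified == label, na.rm = TRUE, dims = 2),
  FUN.VALUE = array(NA_real_, dim = dim(mydata)[1:2])
)

# Counts plus missing values should equal the number of sheets in every cell
n_missing <- rowSums(is.na(mydata), dims = 2)
all(rowSums(counts, dims = 2) + n_missing == dim(mydata)[3])
```

NA cells stay NA after cut, so they are simply left out of every count.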

kimhjin33 · September 5, 2019, 6:31am

Thanks for the clarification @raytong.

My code is "running" but not the way I want it to. The values that are supposed to be peach are turning up to be apple, and apple turning up to be peach. I have some irrelevants for the positive values, and no pears nor oranges whatsoever even though I checked my data and there are many which should fall into the pear or orange category.

Are my logical operators incorrectly defined? R doesn't seem to be able to read my data properly and determine that -1.6 should be pear and not apple.

raytong · September 5, 2019, 9:16am

According to your script, -1.6 should be orange and not pear.
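
This can be checked on single values with a small helper that wraps the same else-if chain. classify_value is a hypothetical name introduced only for this sketch; each condition relies on the earlier ones having failed, so v > 0 here is equivalent to v > 0 & v <= 1 in the loop.

```r
# Hypothetical helper applying the thread's category boundaries to one value
classify_value <- function(v) {
  if (is.na(v)) {
    NA_character_
  } else if (v > 1) {
    "irrelevant"
  } else if (v > 0) {
    "peach"
  } else if (v > -1) {
    "pear"
  } else if (v > -2) {
    "orange"
  } else {
    "apple"
  }
}

classify_value(-1.6)
# "orange"

vapply(c(-2.5, -1.6, -0.4, 0.7, 3, NA), classify_value, character(1))
# "apple" "orange" "pear" "peach" "irrelevant" NA
```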

system · September 26, 2019, 9:16am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.