Summarise across rows by making reference to row numbers in R

I have a question about manipulating data frames while taking row indexes into account. I asked this on Stack Exchange, but I haven't had any luck finding a tidyverse solution yet. So I'd very much appreciate your suggestions!

I am trying to summarize my dataframe in a way that makes reference to row numbers, but I can't find a way to do it in a tidy way. My dataframe looks like this.

# Sample data frame.
df <- data.frame(value = c(1,2,1,1,2,4,5,3,2))

  value
1     1
2     2
3     1
4     1
5     2
6     4
7     5
8     3
9     2

I need to create a column that is TRUE if the corresponding number in value and the numbers in the next 4 consecutive rows are all larger than or equal to 2. The resulting dataframe should look like this:

  value largerThan
1     1      FALSE
2     2      FALSE
3     1      FALSE
4     1      FALSE
5     2       TRUE
6     4         NA
7     5         NA
8     3         NA
9     2         NA

Note the four NA values in the last four rows of largerThan. This is because these rows don't have 4 consecutive rows after them, so they can't be evaluated. This is what is tripping me up, together with the fact that I don't know how to make reference to row numbers when using tidyverse syntax. This was more straightforward with for loops, but I can't think of equivalents with the tidyverse functions.

df <- data.frame(value = c(1,2,1,1,2,4,5,3,2))

library(slider)

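# Slide over the rows of df; each window is the current row plus the next 4.
df$largerThan <- slide_lgl(
  .x = df,
  .after = 4L,
  .f = ~ {
    x <- .x$value
    # Windows near the end hold fewer than 5 rows, so return NA for those.
    if (length(x) == 5) all(x >= 2) else NA
  }
)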

df

Thanks for this suggestion! I didn't know about "slide_lgl". I'll wait to see if someone has an alternative that doesn't require installing a new package, but this already helps. Much appreciated!
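
For an option that needs no additional package, here is a minimal base R sketch of the same windowed check. It walks over the row indices directly, which is one way to "make reference to row numbers". The names n_after and n_rows are made up for illustration and are not part of the original question.

```r
df <- data.frame(value = c(1, 2, 1, 1, 2, 4, 5, 3, 2))

n_after <- 4          # hypothetical name: number of following rows to check
n_rows  <- nrow(df)

df$largerThan <- vapply(
  seq_len(n_rows),
  function(i) {
    # Rows without n_after rows below them cannot be evaluated.
    if (i + n_after > n_rows) return(NA)
    # Current row plus the next n_after rows must all be >= 2.
    all(df$value[i:(i + n_after)] >= 2)
  },
  logical(1)
)

df
```

Changing n_after adjusts the window length without touching the rest of the code.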

Would this be something you're looking for?

library(dplyr)

df <- data.frame(value = c(1,2,1,1,2,4,5,3,2))

df %>% 
  mutate(cond_1 = value >= 2,  # the stated condition: larger than or equal to 2
         ## number of the next 4 rows that meet the condition (NA when fewer than 4 rows follow)
         cond_2 = lead(cond_1) + lead(cond_1, 2) + lead(cond_1, 3) + lead(cond_1, 4),
         result = cond_1 & cond_2 == 4
         ) %>% 
  ## remove temporary columns
  select(value, result)
#>   value result
#> 1     1  FALSE
#> 2     2  FALSE
#> 3     1  FALSE
#> 4     1  FALSE
#> 5     2   TRUE
#> 6     4     NA
#> 7     5     NA
#> 8     3     NA
#> 9     2     NA

Created on 2021-02-27 by the reprex package (v1.0.0)
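
The approach above hard-codes four lead() calls. A possible generalisation, sketched here under the assumption that dplyr is already loaded, builds the leads in a loop and uses row_number() and n() to refer to row positions explicitly. The names n_after, ok, all_ok and result are illustrative only.

```r
library(dplyr)

df <- data.frame(value = c(1, 2, 1, 1, 2, 4, 5, 3, 2))
n_after <- 4  # hypothetical name: how many following rows to include

result <- df %>%
  mutate(
    ok = value >= 2,
    # TRUE only if this row and each of the next n_after rows meet the condition
    all_ok = Reduce(`&`, lapply(0:n_after, function(k) lead(ok, k))),
    # row_number() is the current row index, n() the total number of rows;
    # rows without a full window are set to NA
    largerThan = if_else(row_number() + n_after <= n(), all_ok, NA)
  ) %>%
  select(value, largerThan)

result
```

The explicit row_number() test matters because FALSE & NA evaluates to FALSE in R, so without it an incomplete window containing a value below 2 would be reported as FALSE rather than NA.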
