 # How to alter every row after a certain condition has been met

I am trying to alter a variable in every row that occurs after a specific condition is met, as indicated by a second variable. I would prefer a dplyr solution if possible, but so far I have not found a function or combination of functions that does this. A simplified version of my data can be generated as follows:

``````
N <- c(1,1,1,1,1,1)
R <- c("N", "N", "Y", "N", "N", "N")
# data.frame() keeps N numeric; cbind() would coerce both columns to character
Dat <- data.frame(N, R)
``````

What I need is this: when the 'R' variable is 'Y', 1 should be added to the 'N' variable in that row and in every row after it. The resulting data frame should look like this:

``````
  N R
1 1 N
2 1 N
3 2 Y
4 2 N
5 2 N
6 2 N
``````

Any help or guidance would be much appreciated.

``````
N <- c(1,1,1,1,1,1)
R <- c("N", "N", "Y", "N", "N", "N")

# index of the first "Y" in R
start <- which(R == "Y")[1]

# add 1 to every element of N from that index onward
for (i in seq_along(N)) if (i >= start) N[i] <- N[i] + 1
N
#> [1] 1 1 2 2 2 2
``````

Comment: `dplyr` is a useful tool, and it can be made more useful by first addressing the problem with tools in `{base}`, which often provide a more direct solution.

The snippet takes advantage of two aspects of the problem statement:

1. In the `R` vector we have a single unknown: the index position of the first occurrence of "Y".
2. In the `N` vector, we use the index from #1 to change the value of N at that index by adding 1, and continue to do so until the end of `N`.

`which` takes a logical vector, here the result of the test `R == "Y"`, and returns the index positions of its `TRUE` elements.
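
As a loop-free sketch of the same two steps (not the code posted above; the names `idx`, `start`, and `N_new` are introduced here purely for illustration), the result of `which()` can be inspected directly and the increment applied to the whole vector at once. It assumes `R` contains at least one "Y".

```r
N <- c(1, 1, 1, 1, 1, 1)
R <- c("N", "N", "Y", "N", "N", "N")

# which() returns the positions at which the logical vector is TRUE
idx <- which(R == "Y")
idx
#> [1] 3

# position of the first "Y" (this would be NA if R contained no "Y")
start <- idx[1]

# TRUE counts as 1 in arithmetic, so this adds 1 from `start` onward
N_new <- N + (seq_along(N) >= start)
N_new
#> [1] 1 1 2 2 2 2
```

The comparison `seq_along(N) >= start` yields one logical value per element, so no explicit loop is needed.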


Thanks for the response. My apologies, I should have been more specific in how I posed the question. I thought if I kept it simple I would be able to adapt it to my code.

Essentially, the reason I was hoping for a dplyr solution is that I need the operation to respect a grouping variable as well.

A better representation of the data is closer to this:

``````
Groups <- c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C")
N <- c(1,1,1,1,1,1,1,1,1,1)
R <- c("N", "N", "Y", "N", "N", "N", "Y", "N", "N", "N")
# data.frame() keeps N numeric; cbind() would coerce every column to character
Dat <- data.frame(Groups, N, R)
``````

with this as the required result:

``````
   Groups N R
1       A 1 N
2       A 1 N
3       A 2 Y
4       A 2 N
5       B 1 N
6       B 1 N
7       B 2 Y
8       B 2 N
9       C 1 N
10      C 1 N
``````

Apologies I should have been more specific from the start.


The hardest part of analysis is posing the question clearly. In `R`, that usually involves focusing on the what rather than the how.

Every `R` problem can be thought of with advantage as the interaction of three objects: an existing object, x; a desired object, y; and a function, f, that will return a value of y given x as an argument. In other words, school algebra: f(x) = y. Any of the objects can be composites.

Here, `x` is the data frame, `D`, composed of three variables, two character and one numeric, where the first of the character variables, `Groups`, serves as a grouping variable.

The desired result, `y` is another data frame that differs from `D` only in respect of its `N` variable. Within each group, every element of `N` that has a value of `1` is set to `2` if the corresponding value of `R` is "Y" or if a preceding value of `N` has been so set; otherwise, the value of `N` remains unchanged.

To compose `f`, three functions are applied, as described in the comments.

``````
## Data

Groups <- c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C")
N <- c(1,1,1,1,1,1,1,1,1,1)
R <- c("N", "N", "Y", "N", "N", "N", "Y", "N", "N", "N")
D <- data.frame(Groups, N, R)

# inspect
D
#>    Groups N R
#> 1       A 1 N
#> 2       A 1 N
#> 3       A 1 Y
#> 4       A 1 N
#> 5       B 1 N
#> 6       B 1 N
#> 7       B 1 Y
#> 8       B 1 N
#> 9       C 1 N
#> 10      C 1 N

## Functions

# make a list of data frames by grouping variable, where
# x is a data frame and y is the grouping vector, UN-quoted
get_group <- function(x, y) split(x, y)

# convert R variable from char to logical
make_lgl <- function(x) ifelse(x == "N", FALSE, TRUE)

# for numeric calculation, FALSE evaluates to 0 and
# TRUE evaluates to 1; therefore, the cumsum of the
# transformed R var detects when the condition
# R == "Y" has been met, and N becomes 2 for the
# current and all subsequent values; x is a data frame,
# y is the name (quoted) of its column holding the
# condition to be tested, e.g., flip_N(D, "R");
# returns a vector of like length
# that will be used to replace an existing vector
flip_N <- function(x, y) ifelse(cumsum(make_lgl(x[[y]])) >= 1, 2, 1)

## Main

# create a list of data frames composed of D split by group
the_groups <- get_group(D, Groups)

# iterate over the_groups, replacing N in each data frame
for (i in seq_along(the_groups)) {
  the_groups[[i]]$N <- flip_N(the_groups[[i]], "R")
}

# unsplit the_groups into a single data frame

unsplit(the_groups, Groups)
#>    Groups N R
#> 1       A 1 N
#> 2       A 1 N
#> 3       A 2 Y
#> 4       A 2 N
#> 5       B 1 N
#> 6       B 1 N
#> 7       B 2 Y
#> 8       B 2 N
#> 9       C 1 N
#> 10      C 1 N
``````
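
The same `cumsum` idea can also be written without an explicit split, loop, and unsplit cycle. What follows is a minimal base R sketch, assuming the same data as above; it relies on `ave()`, which applies a function within the groups defined by a grouping vector and returns the results in the original row order. The names `hit`, `after_y`, and `D2` are introduced here for illustration only.

```r
Groups <- c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C")
N <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
R <- c("N", "N", "Y", "N", "N", "N", "Y", "N", "N", "N")
D <- data.frame(Groups, N, R)

# 1 where R is "Y", 0 otherwise
hit <- as.integer(D$R == "Y")

# within each group, flag the first "Y" row and every row after it
after_y <- ave(hit, D$Groups, FUN = function(z) as.integer(cumsum(z) >= 1))

# work on a copy so that D itself is left untouched
D2 <- D
D2$N <- ifelse(after_y == 1, 2, 1)
D2$N
#> [1] 1 1 2 2 1 1 2 2 1 1
```

Like `flip_N`, this sets `N` to 1 or 2 outright rather than adding to whatever value `N` already holds.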

The way that comes to mind for a `dplyr` solution is to use `dplyr::group_by` and `tidyr::nest()`, which will produce

``````
D %>% dplyr::group_by(Groups) %>% tidyr::nest()
# A tibble: 3 × 2
# Groups:   Groups [3]
  Groups data
  <chr>  <list>
1 A      <tibble [4 × 2]>
2 B      <tibble [4 × 2]>
3 C      <tibble [2 × 2]>
``````

I'll take another look at a `dplyr` solution if no one else posts one.


Technocrat invited me to look at this. I think he already identified that cumsum() is the critical feature.
I think this is quite concise:

``````
library(dplyr)

# within each group, add the running count of "Y" values to N
D %>% group_by(Groups) %>%
  mutate(N = N + cumsum(R == "Y"))
``````

Apologies if I omitted anything of importance; I can try again if you draw my attention to it.

(Side note: this approach assumes that `R` contains at most one "Y" within each group. With more than one, `N` would be incremented again at each additional "Y".)
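
If a group could contain more than one "Y", the running count can be turned into a simple flag so that `N` is incremented only once. This is a minimal sketch, assuming `dplyr` is installed; the data frame `D_multi` is made-up illustration data, not taken from the thread.

```r
library(dplyr)

# hypothetical data: group A contains two "Y" values
D_multi <- data.frame(
  Groups = c("A", "A", "A", "A", "B", "B"),
  N      = c(1, 1, 1, 1, 1, 1),
  R      = c("N", "Y", "N", "Y", "N", "N")
)

res <- D_multi %>%
  group_by(Groups) %>%
  # cumsum(R == "Y") > 0 is TRUE from the first "Y" onward,
  # no matter how many further "Y" values follow
  mutate(N = N + as.integer(cumsum(R == "Y") > 0)) %>%
  ungroup()

res$N
#> [1] 1 2 2 2 1 1
```

With the uncapped `cumsum(R == "Y")`, the last row of group A would come out as 3 instead of 2.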
