Create a column based on conditions met in 2 columns

See the FAQ: How to do a minimal reproducible example reprex for beginners. Being able to cut-and-paste a data frame is an advantage to attracting answers.

Every R problem can be thought of with advantage as the interaction of three objects— an existing object, x , a desired object,y , and a function, f, that will return a value of y given x as an argument. In other words, school algebra— f(x) = y. Any of the objects can be composites.

In this case, PersonDedupe2 (which I'll refer to as dat) plays the role of x. It's a matrix-like object (a two-dimensional array of all numeric values) that may be of class matrix, data.frame or tibble.

y is the desired output, which I'll refer to as out, which will also be a matrix-like object consisting of 5 variables—one for PersonID (which I'll refer to as id) and one for each of feb, mar, apr and may.

f is the function to be composed that populates out. Let's start with a single variable in dat, February.

The first row of dat illustrates one outcome—the current and prior months are both zero, which is an unprovided for case in the problem description, a subject with no recorded activity in the past or current month. I'll encode that case as NA.

The second row records a subject with non-zero activity in both the current and prior month, so that can be encoded as existing if the activity is equal each month, otherwise changed.

Not shown for February is the case in which the current month is 1 or more and the prior month is 0, so that can be encoded as new.

With that in mind, the following code tests each pair of column vectors and sets the appropriate encoding (user to verify) in a new data frame. The code is hardwired, which is poor practice when programming for a recurring need—object names, column positions and encoding should all be parameterized.

# for reproducibility
set.seed(137)
# assortment of integers
basket <- c(rep(0,50),rep(1:4,15))
# simulated data
dat <- data.frame(id = sample(30000:40000,25),
                  Jan = sample(basket,25),
                  Feb = sample(basket,25),
                  Mar = sample(basket,25),
                  Apr = sample(basket,25),
                  May = sample(basket,25))

# receiver data frame same id value as dat; otherwise all NA
out <- data.frame(id = dat[,1],
                  dummy = rep(NA,25), # convenience for ease of indexing
                  feb = rep(NA,25),
                  mar = rep(NA,25),
                  apr = rep(NA,25),
                  may = rep(NA,25))

# gets row indices
get_row_indices <- function(x) {
  nas =     which(dat[,x] == 0 & dat[,x-1] == 0) 
  news =    which(dat[,x] >  0 & dat[,x-1] == 0) 
  ended =   which(dat[,x] == 0 & dat[,x-1] >  0) 
  changed = which(dat[,x] >  0 & dat[,x-1] >  0)
  return(list(nas,news,ended,changed))
}

# uses row indices to change the out data frame
populate_out <- function(x) {
  idx = get_row_indices(x)
  out[idx[[1]],x] = NA
  out[idx[[2]],x] = "new"
  out[idx[[3]],x] = "exited"
  out[idx[[4]],x] = "changed"
  return(out)
}

# loops over the month columns after January
for (i in 3:6) out = populate_out(i)

# removes dummy column
 
out <- out[,-2]

out
#>       id     feb     mar     apr     may
#> 1  39786  exited     new  exited    <NA>
#> 2  30892 changed  exited     new changed
#> 3  37534    <NA>     new  exited    <NA>
#> 4  31446  exited     new changed  exited
#> 5  37802  exited    <NA>     new changed
#> 6  30220     new changed  exited     new
#> 7  33562     new  exited    <NA>    <NA>
#> 8  38098     new changed changed changed
#> 9  32171    <NA>    <NA>     new changed
#> 10 35609  exited    <NA>    <NA>     new
#> 11 30064 changed changed changed  exited
#> 12 37915     new  exited     new changed
#> 13 31437    <NA>     new changed changed
#> 14 31583  exited     new changed changed
#> 15 38205 changed  exited    <NA>     new
#> 16 33307  exited     new  exited    <NA>
#> 17 33459  exited    <NA>     new  exited
#> 18 38069  exited    <NA>    <NA>     new
#> 19 34731  exited    <NA>     new  exited
#> 20 38152    <NA>     new changed  exited
#> 21 30415 changed  exited     new changed
#> 22 35516 changed changed  exited     new
#> 23 39822 changed changed changed  exited
#> 24 31037 changed changed  exited    <NA>
#> 25 33234    <NA>     new  exited     new
1 Like