So, now that I'm at a computer I had a chance to look through this again.

## Pseudocode

What I *think* you want to do is this:

```
Starting from a data.frame object: df.
Choose a subset of the variables of df: cols
For each col in cols
mutate col to be a new indicator variable coded as 1 or 0
augment df with this new variable
Extract from the augmented df object a particular column (var) as well
as the newly created indicator variables, but only those rows for which
all the indicators have been evaluated to 1.
```

If that is not correct, let me know and please attempt to clarify for us.

## New Data

I've made some new sample data (yours was much to complicated to be able to quickly ingest). I also included some `NA`

data which will force us to think about how we might handle that later.

```
set.seed(123)
n <- 6
df <- data.frame(id = letters[seq(n)],
x1 = sample(c(NA, "n", "y"), n, TRUE, c(0.3, 0.2, 0.5)),
x2 = runif(n))
df
#> id x1 x2
#> 1 a y 0.5281055
#> 2 b <NA> 0.8924190
#> 3 c y 0.5514350
#> 4 d n 0.4566147
#> 5 e n 0.9568333
#> 6 f y 0.4533342
```

## Solution 1

### Mutate

```
df[["m1"]] <- ifelse(df[["x1"]] == "y", 1, 0)
df[["m2"]] <- ifelse(df[["x2"]] >= 0.4, 1, 0)
df
#> id x1 x2 m1 m2
#> 1 a y 0.5281055 1 1
#> 2 b <NA> 0.8924190 NA 1
#> 3 c y 0.5514350 1 1
#> 4 d n 0.4566147 0 1
#> 5 e n 0.9568333 0 1
#> 6 f y 0.4533342 1 1
```

### Function

Here (and for the rest of this reply, I am choosing to pass the *names* of the grouping variables instead of a list of their values. The main reason for this is we are already passing these in through the augmented data.frame, there's no need to increase the volume of data being passed between functions.

Also, I renamed `grouping_var`

to `gv`

because I am a bad and lazy typist and it's a fairly long variable name and it made some of my lines of code longer than I would prefer and I am particular like that.

```
f <- function(dat, var, gv, total_var) {
# I am still not sure what total_var is supposed to do. Is it a threshold value
# that is if we had, say 7 indicators and total_var = 5, would we keep the
# rows which met the conditions of 5 or more indicators? Let me know and
# I can adjust the functions accordingly.
dat[apply(dat[gv], 1, function(x) all(x == 1)), c(var, gv)]
}
```

### Results

```
f(df, "id", c("m1", "m2"))
#> id m1 m2
#> 1 a 1 1
#> NA <NA> NA NA
#> 3 c 1 1
#> 6 f 1 1
```

## Solution 2 - Coding to TRUE/FALSE Instead of 1/0

Our function can be slightly simplified if we mutate to TRUE and FALSE (the logical equivalents of 1 and 0).

### Mutate

```
df[["m1"]] <- df[["x1"]] == "y"
df[["m2"]] <- df[["x2"]] >= 0.4
```

### Function

```
f <- function(dat, var, gv, total_var) {
dat[apply(dat[gv], 1, all), c(var, gv)]
}
```

### Results

```
f(df, "id", c("m1", "m2"))
#> id m1 m2
#> 1 a TRUE TRUE
#> NA <NA> NA NA
#> 3 c TRUE TRUE
#> 6 f TRUE TRUE
```

## Solution 3 - Improvements

Alternately, we can devise a new function which will mutate, subset, and select for you. We use a default condition `"== 1`

, which will match any mutated dummy vars you create as `1`

/`0`

or `TRUE`

/`FALSE`

### Function

```
f <- function(dat, vars, gv, conditions = "== 1", na.rm = FALSE) {
exprs <- mapply(function(a, b) {
parse(text = paste(a, b))
},
gv, conditions,
USE.NAMES = FALSE)
idx <- Reduce(`&`, lapply(exprs, eval, dat)) # 2
idx <- ifelse(is.na(idx), !na.rm, idx) # 3
dat[idx, c(vars, gv)] # 4
}
```

##
Explanation of Function Details

I'm quite happy with this function, so I am going to take a moment to explain it. The comments #1, #2, ... etc correspond to the points below,

- We are pasting the names of the variables together with a condition they must meet and turning the resulting character vector into an expression. for instance, one expression might be
`m1 == 1`

.
- For each
`expr`

in `exprs`

we are evaluating it in a data.frame environment. This is somewhat similar to attaching a data.frame to the global environment or using the `with()`

function if you are familiar with those ideas. Since we're already using `eval`

though to evaluate the expression, it makes more sense to do it in a data.frame environment. Then we reduce the list output to a single index vector.
- If we want to remove
`NA`

's at this point we do so here.
- Finally, we subset our data.frame according to the conditions we set.

### Results

```
f(df, "id", c("m1", "m2"))
#> id m1 m2
#> 1 a TRUE TRUE
#> 2 b NA TRUE
#> 3 c TRUE TRUE
#> 6 f TRUE TRUE
```

```
f(df, "id", c("m1", "m2"), na.rm = TRUE)
#> id m1 m2
#> 1 a TRUE TRUE
#> 3 c TRUE TRUE
#> 6 f TRUE TRUE
```

```
f(df, "id", c("x1", "x2"), c("== \"y\"", ">= 0.4"))
#> id x1 x2
#> 1 a y 0.5281055
#> 2 b <NA> 0.8924190
#> 3 c y 0.5514350
#> 6 f y 0.4533342
```

```
f(df, "id", c("x1", "x2"), c("== \"y\"", ">= 0.4"), TRUE)
#> id x1 x2
#> 1 a y 0.5281055
#> 3 c y 0.5514350
#> 6 f y 0.4533342
```

^{Created on 2020-09-04 by the reprex package (v0.3.0)}

## Solution 4 - A Guess About `total_var`

Here I implement my best guess about what you wanted to use the `total_var`

argument for.

### New Data

```
set.seed(2357)
n <- 10
df <- data.frame(id = letters[seq(n)],
x1 = sample(c(NA, "n", "y"), n, TRUE, c(0.3, 0.2, 0.5)),
x2 = round(runif(n), 2),
x3 = round(rnorm(n), 2))
```

```
df
#> id x1 x2 x3
#> 1 a y 0.42 -0.30
#> 2 b <NA> 0.64 -0.01
#> 3 c y 0.34 1.76
#> 4 d y 0.60 -0.55
#> 5 e y 0.28 -1.26
#> 6 f <NA> 0.56 1.19
#> 7 g y 0.98 -0.21
#> 8 h y 0.31 -1.76
#> 9 i <NA> 0.91 -1.33
#> 10 j n 0.51 0.42
```

### Mutate

```
df[["m1"]] <- ifelse(df[["x1"]] == "y", 1, 0)
df[["m2"]] <- ifelse(df[["x2"]] >= 0.3, 1, 0)
df[["m3"]] <- ifelse(df[["x3"]] < 0, 1, 0)
df
#> id x1 x2 x3 m1 m2 m3
#> 1 a y 0.42 -0.30 1 1 1
#> 2 b <NA> 0.64 -0.01 NA 1 1
#> 3 c y 0.34 1.76 1 1 0
#> 4 d y 0.60 -0.55 1 1 1
#> 5 e y 0.28 -1.26 1 0 1
#> 6 f <NA> 0.56 1.19 NA 1 0
#> 7 g y 0.98 -0.21 1 1 1
#> 8 h y 0.31 -1.76 1 1 1
#> 9 i <NA> 0.91 -1.33 NA 1 1
#> 10 j n 0.51 0.42 0 1 0
```

### New Function

```
f <- function(dat, vars, gv, conditions = "== 1", min_tot = 1, na.rm = FALSE) {
exprs <- mapply(function(a, b) {
parse(text = paste(a, b))
},
gv, conditions,
USE.NAMES = FALSE)
idx <- rowSums(vapply(exprs,
eval,
logical(nrow(dat)), dat),
na.rm = TRUE) >= min_tot
dat[complete.cases(dat) + !na.rm & idx, c(vars, gv)]
}
```

### Results

```
f(df, "id", c("m1", "m2", "m3"))
#> id m1 m2 m3
#> 1 a 1 1 1
#> 2 b NA 1 1
#> 3 c 1 1 0
#> 4 d 1 1 1
#> 5 e 1 0 1
#> 6 f NA 1 0
#> 7 g 1 1 1
#> 8 h 1 1 1
#> 9 i NA 1 1
#> 10 j 0 1 0
```

```
f(df, "id", c("m1", "m2", "m3"), min_tot = 2)
#> id m1 m2 m3
#> 1 a 1 1 1
#> 2 b NA 1 1
#> 3 c 1 1 0
#> 4 d 1 1 1
#> 5 e 1 0 1
#> 7 g 1 1 1
#> 8 h 1 1 1
#> 9 i NA 1 1
```

```
f(df, "id", c("m1", "m2", "m3"), min_tot = 2, na.rm = TRUE)
#> id m1 m2 m3
#> 1 a 1 1 1
#> 3 c 1 1 0
#> 4 d 1 1 1
#> 5 e 1 0 1
#> 7 g 1 1 1
#> 8 h 1 1 1
```

```
f(df, "id", c("m1", "m2", "m3"), min_tot = 3)
#> id m1 m2 m3
#> 1 a 1 1 1
#> 4 d 1 1 1
#> 7 g 1 1 1
#> 8 h 1 1 1
```

^{Created on 2020-09-04 by the reprex package (v0.3.0)}