 # How do I assign a vector to a subset of rows of a dataframe using the tidyverse?

I'm very surprised that there isn't a simple way (or at least, I couldn't find one through the usual channels) to assign a vector to a noncontiguous subset of rows in a dataframe, using `dplyr`. Example:

``````RMSE_table <- data.frame(model = rep(c("exponential", "polynomial"), times = 4),
type = rep(c("cumulative","new"), each = 4),
data = rep(rep(c("cases","deaths"), each = 2), times = 2),
RMSE = numeric(8))

x <- c(1,3)
``````

This (w/o `dplyr`) works, and it's quite readable, even though I could do without the repetitions:

``````> RMSE_table[RMSE_table\$model == "exponential" & RMSE_table\$type == "cumulative", "RMSE"] <- x
> RMSE_table
model       type   data RMSE
1 exponential cumulative  cases    1
2  polynomial cumulative  cases    0
3 exponential cumulative deaths    3
4  polynomial cumulative deaths    0
5 exponential        new  cases    0
6  polynomial        new  cases    0
7 exponential        new deaths    0
8  polynomial        new deaths    0
``````

This doesn't:

``````> library(dplyr)
> library(magrittr)
> RMSE_table %<>%
+ mutate(RMSE = ifelse(model == "exponential" & type == "cumulative", x, RMSE))
> RMSE_table
model       type   data RMSE
1 exponential cumulative  cases    1
2  polynomial cumulative  cases    0
3 exponential cumulative deaths    1
4  polynomial cumulative deaths    0
5 exponential        new  cases    0
6  polynomial        new  cases    0
7 exponential        new deaths    0
8  polynomial        new deaths    0
``````

Am I missing something obvious?

If I understand the question correctly, this is not a problem of `dplyr`.

While you use this: `RMSE_table[RMSE_table\$model == "exponential" & RMSE_table\$type == "cumulative", "RMSE"] <- x`, you are replacing a vector (subset of a larger vector) with another shorter vector `x`, which will be replicated as required to match the length. Since here its length matches exactly the length of the subset, it seems to work perfectly.

However, when you use `mutate(RMSE = ifelse(model == "exponential" & type == "cumulative", x, RMSE))`, you are replacing the existing vector with another one created by `ifelse`. The way `ifelse` works, it has replicated `x` to match the length of `RMSE`. So, in the positions where the evaluation turns out to be `TRUE`, it so happens that both are in odd position and hence the first element of `x` is used at the corresponding positions.

You can use `replace` in this case:

``````> library(dplyr)
> RMSE_table <- tibble(model = rep(x = c("exponential", "polynomial"), times = 4),
+                      type = rep(x = c("cumulative","new"), each = 4),
+                      data = rep(x = c("cases","deaths"), each = 2, times = 2),
+                      RMSE = numeric(length = 8))
> RMSE_table %>%
+     mutate(RMSE = replace(x = RMSE,
+                           list = (model == "exponential" & type == "cumulative"),
+                           values = c(1, 3)))
# A tibble: 8 x 4
model       type       data    RMSE
<chr>       <chr>      <chr>  <dbl>
1 exponential cumulative cases      1
2 polynomial  cumulative cases      0
3 exponential cumulative deaths     3
4 polynomial  cumulative deaths     0
5 exponential new        cases      0
6 polynomial  new        cases      0
7 exponential new        deaths     0
8 polynomial  new        deaths     0
``````

Hope this helps.

1 Like

Hi @Andrea: Here's another possibility:

``````RMSE_table %>%
group_by(model, type) %>%
mutate(
RMSE =
case_when(
model == "exponential" & type == "cumulative" ~ x,
TRUE ~ RMSE
))
``````
2 Likes

This works! But could you explain me why `group_by` is needed? I actually tried `case_when` as a first attempt, but it didn't work by itself. It looks like you need to combine it with `group_by` in order to make this work.

This works flawlessly! So, why does `replace` work here? If I understand correctly, it takes the vector `RMSE`, it finds the indices in `RMSE` corresponding to the content of `list` (which in this case is a logical vector) and then it substitutes the corresponding values in `RMSE` with the values in `values`, recyclying values if needed. Correct?

Thanks for the interesting puzzle, @Andrea!

You can think of each `~` expression in `case_when()` as applying `replace()` to a logical vector:

``````x <- c(1, 3)
bool  <- sample(c(T, F), 8, replace = T)
bool
#>   TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE

replace(bool, bool, which(bool))
#>  1 0 3 0 0 6 7 8
case_when(bool ~ 1:8)
#>   1 NA  3 NA NA  6  7  8
``````

with the difference that `case_when()` --- as a 'tidy' function --- requires that the replacement vector be of the same length as `bool` (or of length 1, and then recycled to the length of `bool`).

In the case of `RMSE_table`, since the replacement vector, `x`, is of length 2, the `group_by()` command is necessary so that the vector

``````model == "exponential" & type == "cumulative"
``````

is also of length 2, since each subtable created by `group_by(model, type)` has exactly two rows:

``````library(tidyverse)
RMSE_table <-
data.frame(
model = rep(c("exponential", "polynomial"), times = 4),
type = rep(c("cumulative","new"), each = 4),
data = rep(rep(c("cases","deaths"), each = 2), times = 2),
RMSE = numeric(8)
)

RMSE_table %>%
count(model, type)
#> # A tibble: 4 x 3
#>   model       type           n
#>   <fct>       <fct>      <int>
#> 1 exponential cumulative     2
#> 2 exponential new            2
#> 3 polynomial  cumulative     2
#> 4 polynomial  new            2
``````

@Yarnabrina's `replace()` solution is more general, but doesn't work if the logical vector contains an `NA`:

``````RMSE_table %>%
#>         model       type   data RMSE
#> 1 exponential cumulative  cases    0
#> 2  polynomial cumulative  cases    0
#> 3 exponential cumulative deaths    0
#> 4  polynomial cumulative deaths    0
#> 5 exponential        new  cases    0
#> 6  polynomial        new  cases    0
#> 7 exponential        new deaths    0
#> 8  polynomial        new deaths    0
#> 9        <NA>       <NA>   <NA>    0

RMSE_table %>%
mutate(
RMSE =
replace(x = RMSE,
list = model == "exponential" & type == "cumulative",
values = c(1, 3)
)
)
#> Error in x[list] <- values: NAs are not allowed in subscripted assignments
``````

A more general version of the solution I posted is possible if the issues caused by the length of `x` can be circumvented:

``````RMSE_table %>%
group_by(model, type) %>%
mutate(
RMSE =
case_when(
model == "exponential" & type == "cumulative" ~
x[1:n()],
TRUE ~ RMSE
)
)
#> # A tibble: 8 x 4
#> # Groups:   model, type 
#>   model       type       data    RMSE
#>   <fct>       <fct>      <fct>  <dbl>
#> 1 exponential cumulative cases      1
#> 2 polynomial  cumulative cases      0
#> 3 exponential cumulative deaths     3
#> 4 polynomial  cumulative deaths     0
#> 5 exponential new        cases      0
#> 6 polynomial  new        cases      0
#> 7 exponential new        deaths     0
#> 8 polynomial  new        deaths     0

RMSE_table %>%
group_by(model, type) %>%
mutate(
RMSE =
case_when(
model == "exponential" & type == "cumulative" ~
x[1:n()],
TRUE ~ RMSE
)
)
#> Warning: Factor `model` contains implicit NA, consider using
#> `forcats::fct_explicit_na`
#> # A tibble: 9 x 4
#> # Groups:   model, type 
#>   model       type       data    RMSE
#>   <fct>       <fct>      <fct>  <dbl>
#> 1 exponential cumulative cases      1
#> 2 polynomial  cumulative cases      0
#> 3 exponential cumulative deaths     3
#> 4 polynomial  cumulative deaths     0
#> 5 exponential new        cases      0
#> 6 polynomial  new        cases      0
#> 7 exponential new        deaths     0
#> 8 polynomial  new        deaths     0
#> 9 <NA>        <NA>       <NA>       0
``````

Created on 2020-03-27 by the reprex package (v0.3.0)

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.