Adding column based on other column

dplyr
tidyverse
rstudio

#1

Hey all, I am curious what code I could use to add a column based on a value in another column. For example, I want to add a column classifying samples as "diseased" or "healthy", depending on whether their dataset ID ends in "D" or "H". Something like this (the columns are separated by commas):

dataset_id,bacteria,(would like to add a column here)
Site4H,268,healthy
Site4D,479,diseased
SIte8H,345,healthy
Site8D,567,diseased

#2

Hey @livjos! You might be interested in some of the functions in the dplyr package:

  • dplyr::recode() can directly translate the values of a column (e.g. "D" becomes "Diseased"), though it only fits when H or D sits in a column of its own (oops, I thought that was the case here), and
  • dplyr::case_when() can create values for a column based on conditions in one or more other columns.

If you take a peek at the documentation for those functions, you'll find some great examples of what you want to do. In this case, you might want to use case_when() with the base endsWith() function, which returns TRUE or FALSE for each value depending on whether it ends with the supplied suffix.
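
As a rough sketch of those two pieces (the vector `ids` is copied from the example in the question, and the names `is_healthy`, `last_letter`, and `labels` are made up for illustration):

```r
library(dplyr)

# IDs copied from the question's example
ids <- c("Site4H", "Site4D", "SIte8H", "Site8D")

# endsWith() is vectorised: one TRUE/FALSE per element
is_healthy <- endsWith(ids, "H")
is_healthy
#> [1]  TRUE FALSE  TRUE FALSE

# recode() needs the bare letter, so pull out the last character first
last_letter <- substr(ids, nchar(ids), nchar(ids))
labels <- recode(last_letter, H = "healthy", D = "diseased")
labels
#> [1] "healthy"  "diseased" "healthy"  "diseased"
```

So recode() can still work here, as long as the suffix is extracted into its own vector or column first.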

If you get stuck using them, come back with a reproducible example (reprex) and we can help you out more 🙂


#3

d <- wrapr::build_frame(
   "dataset_id", "bacteria" |
   "Site4H"    , 268L       |
   "Site4D"    , 479L       |
   "SIte8H"    , 345L       |
   "Site8D"    , 567L       )

print(d)
#>   dataset_id bacteria
#> 1     Site4H      268
#> 2     Site4D      479
#> 3     SIte8H      345
#> 4     Site8D      567

d$status <- ifelse(endsWith(d$dataset_id, "H"), "healthy", "disease")

print(d)
#>   dataset_id bacteria  status
#> 1     Site4H      268 healthy
#> 2     Site4D      479 disease
#> 3     SIte8H      345 healthy
#> 4     Site8D      567 disease
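
One thing to keep in mind with the ifelse() call above: any ID that does not end in "H" is labelled "disease", including malformed ones. If you would rather get NA for anything unexpected, a named lookup vector is one option. This is only a sketch: the extra row "Site9X" (and its count) and the names `d2`, `suffix`, and `lookup` are made up for illustration.

```r
# Same data as above, plus one hypothetical malformed ID ("Site9X")
d2 <- data.frame(
  dataset_id = c("Site4H", "Site4D", "SIte8H", "Site8D", "Site9X"),
  bacteria   = c(268L, 479L, 345L, 567L, 101L),
  stringsAsFactors = FALSE
)

# Last character of each ID
suffix <- substring(d2$dataset_id, nchar(d2$dataset_id))

# Named vector used as a lookup table; names with no match give NA
lookup <- c(H = "healthy", D = "disease")
d2$status <- unname(lookup[suffix])

# "Site9X" now gets NA instead of "disease"
print(d2)
```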

#4

Using the tidyverse, it would look like this:

tab <- tibble::tribble(
  ~dataset_id, ~bacteria,
     "Site4H",      268L,
     "Site4D",      479L,
     "SIte8H",      345L,
     "Site8D",      567L
  )
library(dplyr)
tab %>%
  mutate(state = case_when(
    endsWith(dataset_id, "H") ~ "healthy",
    endsWith(dataset_id, "D") ~ "disease",
    TRUE                      ~ NA_character_
  ))
#> # A tibble: 4 x 3
#>   dataset_id bacteria state  
#>   <chr>         <int> <chr>  
#> 1 Site4H          268 healthy
#> 2 Site4D          479 disease
#> 3 SIte8H          345 healthy
#> 4 Site8D          567 disease

Created on 2018-10-11 by the reprex package (v0.2.1)
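
To illustrate what the last `TRUE ~ NA_character_` branch is for, and the fact that endsWith() is case-sensitive, here is a small sketch. The IDs "Site2d" and "Site9X" and the names `extra`, `id_upper`, and `labelled` are hypothetical and not part of the original data.

```r
library(dplyr)

# Hypothetical IDs: one with a lower-case suffix, one with no H/D suffix
extra <- tibble::tibble(dataset_id = c("Site4H", "Site2d", "Site9X"))

labelled <- extra %>%
  mutate(
    # endsWith() is case-sensitive, so normalise the case first
    id_upper = toupper(dataset_id),
    state = case_when(
      endsWith(id_upper, "H") ~ "healthy",
      endsWith(id_upper, "D") ~ "disease",
      TRUE                    ~ NA_character_  # anything else falls through
    )
  )

labelled$state
#> [1] "healthy" "disease" NA
```

Without the toupper() step, "Site2d" would not match either condition and would end up as NA as well.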