Copy and paste rows based on values from multiple columns

bragks · July 30, 2018, 8:01am

I'm preparing a dataset for use with the lme4-package, and I seem to have sinned on one (probably all) of the principles of tidy data.

The dataset looks something like this (actual set has ~ 250 observations):

tribble(
~ID, ~test1, ~randomvar,  ~test2,
  1, "result1", 2,  NA,
  2, "result1", 1,  "result2",
  3, "result1", 4,  NA
)

What I need to do is copy every observation where the value of test1 AND test2 is not NA while retaining all other values in the row and inserting them into the same data frame. Basically, copy and paste all rows where the values of two variables are equal to !is.na. I've had a look in R4DS but I can't seem to find a suitable solution for this particular type of mess, although it might just be that I haven't figured out exactly how spread() and gather() works...

mishabalyasin · July 30, 2018, 8:39am

Not gonna lie, I'm a bit confused by your explanation of what you want to achieve. It would help if next time (or even this time, if I didn't get what you wanted) you write down the expected result similar to how you've written down the input.
That being said, does this do what you want?

library(tidyverse)

df <- tribble(
  ~ID, ~test1, ~randomvar,  ~test2,
  1, "result1", 2,  NA,
  2, "result1", 1,  "result2",
  3, "result1", 4,  NA
)

df %>%
  filter_at(vars(starts_with("test")), all_vars(!is.na(.)))
#> # A tibble: 1 x 4
#>      ID test1   randomvar test2  
#>   <dbl> <chr>       <dbl> <chr>  
#> 1     2 result1         1 result2

Created on 2018-07-30 by the reprex package (v0.2.0).

bragks · July 30, 2018, 9:15am

Sorry, don't know why I didn't just make a tribble of what I want as well...

I want to turn this:

tribble(
~ID, ~test1, ~randomvar,  ~test2,
  1, "result1", 2,  NA,
  2, "result1", 1,  "result2",
  3, "result1", 4,  NA,
  4, "result1", 8,  "result2"
)

into this:

tribble(
~ID, ~test1, ~randomvar,  ~test2,
  1, "result1", 2,  NA,
  2, "result1", 1,  "result2",
  2, "result1", 1,  "result2",
  3, "result1", 4,  NA,
  4, "result1", 8,  "result2",
  4, "result1", 8,  "result2"
)

result1 and result2 were just ment as placeholders. The values will vary so I guess I will have to select based on the presence(not NA) in both of them.

Did this make any more sense?

mishabalyasin · July 30, 2018, 9:35am

My first approach is exactly what you need then . The only added wrinkle is that you need to add result to your original dataset. This can be done like this:

library(tidyverse)

df <- tribble(
  ~ID, ~test1, ~randomvar,  ~test2,
  1, "result1", 2,  NA,
  2, "result1", 1,  "result2",
  3, "result1", 4,  NA,
  4, "result1", 8,  "result2"
)

res <- tribble(
  ~ID, ~test1, ~randomvar,  ~test2,
  1, "result1", 2,  NA,
  2, "result1", 1,  "result2",
  2, "result1", 1,  "result2",
  3, "result1", 4,  NA,
  4, "result1", 8,  "result2",
  4, "result1", 8,  "result2"
)

res2 <- bind_rows(df, 
          df %>%
            filter_at(vars(starts_with("test")), all_vars(!is.na(.))))

all_equal(res, res2)
#> [1] TRUE

Created on 2018-07-30 by the reprex package (v0.2.0).

bragks · July 30, 2018, 10:46am

Excellent, thank you! I should have mentioned that the actual variable names are not test1 and test2. However I (think I) managed to solve it with:

test <- bind_rows(df,
                 df %>% 
                   filter_at(vars(one_of("img_date", "b_date")), all_vars(!is.na(.))))

Would you agree this does the same given the names img_date and b_date? I tried checking with all_equal() and it says they're equal, guess I'm just worried I'll mess something up...

mishabalyasin · July 30, 2018, 11:02am

You can also use ends_with("date") if it makes sense on your data, but overall it looks like it should give you the result you want.
If all_equal returns TRUE then they are equal. You can always add couple more tests to check validity, of course, but this function tends to be fairly reliable.