Combining Two Rows using the Tidyverse

I am working with a dataset of client visits and am having a trouble concerning missing data. I'm trying to get unique clients but am getting multiple rows returned for some clients. The issue I'm having trouble with uses data like the following:

Client        Gender      Race
   A             M        White
   A             NA       White
   B             F        African American
   B             F        NA

How would I write over the NA based on client code?

I'm not sure if I totally understand your desired output, but maybe it's something like this? For each client, fill NAs in the Gender and Race columns with the values above and then get distinct rows.

library(tidyverse)

df <- tribble(
  ~Client, ~Gender, ~Race,
  "A", "M", "White",
  "A", NA , "White",
  "B", "F", "African American",
  "B", "F", NA
)


df %>% 
  group_by(Client) %>% 
  fill(Gender, Race) %>% 
  distinct()
#> # A tibble: 2 x 3
#> # Groups:   Client [2]
#>   Client Gender Race            
#>   <chr>  <chr>  <chr>           
#> 1 A      M      White           
#> 2 B      F      African American

Created on 2019-09-05 by the reprex package (v0.3.0)

3 Likes

Thanks! That's exactly what I was trying to do great to know about the fill function. It didn't work for every single client so I'll have to figure out what's going on with the other rows that still have NA values.

Hmm, you might take a look a the ordering of the NAs in your data frame. The default of fill is to fill "down" (i.e. fill NAs with values from preceding rows). You could try fill(var, .direction = "downup") which will look both above and below for replacement values, if you know that each client only has one correct gender and race.

Here is an example where the NA in the race column is above the value you want to fill and so you need to specify the direction for it to fill appropriately.

library(tidyverse)

df <- tribble(
    ~Client, ~Gender, ~Race,
    "A", "M", "White",
    "A", NA , "White",
    "B", "F", NA,
    "B", "F", "African American"
  )

df %>% 
  group_by(Client) %>% 
  fill(Gender, Race) %>% 
  distinct()
#> # A tibble: 3 x 3
#> # Groups:   Client [2]
#>   Client Gender Race            
#>   <chr>  <chr>  <chr>           
#> 1 A      M      White           
#> 2 B      F      <NA>            
#> 3 B      F      African American

df %>% 
  group_by(Client) %>% 
  fill(Gender, Race, .direction = "downup") %>% 
  distinct()
#> # A tibble: 2 x 3
#> # Groups:   Client [2]
#>   Client Gender Race            
#>   <chr>  <chr>  <chr>           
#> 1 A      M      White           
#> 2 B      F      African American

Created on 2019-09-05 by the reprex package (v0.3.0)

4 Likes

I had to post the directions separately to get it to work like this:

df %>%
    group_by(Client) %>%
    fill(Gender, Race, .direction = 'up') %>%
    fill(Gender, Race) %>%
    distinct()

When I looked at the documentation for fill it didn't have 'downup' and it kept throwing an error. Thank you for all of your help!

2 Likes

Ah, looks like the the "downup" and "updown" options are in only in the dev version of tidyr:

Your solution works great with the current CRAN version -- if you want to use "downup" you can install the dev version from GitHub.

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.