Finding and replacing values in a dataframe

Can someone help with the following please?

In the code below, I want to do the following:

Filter on ID 3 and then replace the NA value in the 'Code' column with a value, lets say in this case "N3". And also filter on ID 4 and replace NA in 'Code' column with "N4" - How do I do that please?

df <- tibble(
  ID = c(1, 2, 3, 4, 5),
  Area = c("North", "West", "South", "East", "West"),
  Code = c("N1,", "N2", NA, NA, "N5")
)

There are two functions that might help you do the task as you've described it here: dplyr filter(), and tidyr replace_na()

library(tidyverse)
df <- tibble(
  ID = c(1, 2, 3, 4, 5),
  Area = c("North", "West", "South", "East", "West"),
  Code = c("N1", "N2", NA, NA, "N5")
)

df3 <- df %>%
  filter(ID == 3)

df3
#> # A tibble: 1 x 3
#>      ID Area  Code 
#>   <dbl> <chr> <chr>
#> 1     3 South <NA>

df3$Code %>% replace_na("N3")
#> [1] "N3"

Created on 2019-01-02 by the reprex package (v0.2.1.9000)

Depending on your dataset, you might consider using an approach that uses case_when() or something of the like.

3 Likes

Thanks Mara, I had a look at 'case_when' but got very confused and didn't have a clue. Could you please show how I could use it on this piece of code? Thank you in advance :grinning:

Honestly, I would've done this just using a mutate and pasting the letter N in front of the ID number:

library(tidyverse)
df <- tibble(
  ID = c(1, 2, 3, 4, 5),
  Area = c("North", "West", "South", "East", "West"),
  Code = c("N1", "N2", NA, NA, "N5")
)

df %>%
  mutate(Code = glue::glue("N{ID}"))
#> # A tibble: 5 x 3
#>      ID Area  Code      
#>   <dbl> <chr> <S3: glue>
#> 1     1 North N1        
#> 2     2 West  N2        
#> 3     3 South N3        
#> 4     4 East  N4        
#> 5     5 West  N5

Created on 2019-01-02 by the reprex package (v0.2.1.9000)

There's a nice write-up on using case_when() at the link below:

Hi Mara, so the code I pasted was an example - in reality I have a large dataset. I have to locate certain numbers in the ID column and then change the NA value in the code column to a specific value. Is there a generic method? I've had a look at the case-when notes but I don't understand how I could apply that to the dataset I have. Grateful for any advice.

Here are two more approaches, using case_when()

library(tidyverse)

df <- tibble(
  ID = c(1, 2, 3, 4, 5),
  Area = c("North", "West", "South", "East", "West"),
  Code = c("N1", "N2", NA, NA, "N5")
)

df %>%
  mutate(Code = case_when(
    ID == "3" & is.na(Code) ~ "N3",
    ID == "4" & is.na(Code) ~ "N4",
    TRUE ~ Code
  ))
#> # A tibble: 5 x 3
#>      ID Area  Code 
#>   <dbl> <chr> <chr>
#> 1     1 North N1   
#> 2     2 West  N2   
#> 3     3 South N3   
#> 4     4 East  N4   
#> 5     5 West  N5

df %>%
  mutate(Code = case_when(
    is.na(Code) ~ glue::glue("N{ID}"),
    TRUE ~ Code
  ))
#> # A tibble: 5 x 3
#>      ID Area  Code      
#>   <dbl> <chr> <S3: glue>
#> 1     1 North N1        
#> 2     2 West  N2        
#> 3     3 South N3        
#> 4     4 East  N4        
#> 5     5 West  N5

Created on 2019-01-02 by the reprex package (v0.2.1.9000)

4 Likes

That's brilliant, exactly what I wanted - thank you Mara!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.