Can I get R to replace data that I cannot define?

I'm updating taxonomy for birds and have two unique coding variables that code data from two other datasets. Some species are missing this data - this isn't an issue, we can duplicate this data across species that are closely related. However, scrolling up and down my thousands of observations to copy and paste these codes manually is taking a hefty amount of time, so I'm looking for a shortcut.

Species Code Name ID CODE
Accipiter hiogaster Accipiter Novae
Accipiter novae 94 20

My actual table is much longer than this but say I want to given Accipiter hiogaster the ID and CODE from Accipiter novae.

Is there a way I can get R to search for matches between Code Name and Species, and copy ID and CODE into the observation with the missing values when it finds a match?

I don't know if this will cover all of your cases, but it works for the example.

suppressPackageStartupMessages(library(dplyr))
df <- data.frame(Species = c("Accipiter hiogaster", "Accipiter novae"),
                 CodeName = c("Accipiter novae", NA),
                 ID = c(NA, 94),
                 CODE = c(NA, 20), stringsAsFactors = FALSE)
df
#>               Species        CodeName ID CODE
#> 1 Accipiter hiogaster Accipiter novae NA   NA
#> 2     Accipiter novae            <NA> 94   20
JoinDF <- left_join(df, df, by = c("CodeName" = "Species")) 
JoinDF
#>               Species        CodeName ID.x CODE.x CodeName.y ID.y CODE.y
#> 1 Accipiter hiogaster Accipiter novae   NA     NA       <NA>   94     20
#> 2     Accipiter novae            <NA>   94     20       <NA>   NA     NA
AdjDF <- JoinDF %>% mutate(ID.x = ifelse(is.na(ID.x), ID.y, ID.x),
                           CODE.x = ifelse(is.na(CODE.x), CODE.y, CODE.x))
AdjDF
#>               Species        CodeName ID.x CODE.x CodeName.y ID.y CODE.y
#> 1 Accipiter hiogaster Accipiter novae   94     20       <NA>   94     20
#> 2     Accipiter novae            <NA>   94     20       <NA>   NA     NA
AdjDF <- AdjDF %>% select(Species, CodeName, ID = ID.x, CODE = CODE.x)
AdjDF
#>               Species        CodeName ID CODE
#> 1 Accipiter hiogaster Accipiter novae 94   20
#> 2     Accipiter novae            <NA> 94   20

Created on 2019-05-30 by the reprex package (v0.2.1)

1 Like

Thanks for that! I wanted to not have to enter the "Species" and "Name" to be able to complete this task, but your answer really helped!

I managed to make it work using the following code.

df_Copy is an exact replicate of the df.

df1 <- merge (df, df_Copy, by.x="Name", by.y="Species")

AdjDF <- df1 %>% mutate (ID.x = ifelse(is.na(ID.x), ID.y, ID.x), CODE.x = ifelse(is.na(CODE.x), CODE.y, CODE.x))

AdjDF <- AdjDF %>% select (Species, Name, ID=ID.x, CODE = CODE.x)

df2 <- merge(df, AdjDF, by="Species", all=TRUE)

AdjDF2 <- df2 %>% mutate (ID.x = ifelse(is.na(ID.x), ID.y, ID.x), Row.x = ifelse(is.na(CODE.x), CODE.y, CODE.x))

AdjDF2 <- AdjDF2 %>% select(Species, Name= Name.x, ID=ID.x, CODE = CODE.x)

This matches all of them at once; important as I have 1200 observations to do this to.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.