I am new with R and I am struggling with the following issue:
I have a dataset more or less like this:
NAME Collegue1 Collegue 2
John Smith Bill Gates Brad Pitt
Adam Sandler Mary Lynn John Smith
Bill Gates NA Adam Sandler
Brad Pitt John Smith Bob Morriss
I need to create an ID code and substitute names with the corresponding ID in the three columns, how can I do that? The result should be more or less like the following:
ID. NAME. Collegue1. Collegue2
John Smith 3 4
Adam Sandler 5 1
Bill Gates NA 2
Brad Pitt 1 6
Any idea? Thanks
I would do it like this.
library(dplyr, warn.conflicts = FALSE)
library(tibble)
data <- tribble(~ Name, ~ Colleague1, ~ Colleague2,
"John Smith", "Bill Gates", "Brad Pitt",
"Adam Sandler", "Mary Lynn", "John Smith",
"Bill Gates", NA, "Adam Sandler",
"Brad Pitt", "John Smith", "Bob Morris")
ids <- data %>%
select(Name) %>%
rownames_to_column()
data %>%
left_join(ids, by = c("Colleague1" = "Name")) %>%
left_join(ids, by = c("Colleague2" = "Name")) %>%
select(Name, Colleague1_ID = rowname.x, Colleague2_ID = rowname.y)
#> # A tibble: 4 x 3
#> Name Colleague1_ID Colleague2_ID
#> <chr> <chr> <chr>
#> 1 John Smith 3 4
#> 2 Adam Sandler <NA> 1
#> 3 Bill Gates <NA> 2
#> 4 Brad Pitt 1 <NA>
Created on 2020-04-29 by the reprex package (v0.3.0)
library(tidyverse)
df <- tibble::tribble(
~NAME, ~Collegue1, ~Collegue2,
"John Smith" , "Bill Gates", "Brad Pitt",
"Adam Sandler" , "Mary Lynn" , "John Smith",
"Bill Gates" , NA , "Adam Sandler",
"Brad Pitt" , "John Smith", "Bob Morriss"
)
all_names <- sort(unique(c(df$NAME,df$Collegue1,df$Collegue2)))
df$Collegue1 <- factor(df$Collegue1,levels=all_names)
df$Collegue2 <- factor(df$Collegue2,levels=all_names)
df2<- df %>% mutate_at(.vars = c("Collegue1","Collegue2"),
.funs = as.integer)
thanks. What if I have a dataset with 200+ rows and 100+ columns, so tibble:tribble is not the best for me. What would you do?
You asked how to transform your data.
This required me to generate your example data. The tribble is an example of how to communicate example data on the forum.
Here is a fully generic solution, again your own data would be in place of df.
library(tidyverse)
library(purrr)
df <- tibble::tribble(
~NAME, ~Collegue1, ~Collegue2,
"John Smith" , "Bill Gates", "Brad Pitt",
"Adam Sandler" , "Mary Lynn" , "John Smith",
"Bill Gates" , NA , "Adam Sandler",
"Brad Pitt" , "John Smith", "Bob Morriss"
)
contents <- map(names(df)[-1],~df[[.]])
all_names<- sort(unique(reduce(contents,c)))
df2<- df %>% mutate_at(.vars = names(df)[-1],
.funs = ~ as.integer(factor(.,levels=all_names)))
Superuseful, thanks a lot!
system
Closed
May 21, 2020, 10:38am
7
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.