Hi all,
I am reaching out for tips on data management in the tidyverse.
I have three datasets. Datasets 2a and 2b each comprise a random half of dataset 1. Dataset 1 contains an extra variable which I want to add to the correct rows in datasets 2a and 2b.
The order of the rows differs between the datasets, and must remain so. Does anyone have a tip about how that can be done? The real dataset has 4000 observations.
library(tidyverse)
library(knitr)
dataset1 <- tibble(
  name = c('Jane', 'Joe', 'Janet', 'George'),
  surname = c('Doe', 'Doe', 'Doh', 'Costanza'),
  phone = c(55512, 55513, 55514, NA)
)
dataset2a <- tibble(name = c('George', 'Janet'), surname = c('Costanza', 'Doh'))
dataset2b <- tibble(name = c('Joe', 'Jane'), surname = c('Doe', 'Doe'))
I am thinking there must be a way to identify identical rows across the datasets, but I am not sure how to go about it. (I do have more variables than name and surname, so every row is uniquely identifiable within each dataset.)
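The closest thing I have found so far is dplyr's left_join(), which as far as I understand keeps the rows of the left-hand table in their original order. Here is a minimal sketch of what I have in mind, assuming that name and surname together identify a row in this toy example (key_cols and the *_phone names are just placeholders I made up; my real data would need more key columns):

```r
library(dplyr)
library(tibble)

# Toy data from the example above
dataset1 <- tibble(
  name = c('Jane', 'Joe', 'Janet', 'George'),
  surname = c('Doe', 'Doe', 'Doh', 'Costanza'),
  phone = c(55512, 55513, 55514, NA)
)
dataset2a <- tibble(name = c('George', 'Janet'), surname = c('Costanza', 'Doh'))
dataset2b <- tibble(name = c('Joe', 'Jane'), surname = c('Doe', 'Doe'))

# Assumption: these columns together uniquely identify a row
key_cols <- c('name', 'surname')

# Keep only the key columns plus the extra variable from dataset1,
# so the join adds nothing but phone
lookup <- select(dataset1, all_of(key_cols), phone)

# left_join() keeps every row of the left table, in its original order
dataset2a_phone <- left_join(dataset2a, lookup, by = key_cols)
dataset2b_phone <- left_join(dataset2b, lookup, by = key_cols)
```

If the key columns were not unique in dataset1, I would expect the join to duplicate rows, so comparing nrow() before and after seems like a sensible check.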
I hope that was clear enough, and appreciate any advice!