My usual project workflow is to read in a bunch of files as a list using readr, then do as much cleaning as possible before saving a single minimal table for analysis. I am slowly getting the hang of purrr functions for helping with this approach, but could do with specific advice about:
How to iterate through tibbles in a list using select and other dplyr functions (see example below)?
Any other hints/tips/suggestions to improve this workflow using tidy principles?
    library(tidyverse)
    set.seed(4321)

    #Make up some data in four separate tibbles
    table1 <- tibble(
      id = 1:10,
      age = floor(runif(min=18, max=100, n=10)),
      sex = sample(c("Male", "Female"), 10, replace = TRUE),
      nonsense = sample(letters, 10)
    )

    table2 <- tibble(
      id = 1:4,
      weight = c("50 kg", "45", "65kg", "67"),
      height = c("141", "133cm", NA, "177 cm")
    )

    table3 <- tibble(
      id = 1:10,
      outcome = sample(c("Alive", "Dead"), 10, replace = TRUE)
    )

    useless_table <- tibble(
      no_use = sample(LETTERS, 10)
    )

    #Add the tables to a list
    list_tables <- list(table1, table2, table3, useless_table)
    names(list_tables) <- c("table1", "table2", "table3", "useless_table")

    #Keep only the tables that we are interested in
    kept_tables <- list_tables %>%
      keep(names(.) %in% c("table1", "table2", "table3"))

    #Iterate through tables, selecting only the variables we wish to keep
    keep_vars <- list("id", "age", "sex", "weight", "height", "outcome")
    names(keep_vars) <- keep_vars

    kept_tables <- map(kept_tables, ~ select(.x, one_of(names(keep_vars))))
    #> Warning: Unknown columns: `weight`, `height`, `outcome`
    #> Warning: Unknown columns: `age`, `sex`, `outcome`
    #> Warning: Unknown columns: `age`, `sex`, `weight`, `height`

    #Now how to iterate through tables, mutating to tidy-up some variables?
    #For example, parse height and weight as numeric
    #Doesn't work...
    kept_tables <- kept_tables %>%
      map(.x, ~ mutate_at(.vars = vars(weight, height), .funs = funs(parse_number)))
    #> Error in as_mapper(.f, ...): object '.x' not found

    #As a final step, would reduce to a single table e.g. for modelling
    #This works
    out <- kept_tables %>%
      reduce(left_join, by=c("id"))
Created on 2018-12-06 by the reprex package (v0.2.1)
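
For reference, here is a minimal sketch of one way the select and mutate steps might be made to work. It is a guess at a fix, not a confirmed answer. The error above seems to arise because, inside the pipe, `.x` ends up in the function position of map() (the piped list is already the first argument). The sketch also assumes newer package versions than the reprex used (dplyr 1.0.0 or later for across(), tidyselect 1.0.0 or later for any_of()), and it uses small fixed tables in place of the random ones; `num_vars` is a hypothetical helper name.

```r
library(dplyr)
library(purrr)
library(readr)
library(tibble)

# Small fixed stand-ins for the tables in the reprex (values are made up)
table1 <- tibble(
  id = 1:4,
  age = c(34, 58, 71, 22),
  sex = c("Male", "Female", "Female", "Male"),
  nonsense = letters[1:4]
)
table2 <- tibble(
  id = 1:4,
  weight = c("50 kg", "45", "65kg", "67"),
  height = c("141", "133cm", NA, "177 cm")
)
table3 <- tibble(
  id = 1:4,
  outcome = c("Alive", "Dead", "Alive", "Alive")
)

kept_tables <- list(table1 = table1, table2 = table2, table3 = table3)

# Plain character vectors are enough for any_of()
keep_vars <- c("id", "age", "sex", "weight", "height", "outcome")
num_vars  <- c("weight", "height")  # hypothetical name: columns to parse as numeric

out <- kept_tables %>%
  # any_of() silently skips names a table does not have, so no warnings
  map(~ select(.x, any_of(keep_vars))) %>%
  # the tibble goes in as .x inside the lambda; tables lacking
  # weight/height pass through unchanged
  map(~ mutate(.x, across(any_of(num_vars), parse_number))) %>%
  # collapse the list into a single table keyed on id
  reduce(left_join, by = "id")
```

The design choice here is to let any_of() decide per table which columns apply, so the same two map() calls can run over every tibble in the list without special-casing table2.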