I have a package under development that includes an initial workflow for importing up to 12 `.csv` files at a time using `purrr::map()`, validating each of them, and then creating a tibble of the validation results. The number of `.csv` files is not predictable, except that it is `2 <= files <= 12`.
I've created a reprex below that implements a very simple version of this process (while also creating some sample data). The workflow itself is rather complex, but I've tried to distill it down here as best I can.
The reprex:
- Creates two sample data frames named `a` and `b`.
- Writes each to its own temporary `.csv` file.
- Imports them both back into the session using `map()` to simulate the actual workflow.
- Names the first list item (`a`) `red` and the second list item (`b`) `blue`.
- Creates a simple validation function.
- Uses `map()` to apply the validation function to both `a` and `b`.
- Prints the validation results.
Herein lies the challenge: I want to take the name of each list item (i.e. `red` and `blue`) and add them as observations in the validation results. I have the process down as a `for` loop, which is the last step in the reprex before I print the type of output I ultimately want to create. I cannot for the life of me figure out how to do this final step (writing the list names in as observations) with `purrr` as opposed to with the loop. Any suggestions would be greatly appreciated!
# load packages
suppressMessages(library(dplyr))
library(purrr)
library(readr)
# create data
a <- data.frame(
id = c(1, 2, 3, 4, 5),
group = c("red", "red", "red", "red", "red"),
outcome = c(TRUE, FALSE, FALSE, TRUE, FALSE),
stringsAsFactors = FALSE
)
b <- data.frame(
id = c(1, 2, 3, 4, 5),
group = c("blue", "blue", "blue", "blue", "blue"),
outcome = c(FALSE, TRUE, FALSE, TRUE, TRUE),
stringsAsFactors = FALSE
)
# save as csv to tempdir
a_file <- tempfile(pattern = "", fileext = ".csv")
write_csv(a, a_file)
b_file <- tempfile(pattern = "", fileext = ".csv")
write_csv(b, b_file)
# create list of files
files <- dir(path = tempdir(), pattern = "\\.csv$")
# combine list of files into single list using map()
files %>%
  map(~ suppressMessages(suppressWarnings(read_csv(file.path(tempdir(), .))))) -> data
# name the two items in data
names(data) <- c("red", "blue")
# validation function
validate <- function(item){
  # logic check 1 - does it have 3 cols?
  if (ncol(item) == 3){
    a <- TRUE
  } else {
    a <- FALSE
  }
  # logic check 2 - is it a tibble?
  classes <- class(item)
  if (classes[1] == "tbl_df"){
    b <- TRUE
  } else {
    b <- FALSE
  }
  # concatenate results
  out <- c(a, b)
  # return results
  return(out)
}
# validate items by iterating over list
data %>%
  purrr::map(validate) -> result
# print results
result
#> $red
#> [1] TRUE TRUE
#>
#> $blue
#> [1] TRUE TRUE
# add name as observation
for (i in seq_along(result)){
  result[[i]] <- c(result[[i]], names(result)[i])
}
# print results again
result
#> $red
#> [1] "TRUE" "TRUE" "red"
#>
#> $blue
#> [1] "TRUE" "TRUE" "blue"
Created on 2018-10-07 by the reprex package (v0.2.0).
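One direction that might replace the loop, offered as a sketch rather than a confirmed answer: `purrr::imap()` calls its function with each list element as `.x` and that element's name as `.y`, so the name can be appended inside the mapped function. The `result` list below is a hard-coded stand-in for the validation output in the reprex, and the column names `cols_ok` and `is_tibble` are hypothetical labels chosen only for illustration.

```r
library(purrr)

# stand-in for the validation output produced in the reprex above
result <- list(red = c(TRUE, TRUE), blue = c(TRUE, TRUE))

# imap() passes each element as .x and its name as .y;
# c() coerces the logicals to character, matching the loop's output
named <- imap(result, ~ c(.x, .y))

# alternative: build one row per file so the checks stay logical
# (cols_ok and is_tibble are hypothetical column names)
rows <- imap(result, ~ tibble::tibble(
  name = .y,
  cols_ok = .x[1],
  is_tibble = .x[2]
))
tbl <- dplyr::bind_rows(rows)
```

Here `named` should have the same shape as the loop's output, while `tbl` may be closer to the final tibble of validation results, since it keeps the name in its own column instead of coercing everything to character.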