Names are helpful. When people have to assign a variable, they usually pick a good name. But when a name is optional, as when a parameter accepts a function, there's a tendency to skip it. They'll use an "anonymous" function because "it's obvious and simple what's being done."
I've seen and, regrettably, written a lot of code that looks like this:
```r
clean_data <- function(raw_data) {
  numeric_cols <- vapply(
    raw_data,
    function(x) {
      all(grepl('^-?\\d+(,\\d{3})*\\.\\d+$', x[!is.na(x)], perl = TRUE))
    },
    logical(1L)
  )
  integer_cols <- vapply(
    raw_data,
    function(x) {
      all(grepl('^-?\\d+(,\\d{3})*$', x[!is.na(x)], perl = TRUE))
    },
    logical(1L)
  )
  fips_cols <- grepl('FIPS', colnames(raw_data), fixed = TRUE)
  integer_cols <- integer_cols & !fips_cols
  raw_data[numeric_cols] <- lapply(
    raw_data[numeric_cols],
    function(x) {
      as.numeric(gsub(',', '', x, fixed = TRUE))
    }
  )
  raw_data[integer_cols] <- lapply(
    raw_data[integer_cols],
    function(x) {
      as.integer(gsub(',', '', x, fixed = TRUE))
    }
  )
  return(raw_data)
}
```
This function just takes a data frame that stores everything as character strings and does a common-sense conversion of the columns that hold numbers. But it looks more complicated than that. And anyone reading the code needs to keep five environments in mind: one for the main function and one for each of the four anonymous functions. These anonymous functions are only a single line each; imagine how much worse it'd be if each were five lines long, with its own assignments or deeper mapping functions!
But here's the same function without anonymous inner functions:
```r
is_numeric_column <- function(x) {
  all(grepl('^-?\\d+(,\\d{3})*\\.\\d+$', x[!is.na(x)], perl = TRUE))
}

is_integer_column <- function(x) {
  all(grepl('^-?\\d+(,\\d{3})*$', x[!is.na(x)], perl = TRUE))
}

fix_numeric <- function(x) {
  as.numeric(gsub(',', '', x, fixed = TRUE))
}

fix_integer <- function(x) {
  as.integer(gsub(',', '', x, fixed = TRUE))
}

clean_data <- function(raw_data) {
  numeric_cols <- vapply(raw_data, is_numeric_column, logical(1L))
  integer_cols <- vapply(raw_data, is_integer_column, logical(1L))
  fips_cols <- grepl('FIPS', colnames(raw_data), fixed = TRUE)
  integer_cols <- integer_cols & !fips_cols
  raw_data[numeric_cols] <- lapply(raw_data[numeric_cols], fix_numeric)
  raw_data[integer_cols] <- lapply(raw_data[integer_cols], fix_integer)
  return(raw_data)
}
```
Now it's understandable even without comments.
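Named helpers can also be called on their own, which the anonymous versions can't. Here's a rough sketch of what that might look like. The function definitions are copied from the named version so the block runs by itself; the data frame `raw`, its column names and values, and the `cleaned` variable are made up for illustration.

```r
# Definitions copied from the named version so this block is self-contained.
is_numeric_column <- function(x) {
  all(grepl('^-?\\d+(,\\d{3})*\\.\\d+$', x[!is.na(x)], perl = TRUE))
}

is_integer_column <- function(x) {
  all(grepl('^-?\\d+(,\\d{3})*$', x[!is.na(x)], perl = TRUE))
}

fix_numeric <- function(x) {
  as.numeric(gsub(',', '', x, fixed = TRUE))
}

fix_integer <- function(x) {
  as.integer(gsub(',', '', x, fixed = TRUE))
}

clean_data <- function(raw_data) {
  numeric_cols <- vapply(raw_data, is_numeric_column, logical(1L))
  integer_cols <- vapply(raw_data, is_integer_column, logical(1L))
  fips_cols <- grepl('FIPS', colnames(raw_data), fixed = TRUE)
  integer_cols <- integer_cols & !fips_cols
  raw_data[numeric_cols] <- lapply(raw_data[numeric_cols], fix_numeric)
  raw_data[integer_cols] <- lapply(raw_data[integer_cols], fix_integer)
  return(raw_data)
}

# Each helper can be checked directly on a plain character vector.
stopifnot(
  is_integer_column(c('1,234', '-56', NA)),
  !is_integer_column(c('1,234', 'n/a')),
  is_numeric_column(c('1,234.5', '-0.25')),
  !is_numeric_column(c('1,234', '0.5')),
  identical(fix_integer(c('1,234', '-56')), c(1234L, -56L)),
  identical(fix_numeric(c('1,234.5', '-0.25')), c(1234.5, -0.25))
)

# Hypothetical all-character input, invented for this example.
raw <- data.frame(
  county = c('Adams', 'Baker'),
  FIPS = c('01001', '01003'),
  population = c('12,345', '678'),
  rate = c('1.5', '1,002.25'),
  stringsAsFactors = FALSE
)

cleaned <- clean_data(raw)

# Show the resulting column types.
str(cleaned)
```

With this toy input, `population` should come back as an integer column and `rate` as a numeric one, while `county` and `FIPS` stay character. The FIPS codes keep their leading zeros because of the `fips_cols` exclusion.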