Every R problem can usefully be thought of as the interaction of three objects: an existing object, x; a desired object, y; and a function, f, that will return a value of y given x as an argument. In other words, school algebra: f(x) = y. Any of the objects can be composites.
For this problem, we have an object, x, a data frame, and we want another data frame, y, that contains only a subset of columns and rows. The problem specifies the components of f: select, filter, %in% and assign.
Thus, we have d(c(b(a(x)))) = f(x), where the innermost function is applied first, and it's just a matter of the correct order. We can introduce another function, %>%, the pipe operator, to avoid the plague of parentheses:
x %>% a(.) %>% b(.) %>% c(.) %>% d()
Here d is simply the right-hand version of the assignment operator, ->.
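To make the equivalence between nesting and piping concrete, here is a minimal sketch with toy functions. The names step_a, step_b and step_c are hypothetical stand-ins, not the functions from the question, and the sketch assumes the magrittr package (which supplies %>%) is installed.

```r
library(magrittr)  # provides the %>% pipe

# Hypothetical stand-ins for the functions a, b and c
step_a <- function(v) v + 1
step_b <- function(v) v * 2
step_c <- function(v) v - 3

x <- c(1, 2, 3)

# Nested form: the innermost call runs first
nested <- step_c(step_b(step_a(x)))

# Piped form: reads left to right and ends with right-hand assignment
x %>% step_a() %>% step_b() %>% step_c() -> piped

identical(nested, piped)
```

Both forms apply step_a first and step_c last; the pipe only changes the order in which the calls are written, not the order in which they run.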
This leaves the remaining functions, and it will be convenient to set up two subsidiary objects for use with %in%:
NE <- c("Connecticut","Maine","Massachusetts","New Hampshire","Rhode Island","Vermont")
years <- 1996:2019
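As a quick illustration of what %in% does with these two objects, the sketch below tests a few made-up values for membership. The vectors some_states and some_years are hypothetical examples and are not taken from the data in the question.

```r
NE <- c("Connecticut", "Maine", "Massachusetts",
        "New Hampshire", "Rhode Island", "Vermont")
years <- 1996:2019

# Hypothetical values, for illustration only
some_states <- c("Maine", "Texas", "Vermont")
some_years  <- c(1990, 2000, 2019)

# %in% returns one TRUE/FALSE per element on its left-hand side
in_NE    <- some_states %in% NE     # TRUE FALSE TRUE
in_years <- some_years %in% years   # FALSE TRUE TRUE
```

A logical vector of this kind is what a row-filtering step needs: one TRUE or FALSE per row, saying whether to keep it.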
There remains only to choose the appropriate functions for a and b from among those identified in the question and decide the order.