Tidycensus applied longitudinally


#1

I’ve read about several applications of the tidycensus package (e.g., https://juliasilge.com/blog/using-tidycensus/) but am wondering if anyone in the R community has applied this package longitudinally for census tracts and could offer a reference? Thanks!


#2

Hi @sheilasaia! I’m a regular user of tidycensus. Can you give some more information as to what you mean by longitudinal use?

It’s pretty easy to download estimates for census tracts, etc. in different years by specifying the year argument in get_acs(). I haven’t done exactly this before in my workflow, but you could use purrr::map() over a vector of years with get_acs() as the function within map().

Here’s a small example that returns a named list of tibbles with the state population for each year from 2012 to 2016. You could then combine or manipulate those tibbles depending on what you are trying to calculate or visualize.

library(tidyverse)
library(tidycensus)

# wrap get_acs() with the variable and geography fixed; only the year varies
get_state_pop <- function(year) {
  get_acs(
    geography = "state",
    variables = "B01003_001",
    geometry = FALSE,
    year = year,
    survey = "acs1"
  )
}

# create a named vector of years
pop_years <- set_names(2012:2016, paste0(2012:2016, "_pop"))

# map get_acs over named vector of years
state_pop_12_16 <- map(pop_years, get_state_pop)

state_pop_12_16[["2012_pop"]]
#> # A tibble: 52 x 5
#>    GEOID NAME                 variable   estimate   moe
#>    <chr> <chr>                <chr>         <dbl> <dbl>
#>  1 01    Alabama              B01003_001  4822023     0
#>  2 02    Alaska               B01003_001   731449     0
#>  3 04    Arizona              B01003_001  6553255     0
#>  4 05    Arkansas             B01003_001  2949131     0
#>  5 06    California           B01003_001 38041430     0
#>  6 08    Colorado             B01003_001  5187582     0
#>  7 09    Connecticut          B01003_001  3590347     0
#>  8 10    Delaware             B01003_001   917092     0
#>  9 11    District of Columbia B01003_001   632323     0
#> 10 12    Florida              B01003_001 19317568     0
#> # ... with 42 more rows
state_pop_12_16[["2013_pop"]]
#> # A tibble: 52 x 5
#>    GEOID NAME                 variable   estimate   moe
#>    <chr> <chr>                <chr>         <dbl> <dbl>
#>  1 01    Alabama              B01003_001  4833722     0
#>  2 02    Alaska               B01003_001   735132     0
#>  3 04    Arizona              B01003_001  6626624     0
#>  4 05    Arkansas             B01003_001  2959373     0
#>  5 06    California           B01003_001 38332521     0
#>  6 08    Colorado             B01003_001  5268367     0
#>  7 09    Connecticut          B01003_001  3596080     0
#>  8 10    Delaware             B01003_001   925749     0
#>  9 11    District of Columbia B01003_001   646449     0
#> 10 12    Florida              B01003_001 19552860     0
#> # ... with 42 more rows

Created on 2018-02-08 by the reprex package (v0.1.1.9000).
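
As a rough sketch of the "combine" step mentioned above, the list can be stacked into one long table and the change between years computed per state. To keep it runnable without a Census API key, `state_pop` below is a hand-built stand-in for `state_pop_12_16`, using only the Alabama and Alaska values from the printed output; the names `pop_long` and `pop_change` are just illustrative.

```r
library(dplyr)

# Stand-in for the list returned by map(pop_years, get_state_pop).
# Values are copied from the printed output above (first two states only).
state_pop <- list(
  `2012_pop` = data.frame(
    GEOID = c("01", "02"), NAME = c("Alabama", "Alaska"),
    variable = "B01003_001", estimate = c(4822023, 731449), moe = c(0, 0),
    stringsAsFactors = FALSE
  ),
  `2013_pop` = data.frame(
    GEOID = c("01", "02"), NAME = c("Alabama", "Alaska"),
    variable = "B01003_001", estimate = c(4833722, 735132), moe = c(0, 0),
    stringsAsFactors = FALSE
  )
)

# Stack the list into one long table; .id stores each list name in a "year" column,
# which is then stripped of its "_pop" suffix and converted to an integer
pop_long <- bind_rows(state_pop, .id = "year") %>%
  mutate(year = as.integer(sub("_pop$", "", year)))

# Year-over-year change within each state (NA for the first year)
pop_change <- pop_long %>%
  arrange(GEOID, year) %>%
  group_by(GEOID) %>%
  mutate(change = estimate - lag(estimate)) %>%
  ungroup()

pop_change
```

With the real `state_pop_12_16` list, the same `bind_rows(..., .id = "year")` call should work unchanged, since every element has the same columns.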

Hope this helps!


#3

@mfherman, thanks so much for your reply, for your code, and for the idea to use purrr::map() to access multiple estimates at once. I’ve been able to download single 5-year estimates using tidycensus but was curious how (or if?) others have used it to account for changing census tract boundaries when comparing 5-year estimates of a census variable across time. I know about the longitudinal database from Brown University, but it would be cool (and easily reproducible) to generate something similar in R. I guess the 5-year estimates only go back as far as the start of the ACS (so you’d just have to deal with pre- and post-2010 boundaries), but I’m wondering about applying this to decennial estimates as well.


#4

As far as I know, there isn’t a package for R that will do this. But I agree that it would be very cool to make one that could account for changed boundaries, etc. In fact, I’ve been working on a project this week and fighting with changing election district boundaries in New York City. So frustrating!!

In addition to the link you shared, here are some tract-level crosswalks you could use to compare census data from pre-1960 to 2010 tracts. You can also find some nice time-series tables as well as historical shapefiles at IPUMS.
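
For what it's worth, here is a minimal sketch of how a crosswalk is typically applied, assuming it comes as a table of old tract, new tract, and a weight giving the share of the old tract that falls in the new one. All tract IDs, column names, and numbers below are made up for illustration; real crosswalk files use their own column names and weighting schemes. Weighted sums like this only make sense for counts (population, households), not for medians or rates.

```r
library(dplyr)

# Hypothetical counts reported on the old tract boundaries
old_counts <- data.frame(
  tract_old = c("A", "B"),
  pop = c(1000, 600),
  stringsAsFactors = FALSE
)

# Hypothetical crosswalk: old tract A is split between new tracts X and Y,
# old tract B falls entirely inside Y. Weights for each old tract sum to 1.
xwalk <- data.frame(
  tract_old = c("A", "A", "B"),
  tract_new = c("X", "Y", "Y"),
  weight = c(0.25, 0.75, 1),
  stringsAsFactors = FALSE
)

# Allocate each old count to the new tracts by weight, then total per new tract
new_counts <- old_counts %>%
  inner_join(xwalk, by = "tract_old") %>%
  group_by(tract_new) %>%
  summarise(pop = sum(pop * weight), .groups = "drop")

new_counts
```

Because each old tract's weights sum to 1, the total population is preserved after reallocation, which is a handy sanity check on any real crosswalk.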

I don’t have the technical capacity (yet!) or time to work on an R package to automate this, but please share your code if you do any work on this. I’m sure it will be useful for many people!

Finally, you probably know this already, but you should take some care when comparing certain (overlapping) ACS estimates.
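
As a rough illustration of the kind of check involved, here is a sketch of a significance test for the difference between two ACS estimates from non-overlapping periods. It assumes the published margins of error are at the 90% confidence level (so standard error = MOE / 1.645) and that the two samples are independent, which is exactly the assumption that breaks down when the periods overlap. The estimates and MOEs are invented numbers, not real ACS values.

```r
# Hypothetical estimates and 90% margins of error for one tract
# from two non-overlapping 5-year periods
est_early <- 1200
moe_early <- 150
est_late  <- 1500
moe_late  <- 180

# Convert 90% margins of error to standard errors
z_90 <- 1.645
se_early <- moe_early / z_90
se_late  <- moe_late / z_90

# Standard error of the difference, assuming the two estimates are independent
se_diff <- sqrt(se_early^2 + se_late^2)

# z statistic for the difference; compare against the 90% critical value
z_stat <- (est_late - est_early) / se_diff
significant <- abs(z_stat) > z_90

z_stat
significant
```

For overlapping periods the two estimates share survey responses, so this independence formula is not appropriate as written.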


#5

@mfherman, I totally understand! Thanks so much for all this helpful info. Those shapefiles will definitely come in handy. I also don’t have the technical bandwidth just yet but promise to share any progress that I make. I hope you’ll do the same! :)