How can I find the difference in population by year and zip code using dplyr?

Hi, I am working with ACS data and have used the dplyr package to filter my data down to the table below. I am now trying to use mutate within dplyr to find the increase or decrease in total population from 2013 to 2016 for each zip code. For example, for zip 43001 I want a new column that gives the difference in total population between 2013 and 2016.

My input:

filterdacs_D1 <- mutate(filename,difference1 =$zip$poulation[population]-$zip$population2017)
But this is incorrect, as I want the difference between 2016 and 2013 for each zip code.

   Year   Zip Total_Population Median_Income         City State
1  2013 43001             2475         87333   Alexandria    OH
2  2013 43002             2753         83873        Amlin    OH
3  2014 43003             2366         46691       Ashley    OH
4  2014 43001            24625         70809    Blacklick    OH
5  2014 43005              155         43810  Bladensburg    OH
6  2015 43006              705         45673   Brinkhaven    OH
7  2015 43001             2430         28422 Buckeye Lake    OH
8  2016 43009             2036         62188        Cable    OH
9  2016 43010              386         34625      Catawba    OH
10 2016 43001             7733         66548   Centerburg    OH

How do I go about this?

It's hard to help you with your sample data in this format. Could you please turn this into a self-contained REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

This was already answered on Stack Overflow.

Please follow our policy about cross-posting.

If no one is answering on Stack Overflow, can't I rely on my R community? :frowning:

Yes, you can. You are just asked to make it clear that you have cross-posted and to keep the content updated on both sides.
Please read the policy for more detail.

The answer from Big Ozzy and the responding comment from Andres at the Stack Overflow question are good.

Their approach is to:

  1. Take your data frame. (Big Ozzy provided an example one called zips, but since you already have data, you can skip that and use your own data frame's name instead, as Andres suggests.)
  2. Filter to include only the 2013 and 2016 rows.
  3. Spread so the 2013 population goes to a column called 2013 and the 2016 population goes to a column called 2016.
  4. Big Ozzy's answer presumes there might be zips with no row, or with more than one row, of population for a given year. (Some might be new zips, or might accidentally have been included twice.) To deal with these, the answer sums all the 2013 and 2016 rows that exist for each zip. In general, each sum should cover just one number.
  5. The last step subtracts the 2013 column from the 2016 column to get the difference.
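
The steps above can be sketched roughly as follows. This is not the code from the Stack Overflow answer, just a minimal illustration of the same idea. It assumes a data frame (called `acs` here, a made-up name) with the `Year`, `Zip`, and `Total_Population` columns shown in the question, and it does the summing before the spread so that duplicate zip/year rows cannot trip up `spread()`.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the poster's data, copied from the rows in the question.
acs <- data.frame(
  Year = c(2013, 2013, 2014, 2014, 2014, 2015, 2015, 2016, 2016, 2016),
  Zip = c(43001, 43002, 43003, 43001, 43005, 43006, 43001, 43009, 43010, 43001),
  Total_Population = c(2475, 2753, 2366, 24625, 155, 705, 2430, 2036, 386, 7733)
)

pop_change <- acs %>%
  # Keep only the two years being compared.
  filter(Year %in% c(2013, 2016)) %>%
  # Collapse any duplicate rows so there is one total per zip and year.
  group_by(Zip, Year) %>%
  summarise(Total_Population = sum(Total_Population), .groups = "drop") %>%
  # One row per zip, with columns named `2013` and `2016`.
  spread(Year, Total_Population) %>%
  # Change in population from 2013 to 2016 (NA if either year is missing).
  mutate(difference = `2016` - `2013`)

print(pop_change)
```

With the sample rows, zip 43001 comes out as 7733 - 2475 = 5258, and zips that appear in only one of the two years get `NA` for `difference`. In newer versions of tidyr, `pivot_wider()` does the same job as `spread()`.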

If this is not working for you, or not what you were looking for, it would be helpful to edit your question to explain.
