Hello,
I am quite new to R and am trying to learn as much as I can about data manipulation.
I feel that what I am trying to accomplish should be easy in principle, but whatever approach I try, I can’t find a solution. So here I am with a few questions. Thanks for your help.
My main dataset looks something like this, with dates, regions, and a series of variables (X, Y, Z, T, D, E, F). There are many more dates, so the real dataset is much longer than this:
Date Region X Y Z T D E F
01-01-2020 RegionA 2 4 2 3 2 3 4
01-01-2020 RegionB 1 3 2 2 3 3 3
01-01-2020 RegionC 1 4 4 2 3 4 2
01-01-2020 RegionD 2 4 2 3 2 4 4
01-01-2020 RegionE 1 3 2 2 2 2 2
02-01-2020 RegionA 2 4 7 3 2 3 4
02-01-2020 RegionB 1 3 2 2 2 3 3
02-01-2020 RegionC 1 4 4 8 3 4 2
02-01-2020 RegionD 2 3 2 3 2 4 4
02-01-2020 RegionE 1 3 2 2 2 2 2
Then I have a second dataset, which contains the population of each of these regions:
Region Pop
RegionA 2000
RegionB 4039
RegionC 24728
RegionD 3738
RegionE 2936
There are two tasks I want to accomplish. The first, which involves only the first dataset, is to add two rows together: for example, creating a RegionAB whose variables (X, Y, Z, T, D, E, F) are the sums of the corresponding values for RegionA and RegionB. This should be done for each date separately, so the final dataset would have one RegionAB row for 01-01-2020 and one for 02-01-2020.
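To make the first task concrete, here is a minimal sketch of one way it might be done in base R. The object names (`df`, `value_cols`, `ab`, `ab_sum`, `result`) are made up for illustration, the data are the ten example rows above typed in by hand, and the dates are kept as plain text because the post does not say whether they are day-month or month-day.

```r
# Example data typed in from the table above (names are illustrative).
df <- data.frame(
  Date   = rep(c("01-01-2020", "02-01-2020"), each = 5),
  Region = rep(c("RegionA", "RegionB", "RegionC", "RegionD", "RegionE"), times = 2),
  X = c(2, 1, 1, 2, 1, 2, 1, 1, 2, 1),
  Y = c(4, 3, 4, 4, 3, 4, 3, 4, 3, 3),
  Z = c(2, 2, 4, 2, 2, 7, 2, 4, 2, 2),
  T = c(3, 2, 2, 3, 2, 3, 2, 8, 3, 2),
  D = c(2, 3, 3, 2, 2, 2, 2, 3, 2, 2),
  E = c(3, 3, 4, 4, 2, 3, 3, 4, 4, 2),
  F = c(4, 3, 2, 4, 2, 4, 3, 2, 4, 2),
  stringsAsFactors = FALSE
)

value_cols <- c("X", "Y", "Z", "T", "D", "E", "F")

# Keep only the two regions that should be combined.
ab <- df[df$Region %in% c("RegionA", "RegionB"), ]

# Sum every value column within each date.
ab_sum <- aggregate(ab[value_cols], by = list(Date = ab$Date), FUN = sum)
ab_sum$Region <- "RegionAB"

# Append the combined rows, using the same column order as the original data.
result <- rbind(df, ab_sum[names(df)])

print(result[result$Region == "RegionAB", ])
```

This version keeps the original RegionA and RegionB rows next to the new RegionAB rows. If they should disappear instead, one option would be to filter them out of `df` before the `rbind()` step.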
The second task is to divide the values of one of the variables (say Z) by the population of the corresponding region, taken from the second dataset. This should be done for every date, with the result stored in a new column.
My third question is: what kind of book would help me learn this kind of data manipulation?
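For the second task, a rough sketch of one possible base R approach is to attach the population to every row with `merge()` and then divide. The names `df`, `pop`, `merged`, and `Z_per_capita` are assumptions for illustration, and only the Date, Region, and Z columns of the example data are reproduced to keep the block short.

```r
# Date, Region and Z from the example table above (other columns omitted).
df <- data.frame(
  Date   = rep(c("01-01-2020", "02-01-2020"), each = 5),
  Region = rep(c("RegionA", "RegionB", "RegionC", "RegionD", "RegionE"), times = 2),
  Z      = c(2, 2, 4, 2, 2, 7, 2, 4, 2, 2),
  stringsAsFactors = FALSE
)

# Population table from the second dataset.
pop <- data.frame(
  Region = c("RegionA", "RegionB", "RegionC", "RegionD", "RegionE"),
  Pop    = c(2000, 4039, 24728, 3738, 2936),
  stringsAsFactors = FALSE
)

# Left join: every row of df gets the population of its region.
# all.x = TRUE keeps rows whose region has no match (Pop becomes NA).
merged <- merge(df, pop, by = "Region", all.x = TRUE)

# New column: Z divided by the region's population, row by row.
merged$Z_per_capita <- merged$Z / merged$Pop

print(merged)
```

Because each row already carries its own date, the division is applied to every date without any explicit grouping. Note that `merge()` reorders rows by the key column, and that a combined region such as RegionAB would need its own row in the population table (for example 2000 + 4039 = 6039), otherwise its ratio would come out as `NA`.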
Thank you