New to R - need tocreate simple table from 3 separate excel workbooks

Hello,

I have 3 excel workbooks all containing population data for 3 different years. Each datasheet has location codes as the first column and then population data in the subsequent columns. I want to look at each data sheet, define a specific code and create a table for the total population for each year. Here is an example of the datasheet:

All 3 datasheets look the same. I have defined them as this:
pop_2016 <- read_excel("Wigan LSOA - Mid year 2016.xlsx", sheet="Persons")
pop_2017 <- read_excel("Wigan LSOA - Mid year 2017.xlsx", sheet="Persons")
pop_2018 <- read_excel("Wigan LSOA - Mid year 2018.xlsx", sheet="Persons")

So for each of pop_2016, pop_2017, pop_2018 I want to look in the column "2011 super" at a specific code (E01006283) and the total population associated with it in the "All ages" column".

This is my first time writing a script so I'm a total beginner.

Try..

library(tidyverse)

rbind(pop_2016, pop_2017, pop_2018) %>%
    select(pop_2011_super = 1, All_ages = 3) %>%
    filter(pop_2011_super == "E01006283")

Hi,

The full column name (should have said before) is actually '2011 super output area - lower layer' when expanded out, so I've changed it to say that:

rbind(pop_2016, pop_2017, pop_2018) %>%
select(2011_super_output_area_-lower_layer = 1, All_ages = 3) %>%
filter(2011_super_output_area
-_lower_layer == E01006283")

However, it's not working. I'm getting this error:

rbind(pop_2016, pop_2017, pop_2018) %>%

  • select(2011_super_output_area_-lower_layer = 1, All_ages = 3) %>%
    Error: unexpected input in:
    "rbind(pop_2016, pop_2017, pop_2018) %>%
    select(2011
    "

filter(2011_super_output_area_-lower_layer == E01006283")
Error: unexpected input in " filter(2011
"

1/
Ok, the second error, you are missing quote (") in front of E01006283" - it should be "E01006283"

2/
First error is saying it doesn't like the var name you gave it. What do you get when you do
rbind(pop_2016, pop_2017, pop_2018) %>% names()

Hi,

So I have put in this as suggested:
rbind(pop_2016, pop_2017, pop_2018) %>% names()
select(2011_super_output_area_-lower_layer = 1, All_ages = 3) %>%
filter(2011_super_output_area
-_lower_layer == "E01006283")

I'm now getting these errors:

rbind(pop_2016, pop_2017, pop_2018) %>% names()
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
select(2011_super_output_area_-lower_layer = 1, All_ages = 3) %>%
Error: unexpected input in " select(2011
"
filter(2011_super_output_area_-lower_layer == "E01006283")
Error: unexpected input in " filter(2011
"

Try this..updated the orig response as well.

library(tidyverse)

rbind(pop_2016, pop_2017, pop_2018) %>%
    select(pop_2011_super_output_area_lower_layer  = 1, All_ages = 3) %>%
    filter(pop_2011_super_output_area_lower_layer  == "E01006283")

I understand that you are reading in data which will have column/variable names whatever they might be.
For your own convenience though, wouldn't you prefer to use your code, to rename the variables on their way in, so when it comes to writing your code and doing your analysis, you have an easier time of things ? rename() is a function available in the tidyverse and dplyr packages, also as has been mentioned names() can be used to both get variables names and set them.

something to consider.

Yes I will try that. Is that because R doesn't really like variable names with spaces? Or is it purely that it will be easier for me? Or both?

R does support variables with spaces, one would use the bactick symbol as a way to quote the start and end of such a name, but practically speaking it adds an extra layer of awkward. Its a case of everything being possible but somethings are more practical than others.
Often in my own code, I will use simple short descriptive names maybe with underscores to seperate.
Lots of output functions like charting ones, will let you define a name or label seperately to providing the variable to make the chart element out of, so that could be the time to reintroduce the long/pretty/name with spaces. In some cases though, I'd switch to the rename with ` backticks approach. It depends what you are dealing with.

Hi, I've tried that and it's bringing up the following error:
rbind(pop_2016, pop_2017, pop_2018) %>%

  • select(2011_super_output_area_-lower_layer = 1, All_ages = 3) %>%
    Error: unexpected input in:
    "rbind(pop_2016, pop_2017, pop_2018) %>%
    select(2011_"

filter(2011_super_output_area_-lower_layer == "E01006283")
Error: unexpected input in " filter(2011_"

Thank you both for your help but I am going to have to come back to this another time as it is the end of the working day for me!

Cut and paste this. It should work.

rbind(pop_2016, pop_2017, pop_2018) %>%
    select(pop_2011_super_output_area_lower_layer  = 1, All_ages = 3) %>%
    filter(pop_2011_super_output_area_lower_layer  == "E01006283")

nirgrahamuk Is explaining best general practice, which can save you in the long run. Good luck.

Also note that bind_rows() is preferable to rbind() here.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.