Code read with backticks even though I replaced it with apostrophes so I get a "column does not exist error"

I'm trying to transform an excel file and was "handed down" an Rscript to transform specific files. I am very new to R so apologies in my lack of understanding.
this is my code:

# libraries
library(dplyr)
library(tidyr)
library(readxl)
library(openxlsx) # Load on each new session

# Use this for leavers survey

# change the file path "<path>/NEW - leavers data 2019 2020.xlsx"
inputFilepath <- "C:/Users/bangathj/OneDrive - Centrica/Documents/Projects/Automation/Excel R Transformation Project/Leavers Survey/SCH0132 - Exit Interview Responses 2022-07-15 04_30 BST.xlsx"
outputFilepath <- "C:/Users/bangathj/OneDrive - Centrica/Documents/Projects/Automation/Excel R Transformation Project/Leavers Survey/SCH0132 - Exit Interview Responses 2022-07-15 04_30 BST Reshaped.xlsx"

# C:/Users/bangathj/OneDrive - Centrica/Documents/Projects/Automation/Excel R Transformation Project/Leavers Survey/SCH0132 - Exit Interview Responses 2022-07-15 04_30 BST.xlsx
df = read_excel(inputFilepath, sheet="Sheet1")

# Step 1: Remove Duplicates
dfDuplicatesRemoved = df %>%
  group_by("Employee ID", "Question Body") %>%
  filter(row_number() == n()) %>% # keep last row
  ungroup()

# Step 2: Pivot Data
dfTransformed = dfDuplicatesRemoved %>%
  select("Employee ID", "Question Body", "Questionnaire Answer") %>%
  pivot_wider(id_cols="Employee ID", names_from="Question Body", values_from="Questionnaire Answer") 

# Step 3: Obtain an Employee Profile by fetching all data except for question related data.
dfEmployeeProfile = df %>%
  dplyr::select(-c("Question Body", "Questionnaire Answer", "Question", "Questionnaire")) %>% 
  group_by("Employee ID") %>%
  filter(row_number() == n()) %>% # keep last row
  ungroup()

# Step 4: Join transformed data with Employee Profile to have a complete view
dfTranformedWithEmployeeProfile = dfTransformed %>% right_join(dfEmployeeProfile)

write_xlsx(dfTranformedWithEmployeeProfile, outputFilepath)
rm(list = ls())

when I ran it initially, I ran into an error due to backticks, where, in "Step 1" columns "Employee ID" and "Question Body" would not be detected. I fixed this by replacing the backticks with apostrophes, as, in the earlier code, the line was written as:

 group_by(`Employee ID`, `Question Body`) %>%

Then I ran into this error:

> # Step 2: Pivot Data
> > dfTransformed = dfDuplicatesRemoved %>%
> +   select("Employee ID", "Question Body", "Questionnaire Answer") %>%
> +   pivot_wide .... [TRUNCATED] 
> Error in `select()`:
> ! Can't subset columns that don't exist.
> x Column `Employee ID` doesn't exist.

But this time I replaced all backticks with apostrophes. so what could be causing this error now?

This was a mistake and sent you down the wrong track.
You could use feature of library(rlang) to interpret the strings as symbols for group_by, or use group_by_at that expects to get strings.

#normal
iris %>% group_by(Species) %>% summarise(m_ = mean(Petal.Length))
#fine also
iris %>% group_by(`Species`) %>% summarise(m_ = mean(Petal.Length))
# mis-step , not as expected
iris %>% group_by("Species") %>% summarise(m_ = mean(Petal.Length))

# option 1
iris %>% group_by(!!sym("Species")) %>% summarise(m_ = mean(Petal.Length))
# option 2
iris %>% group_by_at("Species") %>% summarise(m_ = mean(Petal.Length))

I have included rlang and used group_by_at and replaced apostrophes with backticks and I'm back with the same problem but with Step 1:

The code is now:

# Step 1: Remove Duplicates
dfDuplicatesRemoved = df %>%
  group_by_at(`Employee ID`, `Question Body`) %>%
  filter(row_number() == n()) %>% # keep last row
  ungroup()

and the output error is:

 # Step 1: Remove Duplicates
> dfDuplicatesRemoved = df %>%
+   group_by_at(`Employee ID`, `Question Body`) %>%
+   filter(row_number() == n()) %>% # .... [TRUNCATED] 
Error in tbl_at_vars(tbl, vars, .include_group_vars = .include_group_vars,  : 
  object 'Employee ID' not found

You've chosen to try something that I did not demonstrate to you....
Was there a reason you didn't choose to try one of the options that I showed as working ?

I misunderstood your comment.
I have re-attempted your optional methods, neither seem to work.

# Step 1: Remove Duplicates
dfDuplicatesRemoved = df %>%
  group_by_at("Employee ID", "Question Body") %>%
  filter(row_number() == n()) %>% # keep last row
  ungroup()

which gives me this error:

> # Step 1: Remove Duplicates
> dfDuplicatesRemoved = df %>%
+   group_by_at("Employee ID", "Question Body") %>%
+   filter(row_number() == n()) %>% # .... [TRUNCATED] 
Error in `group_by_at()`:
! Can't subset columns that don't exist.
x Column `Employee ID` doesn't exist.
Run `rlang::last_error()` to see where the error occurred.

and I also tried the other option:

# Step 1: Remove Duplicates
dfDuplicatesRemoved = df %>%
  group_by(!!sym("Employee ID", "Question Body")) %>%
  filter(row_number() == n()) %>% # keep last row
  ungroup()

which gives me this error:

> # Step 1: Remove Duplicates
> dfDuplicatesRemoved = df %>%
+   group_by(!!sym("Employee ID", "Question Body")) %>%
+   filter(row_number() == n()) % .... [TRUNCATED] 
Error in sym("Employee ID", "Question Body") : 
  unused argument ("Question Body")

This is an aside but when using !!sym() you process one at a time so it would be
group_by(!!sym("Employee ID"), !!sym("Question Body"))

however, I expect this would give the same error as your other code with group_by_at.
The interpretation from the error message is clear. Despite your belief that there is a variable "Employee ID" in the df, there is no such variable.
I recommend you use R to report to you the columnames of 'df' so you can base your manipulations of it from that , sorting them into alphabetical order, might help here.

sort(names(df))

is Employee ID an entry in that output ?

Strange, I don't. I get this:

> sort(names(df))
 [1] "...10"                    "...11"                    "...12"                    "...13"                   
 [5] "...14"                    "...15"                    "...16"                    "...17"                   
 [9] "...18"                    "...19"                    "...2"                     "...20"                   
[13] "...21"                    "...22"                    "...23"                    "...3"                    
[17] "...4"                     "...5"                     "...6"                     "...7"                    
[21] "...8"                     "...9"                     "Exit Interview Responses"

if it helps, this is the excel file I am working with which I use this script to transform, I was told all it needed was changing the filepathname.

Exit Interview Responses
Questionnaire Target Contexts
Questionnaire Targets
Supervisory Organization
Include Subordinate Organizations Yes
Created On or After 01/06/2022
Created On or Before 30/06/2022
Questionnaire Answers
Effective Date Business Process Reason Employee ID Voluntary or Involuntary Manager Supervisory Organization Management Level Location Country FTE % Service in Years Worker Type Supervisory Org - 1 From The Top Supervisory Org - 2 From The Top Supervisory Org - 3 From The Top Supervisory Org - 4 From The Top Supervisory Org - 5 From The Top Supervisory Org - 6 From The Top Questionnaire Respondent Question Question Body Questionnaire Answer

the columns below would contain information about the surveyees which I did not include

Its clear your excel has preliminary rows that arent intented to be the table to read in and make df out of.
you will have to count the records to skip and use the skip parameter of read_excel to make a table having skipped the initial rows.
Read xls and xlsx files — read_excel • readxl (tidyverse.org)

It worked! thank you, it was an issue with the excel file itself. All that's left is to replace the old code which writes the transformed data frame into excel, I'm guessing write_xlsx is not a used function anymore.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.