I'm trying to transform an excel file and was "handed down" an Rscript to transform specific files. I am very new to R so apologies in my lack of understanding.
this is my code:
# libraries
library(dplyr)
library(tidyr)
library(readxl)
library(openxlsx) # Load on each new session
# Use this for leavers survey
# change the file path "<path>/NEW - leavers data 2019 2020.xlsx"
inputFilepath <- "C:/Users/bangathj/OneDrive - Centrica/Documents/Projects/Automation/Excel R Transformation Project/Leavers Survey/SCH0132 - Exit Interview Responses 2022-07-15 04_30 BST.xlsx"
outputFilepath <- "C:/Users/bangathj/OneDrive - Centrica/Documents/Projects/Automation/Excel R Transformation Project/Leavers Survey/SCH0132 - Exit Interview Responses 2022-07-15 04_30 BST Reshaped.xlsx"
# C:/Users/bangathj/OneDrive - Centrica/Documents/Projects/Automation/Excel R Transformation Project/Leavers Survey/SCH0132 - Exit Interview Responses 2022-07-15 04_30 BST.xlsx
df = read_excel(inputFilepath, sheet="Sheet1")
# Step 1: Remove Duplicates
dfDuplicatesRemoved = df %>%
group_by("Employee ID", "Question Body") %>%
filter(row_number() == n()) %>% # keep last row
ungroup()
# Step 2: Pivot Data
dfTransformed = dfDuplicatesRemoved %>%
select("Employee ID", "Question Body", "Questionnaire Answer") %>%
pivot_wider(id_cols="Employee ID", names_from="Question Body", values_from="Questionnaire Answer")
# Step 3: Obtain an Employee Profile by fetching all data except for question related data.
dfEmployeeProfile = df %>%
dplyr::select(-c("Question Body", "Questionnaire Answer", "Question", "Questionnaire")) %>%
group_by("Employee ID") %>%
filter(row_number() == n()) %>% # keep last row
ungroup()
# Step 4: Join transformed data with Employee Profile to have a complete view
dfTranformedWithEmployeeProfile = dfTransformed %>% right_join(dfEmployeeProfile)
write_xlsx(dfTranformedWithEmployeeProfile, outputFilepath)
rm(list = ls())
when I ran it initially, I ran into an error due to backticks, where, in "Step 1" columns "Employee ID" and "Question Body" would not be detected. I fixed this by replacing the backticks with apostrophes, as, in the earlier code, the line was written as:
group_by(`Employee ID`, `Question Body`) %>%
Then I ran into this error:
> # Step 2: Pivot Data
> > dfTransformed = dfDuplicatesRemoved %>%
> + select("Employee ID", "Question Body", "Questionnaire Answer") %>%
> + pivot_wide .... [TRUNCATED]
> Error in `select()`:
> ! Can't subset columns that don't exist.
> x Column `Employee ID` doesn't exist.
But this time I replaced all backticks with apostrophes. so what could be causing this error now?