@cderv This is the code I hope to turn into a function that I can loop over for every year of data. I have 18 years (2001-2018) in individual files: one folder for frpm and one folder for enrollment, each with its respective 18 files. The two datasets share a key column, but it has a different name in each. So one line of the code renames that column so the names match. The next line joins the two data frames. The third line, which has not yet been added, will write the result to csv.
##merge enrollment with frpm for each year (year column dropped from frpm df; year is available in enrollment df)
enrollment2001 <- read.csv("C:\\Box Sync\\Karina Fastovsky\\Data\\Schools\\enrollment\\cde_enrollment\\Batch579\\enrollment_2000-2001.csv", colClasses = c("CDS_CODE" = "character"))
frpm2001 <- read.csv("C:\\Box Sync\\Karina Fastovsky\\Data\\Schools\\frpm\\frpm_2000-2018_clean1\\frpm2001_clean1.csv", colClasses = c("CDSCode" = "character", "year"= "NULL"))
#rename key column to be a shared name CDS_CODE
names(frpm2001)[names(frpm2001) == "CDSCode"] <- "CDS_CODE"
#merge enrollment with frpm
Merged01 <- merge(enrollment2001, frpm2001, by= "CDS_CODE", all.x= TRUE)
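For context, the three steps (rename, join, write) can be sketched on toy data. This is only an illustration: the data frames, their values, the ENR_TOTAL and frpm_count columns, and the temporary output file are all made up. Only the key column names CDS_CODE and CDSCode come from my real files.

```r
# Toy stand-ins for one year of data (all values are made up)
enrollment_toy <- data.frame(
  CDS_CODE = c("01000000000001", "01000000000002", "01000000000003"),
  ENR_TOTAL = c(250L, 310L, 120L),
  stringsAsFactors = FALSE
)
frpm_toy <- data.frame(
  CDSCode = c("01000000000001", "01000000000002"),
  frpm_count = c(100L, 205L),
  stringsAsFactors = FALSE
)

# Step 1: rename the key column so both data frames share CDS_CODE
names(frpm_toy)[names(frpm_toy) == "CDSCode"] <- "CDS_CODE"

# Step 2: left join, keeping every enrollment row
merged_toy <- merge(enrollment_toy, frpm_toy, by = "CDS_CODE", all.x = TRUE)

# Step 3: write to csv (a temporary file here, not my real output folder)
out_file <- tempfile(fileext = ".csv")
write.csv(merged_toy, out_file, row.names = FALSE)

# Read it back, keeping the key as character so leading zeros survive
check <- read.csv(out_file, colClasses = c("CDS_CODE" = "character"))
```

The enrollment row with no frpm match is kept, with NA in the frpm column, which is what I want from all.x = TRUE.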
The - has no effect on the error. I added it after looking at other examples of code where people mark --- for columns they are skipping over, as in c(1--4-6). I don't know if it works but decided to try it. The error happens with or without it.
Previous error: when using quotes on "CDS_CODE"
####create a function to combine enrollment and frpm data
enrol_frpm<-function(year){
+
+ #read in the data
+ inputPath1 <- "C:\\Box Sync\\Karina Fastovsky\\Data\\Schools\\enrollment\\cde_enrollment\\Batch579\\"
+ inputPath2 <- "C:\\Box Sync\\Karina Fastovsky\\Data\\Schools\\frpm\\frpm_2000-2018_clean1\\"
+ inputPath3 <- "C:\\Box Sync\\Karina Fastovsky\\Data\\Schools\\enrol_frpm\\"
+
+ enrol <- read.table(paste(inputPath1,"enrollment_",year-1,"-",year,".csv",sep=''),sep = '', fill = TRUE,header = TRUE, quote = "", colClasses = c("CDS_CODE" = "character"))
+ frpm <- read.csv(paste(inputPath2,"frpm",year, "_clean1", ".csv",sep = ''),sep = '', fill = TRUE,header = TRUE, quote = "", colClasses = c("CDSCode" = "character"))
+
+ #drop year column in frpm
+ frpm$year= NULL
+ #rename key column to be a shared name CDS_CODE
+ names(frpm) <- c("CDS_CODE")
+ #merge enrollment with frpm
+ Merged<- merge(enrol, frpm, by= "CDS_CODE", all.x= TRUE)
+
+ #write out the data
+ write.csv(Merged, paste0(inputPath3,"enrol_frpm",year,".csv",sep=''), row.names = FALSE)
+
+ }
#define the years of interest
years <- c(2001:2018)
#run the previously defined function for each year
for (year in years) {
+ enrol_frpm(year)
+ }
Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column
#> Error: <text>:22:3: unexpected '}'
#> 21: +
#> 22: + }
#> ^
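A guess at what may be going on with the first error, as a minimal sketch using a made-up two-column csv in a temporary file: inside the function I read the files with sep = '', which splits on whitespace rather than commas, so the whole header may be read as a single column name and CDS_CODE would then not exist as a column. The second error (unexpected '}') looks like it comes from the + prompt characters I copied from the console along with the code, since line 22 of the pasted text is the closing `+ }`.

```r
# Made-up csv written to a temporary file (not my real data)
toy_file <- tempfile(fileext = ".csv")
writeLines(
  c("CDS_CODE,ENR_TOTAL", "01000000000001,250", "01000000000002,310"),
  toy_file
)

# Read as in the function: sep = '' splits on whitespace, not commas.
# suppressWarnings() hides the warning that the colClasses name matches no column.
enrol_ws <- suppressWarnings(
  read.table(toy_file, sep = "", fill = TRUE, header = TRUE, quote = "",
             colClasses = c("CDS_CODE" = "character"))
)
names(enrol_ws)  # a single column; the header was not split at the comma

# Read with the read.csv defaults (sep = ","): the key column is found
enrol_csv <- read.csv(toy_file, colClasses = c("CDS_CODE" = "character"))
names(enrol_csv)

# Merging the whitespace-read version fails because CDS_CODE is not a column
frpm_toy <- data.frame(CDS_CODE = "01000000000001", frpm_count = 100L)
merge_msg <- tryCatch(
  {
    merge(enrol_ws, frpm_toy, by = "CDS_CODE", all.x = TRUE)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
merge_msg
```

If this is the cause, the single-file version above would work because read.csv uses a comma separator by default, while the function overrides it.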
Thank you for your time and help.