I'm importing a file from an open data portal that has notes embedded at the end of the file.
I know I can use a loop and delete every row after the Notes line, but is there a more elegant way to read the file so that it stops at the first blank row, or when it encounters the Notes line?
In this particular case the Notes line is on line 93 and anything after should be deleted from the data set.
# Table 1 - Approved Canada Emergency Wage Subsidy (CEWS) claims by period and province/territory of business address
library(readr)

cews_prov_src = "https://www.canada.ca/content/dam/cra-arc/serv-info/tax/business/topics/cews/statistics/cews_tbl1.csv"
# Skip the first two lines of the file and supply the column names explicitly
cews_prov_raw = read_csv(url(cews_prov_src),
                         col_names = c("Claim_Period",
                                       "Province",
                                       "Applications_Approved_YTD",
                                       "Number_Eligible_Employees",
                                       "Number_Eligible_Leave_with_Pay",
                                       "Number_Employees_Supported",
                                       "CEWS_Approved_YTD",
                                       "Average_CEWS_Per_Employee",
                                       "Percent_Approved_Applications_Period"),
                         skip = 2)
I wrote a little function, drop_after_empty_line(). It's not terribly elegant, but it works as long as you are certain there is a blank line at the end of the data in each CSV. Feel free to change it as needed. It uses a regex to find a line consisting only of commas, which signifies an empty row, and then returns the data with that line and all lines after it removed. You can pass skip or col_names through to it.
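The function body is not reproduced here, so the following is only a minimal sketch of how such a function might look, not necessarily the original implementation. It assumes readr is installed, that an "empty" row contains nothing but commas and whitespace, and that no such row occurs before the data starts. The argument names file and ... are my own choices for illustration.

```r
library(readr)

# Sketch (assumed implementation): read the raw lines, cut the file at the
# first row that holds only commas/whitespace, then parse what is left.
drop_after_empty_line <- function(file, ...) {
  lines <- read_lines(file)

  # Indices of rows that are blank or consist only of commas and whitespace
  empty <- grep("^[,[:space:]]*$", lines)

  if (length(empty) > 0) {
    # Keep everything before the first empty row; drop that row and the rest
    lines <- lines[seq_len(empty[1] - 1)]
  }

  # Re-assemble the kept lines as literal data; the trailing newline and I()
  # mark the string as data rather than a file path.
  # Extra arguments such as skip or col_names go straight to read_csv().
  read_csv(I(paste0(paste(lines, collapse = "\n"), "\n")), ...)
}
```

With the file from the question, a call along the lines of drop_after_empty_line(cews_prov_src, skip = 2, col_names = c("Claim_Period", "Province", ...)) should return only the rows above the blank line that precedes the Notes, provided the file really does separate the notes from the data with such a line.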