Hmm, John, you make a good point.
The trick is that zips with all 9 digits are fine, since I can just take a substring of characters 1-5. Zips that have only 5 digits are fine too, since they need no changes.
However, if the zip has only 3 or 4 digits, it ends up as something like 000001234, and taking the first 5 characters gives 00000 instead of 01234. Your first code snippet updated instantly (on 650k rows, mind you), but it left 182 entries with the real zip pushed to the end of the string.
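To make the failure concrete, here is a small base-R illustration. It assumes the first snippet amounts to left-padding every value to 9 digits and then keeping characters 1-5. That snippet isn't reproduced here, so treat this as a reconstruction of the effect, not the original code.

```r
# Assumed effect of the first approach: left-pad to 9 digits, then keep chars 1-5
zip <- 1234L                      # a zip stored as a number, leading zero lost
padded9 <- sprintf("%09d", zip)   # "000001234"
first5 <- substr(padded9, 1, 5)   # "00000": the real digits sit past position 5
padded5 <- sprintf("%05d", zip)   # "01234": padding to 5 instead gives the right zip
```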
This is why I was looking at if/else logic: I can say if the length is 4, pad with 1 zero, and so on. However, the for loop was quite slow.
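The per-length padding idea can also be expressed without a row-by-row for loop by computing each row's target width once and padding everything in a single vectorized step. A minimal sketch, assuming anything of 5 characters or fewer should become a 5-digit zip and anything longer a 9-digit zip; `fix_zips_vec` is a hypothetical name.

```r
# Hypothetical helper: pad each zip to 5 digits if it has at most 5 characters,
# otherwise to 9, computing all pads at once instead of looping over rows
fix_zips_vec <- function(col) {
  col <- trimws(as.character(col))
  n <- nchar(col)
  target <- ifelse(n <= 5, 5L, 9L)            # width each row should end up with
  pad <- strrep("0", pmax(target - n, 0L))    # "" when already at (or past) target
  out <- paste0(pad, col)
  out[is.na(col)] <- NA                       # keep missing zips missing
  out
}

fix_zips_vec(c("1234", "123", "02134", "12345678", "021341234"))
# "01234" "00123" "02134" "012345678" "021341234"
```

Because `strrep()` is vectorized over its `times` argument, this handles all rows in one pass rather than one iteration per row.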
Your second snippet seems to fix that problem. However, if a zip record is missing its leading 0 and is longer than 4 characters, it stays wrong. Building on your second snippet, could I do something like this:
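```r
fix_zips <- function(col) {
  # Work on trimmed strings so nchar() counts digits
  col <- trimws(as.character(col), which = "both")
  # Lengths 0-4: 5-digit zips that lost leading zeros, so left-pad to 5
  for (len in 0:4) {
    needs_fix <- !is.na(col) & nchar(col) == len
    pad <- paste(rep("0", 5 - len), collapse = "")
    col[needs_fix] <- paste0(pad, col[needs_fix])
  }
  # Lengths 6-8: assumed to be 9-digit zips that lost leading zeros, so left-pad to 9
  for (len in 6:8) {
    needs_fix <- !is.na(col) & nchar(col) == len
    pad <- paste(rep("0", 9 - len), collapse = "")
    col[needs_fix] <- paste0(pad, col[needs_fix])
  }
  col
}
```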
I think lengths 7 and 8 need the same left-padding to 9 as length 6. The goal is to end up with every zip either 5 or 9 characters long.
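To confirm that goal after cleaning, a quick check could list the rows that are still neither 5 nor 9 characters long. `bad_zip_rows` is a hypothetical helper name; NA zips are flagged too, because `nchar()` returns NA for a missing string and NA is not `%in%` `c(5, 9)`.

```r
# Hypothetical helper: row indices whose zip is not exactly 5 or 9 characters
bad_zip_rows <- function(zips) which(!nchar(zips) %in% c(5L, 9L))

bad_zip_rows(c("01234", "012345678", "1234", NA))
# 3 4
```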