I am brand new to R/RStudio and am stumbling over how to deal with NA values.
Apologies in advance if this post is not properly formatted, or if I've made other newbie mistakes.
Thanks for any suggestions (simplified explanations much appreciated!).
Here's a simplified example of the problem:
##########################################
# Test data
##########################################
library(lubridate)  # provides ymd_hm(), used below

df <- data.frame(
  DT1 = c(NA, NA, NA, "2020-01-01 0900", "2020-01-02 0915", "2020-01-03 0930"),
  DT2 = c("2020-01-01 0900", "2020-01-01 0900", "2020-02-01 1000", "2020-01-01 1000", "2020-01-02 1100", "2020-01-03 1200"),
  stringsAsFactors = FALSE
)
##########################################
# Convert to POSIXct
##########################################
df$DT1 <- ymd_hm(df$DT1)
df$DT2 <- ymd_hm(df$DT2)
df
##########################################
# Try to use difftime() to calculate elapsed time
##########################################
for (i in 1:nrow(df)) {
  if (!is.na(df$DT1[i])) {
    df$TimeElapsedDays[i] <- difftime(df$DT2[i], df$DT1[i], units = "days")
  }
}
Before the loop runs, df looks fine:
                  DT1                 DT2
1                <NA> 2020-01-01 09:00:00
2                <NA> 2020-01-01 09:00:00
3                <NA> 2020-02-01 10:00:00
4 2020-01-01 09:00:00 2020-01-01 10:00:00
5 2020-01-02 09:15:00 2020-01-02 11:00:00
6 2020-01-03 09:30:00 2020-01-03 12:00:00
Here's the error:
Error in `$<-.data.frame`(`*tmp*`, "TimeElapsedDays", value = c(NA, NA, :
replacement has 4 rows, data has 6
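A possible explanation, offered as a sketch rather than a definitive answer: the TimeElapsedDays column does not exist when the loop starts, and rows 1 to 3 are skipped because DT1 is NA. The first assignment therefore happens at i = 4, which builds a vector of length 4 (three NAs plus one value) and tries to store it in a 6-row data frame. Two ways around this are shown below: let difftime() work on the whole columns at once (it returns NA wherever an input is NA), or create the column with NA_real_ before looping. To keep the sketch runnable without extra packages it parses the dates with base as.POSIXct() instead of ymd_hm(); the "UTC" time zone and the TimeElapsedLoop column name are assumptions made for illustration.

```r
# Same test data as in the post.
df <- data.frame(
  DT1 = c(NA, NA, NA, "2020-01-01 0900", "2020-01-02 0915", "2020-01-03 0930"),
  DT2 = c("2020-01-01 0900", "2020-01-01 0900", "2020-02-01 1000",
          "2020-01-01 1000", "2020-01-02 1100", "2020-01-03 1200"),
  stringsAsFactors = FALSE
)

# Base-R stand-in for ymd_hm(); tz = "UTC" is an assumption.
df$DT1 <- as.POSIXct(df$DT1, format = "%Y-%m-%d %H%M", tz = "UTC")
df$DT2 <- as.POSIXct(df$DT2, format = "%Y-%m-%d %H%M", tz = "UTC")

# Option 1: no loop. difftime() is vectorized and propagates NA,
# so rows with a missing DT1 simply come out as NA.
df$TimeElapsedDays <- as.numeric(difftime(df$DT2, df$DT1, units = "days"))

# Option 2: keep the loop, but create the full-length column first
# so that each assignment only fills in one existing element.
df$TimeElapsedLoop <- NA_real_  # hypothetical column name, for comparison
for (i in seq_len(nrow(df))) {
  if (!is.na(df$DT1[i])) {
    df$TimeElapsedLoop[i] <- as.numeric(
      difftime(df$DT2[i], df$DT1[i], units = "days")
    )
  }
}

df
```

Both columns should hold the same values; the vectorized form is usually the more idiomatic choice in R, while the preallocated loop stays closest to the original code.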