I am brand new to R/RStudio and am stumbling over how to deal with NA values.
Apologies in advance if this post is not properly formatted, or if I've made other newbie mistakes.
Thanks for any suggestions (simplified explanations much appreciated!):
Here's a simplified example of the problem:
##########################################
# Test data
##########################################
df = data.frame(
  DT1 = c(NA, NA, NA, "2020-01-01 0900", "2020-01-02 0915", "2020-01-03 0930"),
  DT2 = c("2020-01-01 0900", "2020-01-01 0900", "2020-02-01 1000", "2020-01-01 1000", "2020-01-02 1100", "2020-01-03 1200"),
  stringsAsFactors = FALSE
)
##########################################
# Convert to POSIXct
##########################################
library(lubridate) # provides ymd_hm()
df$DT1 <- ymd_hm(df$DT1)
df$DT2 <- ymd_hm(df$DT2)
df
##########################################
# Try to use difftime() to calculate elapsed time
##########################################
for(i in 1:nrow(df)){
if(!is.na(df$DT1[i])) {df$TimeElapsedDays[i] <- difftime(df$DT2[i], df$DT1[i], units = c("days"))}
}
Your problem is caused by trying to construct TimeElapsedDays one element at a time, before the column exists.
Initialising it first would work:
df$TimeElapsedDays <- NA
for (i in 1:nrow(df)) {
  if (!is.na(df$DT1[i])) {
    df$TimeElapsedDays[i] <- difftime(df$DT2[i], df$DT1[i], units = c("days"))
  }
}
This is great! Thanks much for the quick and helpful reply.
Might I impose on your kindness to help me understand this? I have the following questions:
Why didn't constructing TimeElapsedDays one element at a time work? Is it because, since DT1[1] to DT1[3] were NA, the first 3 entries of TimeElapsedDays were left open, kind of indeterminate?
Why did mutate work? Or, perhaps, what is special about mutate?
Why is "<-" used sometimes [as in your first solution], vs "=" [as in your second]?
Feel free just to point me at some reading.
Thanks again! Got me unstuck!
Assigning to the 5th element of an object that doesn't exist yet (say, abc) would fail, as there is no abc to set the 5th element of.
You can initialise like this:
zzz <- NULL
# then
(zzz[5] <- 1) # would work
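To connect this back to the loop in the question, here is a minimal base-R sketch. The toy data frame `d` and its column `y` are made up for illustration and are not from the original post. It shows that indexing past the end of NULL grows a vector and pads it with NA, and that doing the same through a data frame column that does not exist yet produces a vector shorter than the data frame, which R refuses to store.

```r
# Assigning to element 5 of NULL grows a vector, padding with NA.
zzz <- NULL
zzz[5] <- 1
print(zzz)

# Hypothetical toy data frame with 6 rows and no column `y` yet.
d <- data.frame(x = 1:6)

# d$y is NULL, so d$y[4] <- 1 first builds a length-4 vector and then
# tries to store it as a column of a 6-row data frame, which is an error.
result <- tryCatch(
  {
    d$y[4] <- 1
    "assignment succeeded"
  },
  error = function(e) conditionMessage(e)
)
print(result)

# Creating the full-length column first makes element-wise assignment safe.
d$y <- NA_real_
d$y[4] <- 1
print(d$y)
```

This is why the skipped NA rows mattered in the question: the first assignment happened at row 4, so the new column started life with only 4 elements.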
mutate is a function that's been designed to work the way it does; I can't say more than that. It's a tool: if you like what it does, and how it does it, use it.
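For readers who have not seen it, a mutate() call for this problem might look like the sketch below. This is an illustration and not necessarily the code from the earlier reply. It assumes the dplyr package is installed, uses a shortened two-row version of the test data, and parses the timestamps with base as.POSIXct() (with an assumed UTC time zone) so that lubridate is not needed.

```r
library(dplyr)

# Shortened, illustrative version of the test data.
# The format string and tz = "UTC" are assumptions for this sketch.
fmt <- "%Y-%m-%d %H%M"
df <- data.frame(
  DT1 = as.POSIXct(c(NA, "2020-01-01 0900"), format = fmt, tz = "UTC"),
  DT2 = as.POSIXct(c("2020-01-01 0900", "2020-01-01 1000"), format = fmt, tz = "UTC")
)

# mutate() builds the whole column in one step from whole columns,
# so there is no partially built column and NA rows simply stay NA.
df <- mutate(
  df,
  TimeElapsedDays = as.numeric(difftime(DT2, DT1, units = "days"))
)
print(df)
```

Wrapping the result in as.numeric() is a design choice for the sketch: it stores plain numbers of days rather than a difftime object.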
<- is always preferred when assigning objects. = is necessary when assigning a value to a parameter name. You could actually use <- inside mutate, though it would look weird and result in a 'bad' name for the column (i.e. the column name would be the expression). mutate encourages the use of = within it. This makes it look like passing parameters to a typical function, and keeps it distinct from conventional do-it-yourself assignment.
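The difference between the two operators inside a function call can be seen without dplyr. The sketch below uses base list() as a stand-in for mutate(), and the name `elapsed_demo` is a placeholder chosen for illustration. With =, the name labels the argument. With <-, the argument goes unnamed and a variable is created in the calling environment as a side effect.

```r
# `=` inside a call names the argument; nothing is created outside the call.
named <- list(elapsed_demo = 1)
print(names(named))

# Record whether a variable of that name exists before using `<-`.
existed_before <- exists("elapsed_demo", inherits = FALSE)

# `<-` inside a call performs an ordinary assignment first,
# then passes the value along as an unnamed argument.
unnamed <- list(elapsed_demo <- 1)
print(names(unnamed))

# The assignment leaked into the calling environment.
print(elapsed_demo)
```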
Welcome! Most functions in R are "vectorized", so you don't have to construct a loop. mutate isn't special. You can simply subtract one column from another and create a new column at the same time. Plain old subtraction works because there is a subtraction method for datetime objects that creates a difftime object automatically. The difftime() function gives you finer control over the result, as you show by changing the units of the result to days.
> df$diff <- df$DT2 - df$DT1 # That was easy!
> df
# A tibble: 6 x 3
  DT1                 DT2                 diff
  <dttm>              <dttm>              <drtn>
1 NA                  2020-01-01 09:00:00   NA hours
2 NA                  2020-01-01 09:00:00   NA hours
3 NA                  2020-02-01 10:00:00   NA hours
4 2020-01-01 09:00:00 2020-01-01 10:00:00 1.00 hours
5 2020-01-02 09:15:00 2020-01-02 11:00:00 1.75 hours
6 2020-01-03 09:30:00 2020-01-03 12:00:00 2.50 hours
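The same vectorized idea also works with difftime() when the result is wanted in days, as in the original question. The following is a minimal base-R sketch. The helper `to_time()` and the UTC time zone are assumptions made for illustration, and as.POSIXct() stands in for ymd_hm() so that the example runs without lubridate.

```r
# Hypothetical helper: parse "YYYY-MM-DD HHMM" strings (UTC is an assumption).
to_time <- function(x) as.POSIXct(x, format = "%Y-%m-%d %H%M", tz = "UTC")

df <- data.frame(
  DT1 = to_time(c(NA, NA, NA, "2020-01-01 0900", "2020-01-02 0915", "2020-01-03 0930")),
  DT2 = to_time(c("2020-01-01 0900", "2020-01-01 0900", "2020-02-01 1000",
                  "2020-01-01 1000", "2020-01-02 1100", "2020-01-03 1200"))
)

# Plain subtraction: R chooses the units of the resulting difftime.
df$diff <- df$DT2 - df$DT1

# difftime() on whole columns: units fixed to days, no loop, NA rows stay NA.
df$TimeElapsedDays <- as.numeric(difftime(df$DT2, df$DT1, units = "days"))
print(df)
```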
As an aside, stringsAsFactors = F in your data.frame() call is an argument rather than a column, and from R 4.0.0 onward it isn't doing anything, because FALSE is already the default. In older versions, you might customarily use options(stringsAsFactors = FALSE) to start your session.