Create a for loop on rows of a data frame

Hi everybody,

I have this kind of dataset (see image). For each individual, I would like to copy the date of the first row where "value" is 1 into a new column named "date_debut", on that same row. Likewise, for the last row where "value" is 1, I would like to copy the date into a column named "end_date", again on the same row.

library(datapasta)
dpasta(data_indiv)

data_indiv <- data.frame(
  stringsAsFactors = FALSE,
  date = c("2015-05-08 18:00:00", "2015-05-09 00:00:00", "2015-05-09 06:00:00",
           "2015-05-09 12:00:00", "2015-05-09 18:00:00", "2015-05-10 00:00:00",
           "2015-05-10 06:00:00", "2015-05-10 12:00:00", "2015-05-10 18:00:00",
           "2019-05-26 00:00:00", "2019-05-26 06:00:00", "2019-05-26 12:00:00",
           "2019-05-26 18:00:00", "2019-05-27 00:00:00", "2019-05-27 06:00:00",
           "2019-05-27 12:00:00", "2019-05-27 18:00:00", "2019-05-28 00:00:00",
           "2019-05-28 06:00:00", "2019-05-28 12:00:00", "2019-05-28 18:00:00",
           "2019-05-29 00:00:00", "2019-05-29 06:00:00", "2019-05-29 12:00:00",
           "2019-05-29 18:00:00", "2019-05-30 00:00:00", "2019-05-30 06:00:00",
           "2019-05-30 18:00:00", "2019-05-31 00:00:00"),
  indiv = c("rhi", "rhi", "rhi", "rhi", "rhi", "rhi", "rhi", "rhi", "rhi",
            "gir", "gir", "gir", "gir", "gir", "gir", "gir", "gir", "gir", "gir",
            "gir", "gir", "gir", "gir", "gir", "gir", "gir", "gir", "gir", "gir"),
  value = c(0,0,0,1,1,1,1,1,0,0, 0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0)
)

How could I modify this code to make it work? I know there are problems with i - 1 and i + 1, because the loop cannot look before the first row or after the last row. But I cannot manage to make it work by changing the range.

for (i in 1:nrow(data_indiv)){
    if (!is.na(data_indiv$value[i]) == 1 & !is.na(data_indiv$value[i-1]) == 0){
        data_indiv$debut_date[i] <- data_indiv$date[i]
    }
    else if (!is.na(data_indiv$value[i]) == 1 & !is.na(data_indiv$value[i+1]) == 0){
        data_indiv$end_date[i] <- data_indiv$date[i]
    }
}

Have a nice weekend!
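
For reference, here is a minimal sketch of how the loop above could be made to run, using made-up toy data with the same column names. Two things appear to trip up the original: `data_indiv$value[i - 1]` is zero-length when `i` is 1, so `if` fails with "argument is of length zero", and `!is.na(x) == 1` is parsed as `!(is.na(x) == 1)`, which tests for missingness instead of comparing the value. The `toy` data frame and the `starts` / `ends` flags are illustrative names, not part of the original post.

```r
# Toy data with the same columns as data_indiv (values are made up).
toy <- data.frame(
  date  = c("d1", "d2", "d3", "d4", "d5", "d6"),
  indiv = c("a", "a", "a", "a", "b", "b"),
  value = c(0, 1, 1, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Create the result columns up front so unmatched rows stay NA.
toy$debut_date <- NA_character_
toy$end_date   <- NA_character_

n <- nrow(toy)
for (i in seq_len(n)) {
  # Only rows with value 1 can start or end a run.
  if (is.na(toy$value[i]) || toy$value[i] != 1) next

  # A run starts on the first row, on a new individual, or after a non-1 value.
  # `||` short-circuits, so index i - 1 is never read when i == 1.
  starts <- i == 1 || toy$indiv[i - 1] != toy$indiv[i] ||
    !isTRUE(toy$value[i - 1] == 1)

  # A run ends on the last row, before a new individual, or before a non-1 value.
  ends <- i == n || toy$indiv[i + 1] != toy$indiv[i] ||
    !isTRUE(toy$value[i + 1] == 1)

  # Two separate ifs (not else if): a run of length one is both start and end.
  if (starts) toy$debut_date[i] <- toy$date[i]
  if (ends)   toy$end_date[i]   <- toy$date[i]
}

print(toy)
```

This assumes the rows are already sorted by individual and date, as they are in the posted data.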

Would you consider a more functional approach that avoids for-loops altogether?


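library(tidyverse)
# First date with value == 1 for each individual. It is stored both as
# debut_date and as date so that the join below lands on that exact row.
(dd <- data_indiv %>% filter(value == 1) %>%
  group_by(indiv, value) %>% summarise(
    debut_date = min(date),
    date = debut_date
  ))

# Attach debut_date to the matching row; all other rows get NA.
(result <- data_indiv %>% left_join(dd, by = c("date", "indiv", "value")))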

Or, alternatively:

library(dplyr)
data_indiv %>%
  group_by(indiv) %>%
  # date is a character column, so the "missing" branch must be NA_character_
  mutate(date_debut = if_else(value == 1 &  lag(value, default = 0) == 0, date, NA_character_),
         end_date   = if_else(value == 1 & lead(value, default = 0) == 0, date, NA_character_)) %>%
  ungroup()

This says that, within each indiv, date_debut is set when value is 1 and the previous value was 0 (I'm assuming that only happens once in the data for each indiv, but we could adjust if that's uncertain), and end_date is set when value is 1 and the next value is 0. The default = 0 arguments treat the positions before the first row and after the last row of each group as 0.
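
To see how the lag() / lead() comparison behaves when an individual switches from 0 to 1 more than once, here is a small sketch on made-up data. The `toy` data frame and the `is_start` / `is_end` flag columns are illustrative names only; the flags use the same conditions as the code above but return logicals, to keep the example short.

```r
library(dplyr)

# Made-up toy data: individual "a" has two separate runs of 1s.
toy <- data.frame(
  indiv = c("a", "a", "a", "a", "a", "a", "b", "b"),
  date  = c("d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"),
  value = c(0, 1, 1, 0, 1, 0, 1, 1),
  stringsAsFactors = FALSE
)

marked <- toy %>%
  group_by(indiv) %>%
  mutate(
    # TRUE where a run of 1s begins (previous value in the group is 0 or absent)
    is_start = value == 1 & lag(value, default = 0) == 0,
    # TRUE where a run of 1s stops (next value in the group is 0 or absent)
    is_end   = value == 1 & lead(value, default = 0) == 0
  ) %>%
  ungroup()

print(marked)
```

In this toy case every run gets its own start and end flag, including the single-row run at d5, which is flagged as both.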

Thank you very much for your help!

However, I forgot to mention in my first post that this change from 0 to 1 can happen multiple times for each individual. That is why I first thought about doing a loop.

Is it possible to add the function difftime in mutate to compute the difference between both dates?

data_indiv %>%
  group_by(indiv, value) %>%
  mutate(
    date_debut = if_else(
      value == 1 & row_number() == 1,
      date, NA_character_),
    end_date = if_else(
      value == 1 & row_number() == n(),
      date, NA_character_)
  )

The logic here is to split the data into groups by individual and value. If a row has value 1 and is the first such row for that individual, it gets date_debut assigned. If a row has value 1 and is the last such row for that individual (i.e. row_number() within the group equals n()), it gets end_date assigned.
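
Grouping by individual and value marks only the first and last 1 overall for each individual. If the value can switch between 0 and 1 several times, one possible approach (a sketch under assumptions, not the only way) is to number each run of 1s and summarise per run, which also gives a natural place for difftime(). The `toy` data, the `run_id` column and the `duration_hours` column are illustrative names, and the dates are assumed to parse with as.POSIXct() in UTC.

```r
library(dplyr)

# Made-up toy data: "a" has two runs of 1s, "b" has one.
toy <- data.frame(
  indiv = c("a", "a", "a", "a", "a", "a", "b", "b"),
  date  = c("2019-05-26 00:00:00", "2019-05-26 06:00:00",
            "2019-05-26 12:00:00", "2019-05-26 18:00:00",
            "2019-05-27 00:00:00", "2019-05-27 06:00:00",
            "2019-05-27 12:00:00", "2019-05-27 18:00:00"),
  value = c(0, 1, 1, 0, 1, 0, 1, 1),
  stringsAsFactors = FALSE
)

runs <- toy %>%
  # Convert the character dates so that difftime() can work on them.
  mutate(date = as.POSIXct(date, tz = "UTC")) %>%
  group_by(indiv) %>%
  # run_id increases by one each time a new run of 1s begins.
  mutate(run_id = cumsum(value == 1 & lag(value, default = 0) != 1)) %>%
  filter(value == 1) %>%
  group_by(indiv, run_id) %>%
  summarise(
    date_debut     = min(date),
    end_date       = max(date),
    duration_hours = as.numeric(difftime(end_date, date_debut, units = "hours")),
    .groups = "drop"
  )

print(runs)
```

This returns one row per run instead of adding columns to the original rows. If the per-row layout is needed, the result could be joined back to the data on indiv and date.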
