Help on Looping

Please find below the input and the output (coming out of the code you shared). Ideally for the 2nd row it will be 6 and for 3rd row it the sum should be 9 because there are 3 rows which are within the 6 months of the 3/21/2016. The 6 months less of 3/21/2016 is 9/21/2015. All the top 2 rows are within this range. I.e. 3/15/2016 and 3/1/2016. So the code should sum the "nos" which is 4 ,2 and 3.input1 output1

This is what ia m trying but not working

datanew12 = datanew2 %>% group_by(location,grade,Date_Created) %>%
mutate(earliest_date = Date_Created %m-% months(6)) %>%
filter((Date_Created >= earliest_date & Date_Created <= Date_Created)) %>%
summarise(cars_cumulative = sum(nos))

Any suggestion please? thanks

My suggestion is, check the logic of your filtering conditions, have in minde that this works rowwise, so comparing Date_Created <= Date_Created makes no sense since it will always be TRUE.

Thanks. I am trying to replicate the below for loop into the vectorize form which works fine for creating a "count" variable. but it's not working when i transform it into a vectorize form
dataji$count <- 0 # initialise counter
i <- 1
for (i in 1:nrow(datanew1)){
compare <- (dataji$country==dataji$country) &
(dataji$grade==dataji$grade ) &
(dataji$department==dataji$department ) &
(dataji$Date_Created >=dataji$less6month[i])&
(dataji$Date_Created <= dataji$Date_Created[i])
dataji$count[i] <- sum(compare)

}

These variable names don't match the ones in the sample data you have posted before, to help us help you, could you please prepare a self-contained minimal reproducible example (reprex) illustrating your issue? A reprex makes it much easier for others to understand your issue and figure out how to help.
Please have a look at this guide, to see how to create one:

Please find below the data and the code
structure(list(Id = 1:15, nos = c(4L, 2L, 3L, 2L, 3L, 2L, 2L,
3L, 1L, 5L, 2L, 1L, 2L, 1L, 1L), location = structure(c(1L, 1L,
1L, 2L, 2L, 2L, 5L, 5L, 1L, 2L, 2L, 6L, 3L, 4L, 4L), .Label = c("Aus",
"Ind", "KA", "LA", "ML", "PA"), class = "factor"), grade = c(3L,
3L, 3L, 3L, 3L, 2L, 2L, 4L, 4L, 3L, 3L, 4L, 3L, 3L, 4L), Date_Created = structure(c(3L,
4L, 5L, 7L, 9L, 9L, 9L, 10L, 1L, 2L, 6L, 6L, 6L, 6L, 8L), .Label = c("10/14/2016",
"10/31/2016", "3/1/2016", "3/15/2016", "3/21/2016", "4/13/2017",
"5/25/2016", "6/29/2017", "7/29/2016", "8/4/2016"), class = "factor")), class = "data.frame", row.names = c(NA,
-15L))

library(lubridate)
data$Date_Created <- as.Date(data$Date_Created, "%m/%d/%Y") # convert to date
data$less6month <- data$Date_Created %m-% months(6) # subtract 6 months

data$count <- 0 # initialise counter
i <- 1 # for testing
for (i in 1:nrow(data)){ # loop through rows

compare all rows to row i

compare <- (data$location==data$location[i]) &
(data$grade==data$grade[i]) &
(data$Date_Created >=data$less6month[i])&
(data$Date_Created <= data$Date_Created[i])

count number of TRUE results

data$count[i] <- sum(compare)

}

Problem : The for loop is running but not working when converted to vectorize form. Also the final output

The code has to create two variables
a) count - how many rows are there in the dataset which satisifies the filter conditions and are within the 6 months of Date_Created variable for that row. I have provided the explanation in the post already and the picture of the final output.
b) Sum - The code has create a new variable called "sum" based on the "nos" column which satisifies the filter conditions. one of the conditions is Given this Date_Created, I want to find the other rows with Date_Created that are within the 6 months of the Date_Created for every row and then sum the "nos"[![enter image description here][1]][1] for those rows. I have provided the picture and the final out put.

The final output should look like
finaloutput

This produces the sum

data <- data.frame(
    Id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L,
           14L, 15L),
    nos = c(4L, 2L, 3L, 2L, 3L, 2L, 2L, 3L, 1L, 5L, 2L, 1L, 2L, 1L, 1L),
    grade = c(3L, 3L, 3L, 3L, 3L, 2L, 2L, 4L, 4L, 3L, 3L, 4L, 3L, 3L, 4L),
    Date_Created = c("2016-03-01", "2016-03-15", "2016-03-21", "2016-05-25",
                     "2016-07-29", "2016-07-29", "2016-07-29", "2016-08-04",
                     "2016-10-14", "2016-10-31", "2017-04-13", "2017-04-13",
                     "2017-04-13", "2017-04-13", "2017-06-29"),
    location = as.factor(c("Aus", "Aus", "Aus", "Ind", "Ind", "Ind", "ML",
                           "ML", "Aus", "Ind", "Ind", "PA", "KA", "LA",
                           "LA"))
)

library(tidyverse)
library(lubridate)

data$Date_Created = ymd(data$Date_Created)

data %>%
    mutate(earliest_date = Date_Created %m-% months(6),
           sum_nos = map2_dbl(.x = earliest_date,
                              .y = Date_Created,
                              ~ data %>%
                                  filter(Date_Created >= .x & Date_Created <= .y) %>%
                                  summarise(sum_nos = sum(nos)) %>% 
                                  .$sum_nos
                              ))
#>    Id nos grade Date_Created location earliest_date sum_nos
#> 1   1   4     3   2016-03-01      Aus    2015-09-01       4
#> 2   2   2     3   2016-03-15      Aus    2015-09-15       6
#> 3   3   3     3   2016-03-21      Aus    2015-09-21       9
#> 4   4   2     3   2016-05-25      Ind    2015-11-25      11
#> 5   5   3     3   2016-07-29      Ind    2016-01-29      18
#> 6   6   2     2   2016-07-29      Ind    2016-01-29      18
#> 7   7   2     2   2016-07-29       ML    2016-01-29      18
#> 8   8   3     4   2016-08-04       ML    2016-02-04      21
#> 9   9   1     4   2016-10-14      Aus    2016-04-14      13
#> 10 10   5     3   2016-10-31      Ind    2016-04-30      18
#> 11 11   2     3   2017-04-13      Ind    2016-10-13      12
#> 12 12   1     4   2017-04-13       PA    2016-10-13      12
#> 13 13   2     3   2017-04-13       KA    2016-10-13      12
#> 14 14   1     3   2017-04-13       LA    2016-10-13      12
#> 15 15   1     4   2017-06-29       LA    2016-12-29       7

Created on 2019-07-05 by the reprex package (v0.3.0)

Thank you. May be i was not clear in my communication. For the 4th row the sum should be 2. Because for the location is Ind and there are no rows which falls in the 6 month of the Date_Created.
For e.g for the 4th row the Date_Created is 25-05-2016 and the 6 month earliest is 25-11-2015. But there are now rows in the data which falls satisifies location is India and falls in between.. So the value in the sum column will be just 2.
For e.g for the 5th row the Date_Created is 29-07-2016 and the 6 month earlies is 29-01-2016. In this case 4th and 5th row satisifies the conditions. So the sum will be 2+3 which is 5.

finaloutput

Sorry, you are right, you are not being clear enough and I can't understand what you are saying, I believe I have gave you an example of how to replace your for loop using purrr package, so you should be able to adapt it to fit your conditions (which I can't understand).

Thank you for all the support. Sorry was not being clear. But i did provided all the explanations.

Yes, but they are not clear enough, have in mind that not everybody here is a native English speaker so you have to keep things simple concise and clear if you want to improve your chances of getting help.

I have made a last attempt, is this what you mean?

library(tidyverse)
library(lubridate)

data$Date_Created = ymd(data$Date_Created)

data %>%
    mutate(earliest_date = Date_Created %m-% months(6),
           sum_nos = pmap_dbl(.l = list(earliest_date,
                                        Date_Created,
                                        location),
                              ~ data %>%
                                  filter(location == ..3,Date_Created >= ..1 & Date_Created <= ..2) %>%
                                  summarise(sum_nos = sum(nos)) %>% 
                                  .$sum_nos
                              ))
#>    Id nos grade Date_Created location earliest_date sum_nos
#> 1   1   4     3   2016-03-01      Aus    2015-09-01       4
#> 2   2   2     3   2016-03-15      Aus    2015-09-15       6
#> 3   3   3     3   2016-03-21      Aus    2015-09-21       9
#> 4   4   2     3   2016-05-25      Ind    2015-11-25       2
#> 5   5   3     3   2016-07-29      Ind    2016-01-29       7
#> 6   6   2     2   2016-07-29      Ind    2016-01-29       7
#> 7   7   2     2   2016-07-29       ML    2016-01-29       2
#> 8   8   3     4   2016-08-04       ML    2016-02-04       5
#> 9   9   1     4   2016-10-14      Aus    2016-04-14       1
#> 10 10   5     3   2016-10-31      Ind    2016-04-30      12
#> 11 11   2     3   2017-04-13      Ind    2016-10-13       7
#> 12 12   1     4   2017-04-13       PA    2016-10-13       1
#> 13 13   2     3   2017-04-13       KA    2016-10-13       2
#> 14 14   1     3   2017-04-13       LA    2016-10-13       1
#> 15 15   1     4   2017-06-29       LA    2016-12-29       2
1 Like

Thank you so much . Perfect this is what i was looking for. Sorry i can completely understand your point thats why i try to give example. Will be more clear in future.

  1. But When i run your code i am getting all the values as same. is that something wrong with the R or my code? Please guide.

  2. Also if i would like to another filter variable called Grade. it will be same as like below

data %>%
mutate(earliest_date = Date_Created %m-% months(6),
sum_nos = pmap_dbl(.l = list(earliest_date,
Date_Created,
location,
grade),
~ data %>%
filter(grade==..4,location == ..3,Date_Created >= ..1 & Date_Created <= ..2) %>%
summarise(sum_nos = sum(nos)) %>%
.$sum_nos
))

Have you tried on a clean R session?, maybe is a name conflic with other objects you have loaded in memory.

Yes

  1. Cleared the R session and restarted it but still getting the same result. not sure why.
  2. Also one more help suppose i want to create one more variable "count" which satisfies the above conditions. so the code will be like this?
    sai %>%
    mutate(earliest_date = Date_Created %m-% months(6),
    sum_nos = pmap_dbl(.l = list(earliest_date,
    Date_Created,
    location),
    ~ data %>%
    filter(location == ..3,Date_Created >= ..1 & Date_Created <= ..2) %>%
    Count=n()) %>%
    .$sum_nos
    ))

The below for loops works and creates count variable. but was wondering if the vectorize form is the same as above.
library(lubridate)
data$Date_Created <- as.Date(data$Date_Created, "%m/%d/%Y") # convert to date
data$less6month <- data$Date_Created %m-% months(6) # subtract 6 months

data$count <- 0 # initialise counter
i <- 1 # for testing
for (i in 1:nrow(data)){ # loop through rows

compare all rows to row i

compare <- (datanew1$location==datanew1$location[i]) &
(data$grade==data$grade[i]) &
(data$Date_Created >=data$less6month[i])&
(data$Date_Created <= data$Date_Created[i])

count number of TRUE results (subtract 1 for row i)

data$count[i] <- sum(compare)

}

I can't help you with this without a proper reproducible example, please read this link and try to make one.

This is wrong syntax, if you want to learn how to use this functions then read this free ebook

1 Like

Please find below the data and the reprex. The output still remains the same
sai <- data.frame(
Id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L,
14L, 15L),
nos = c(4L, 2L, 3L, 2L, 3L, 2L, 2L, 3L, 1L, 5L, 2L, 1L, 2L, 1L, 1L),
grade = c(3L, 3L, 3L, 3L, 3L, 2L, 2L, 4L, 4L, 3L, 3L, 4L, 3L, 3L, 4L),
Date_Created = c("2016-03-01", "2016-03-15", "2016-03-21", "2016-05-25",
"2016-07-29", "2016-07-29", "2016-07-29", "2016-08-04",
"2016-10-14", "2016-10-31", "2017-04-13", "2017-04-13",
"2017-04-13", "2017-04-13", "2017-06-29"),
location = as.factor(c("Aus", "Aus", "Aus", "Ind", "Ind", "Ind", "ML",
"ML", "Aus", "Ind", "Ind", "PA", "KA", "LA",
"LA"))
)

library(tidyverse)
library(lubridate)

sai$Date_Created = ymd(sai$Date_Created)

sai %>%
mutate(earliest_date = Date_Created %m-% months(6),
sum_nos = map2_dbl(.x = earliest_date,
.y = Date_Created,
~ sai %>%
filter(Date_Created >= .x & Date_Created <= .y) %>%
summarise(sum_nos = sum(nos)) %>%
.$sum_nos
))