Help on Looping

srini · July 5, 2019, 5:16pm

Thank you for all the support. Sorry for not being clear, but I did provide all the explanations.

andresrcs · July 5, 2019, 5:27pm

Yes, but they are not clear enough. Keep in mind that not everybody here is a native English speaker, so you have to keep things simple, concise and clear if you want to improve your chances of getting help.

I have made one last attempt. Is this what you mean?

library(tidyverse)
library(lubridate)

data$Date_Created = ymd(data$Date_Created)

data %>%
    mutate(earliest_date = Date_Created %m-% months(6),
           sum_nos = pmap_dbl(.l = list(earliest_date,
                                        Date_Created,
                                        location),
                              ~ data %>%
                                  filter(location == ..3,Date_Created >= ..1 & Date_Created <= ..2) %>%
                                  summarise(sum_nos = sum(nos)) %>% 
                                  .$sum_nos
                              ))
#>    Id nos grade Date_Created location earliest_date sum_nos
#> 1   1   4     3   2016-03-01      Aus    2015-09-01       4
#> 2   2   2     3   2016-03-15      Aus    2015-09-15       6
#> 3   3   3     3   2016-03-21      Aus    2015-09-21       9
#> 4   4   2     3   2016-05-25      Ind    2015-11-25       2
#> 5   5   3     3   2016-07-29      Ind    2016-01-29       7
#> 6   6   2     2   2016-07-29      Ind    2016-01-29       7
#> 7   7   2     2   2016-07-29       ML    2016-01-29       2
#> 8   8   3     4   2016-08-04       ML    2016-02-04       5
#> 9   9   1     4   2016-10-14      Aus    2016-04-14       1
#> 10 10   5     3   2016-10-31      Ind    2016-04-30      12
#> 11 11   2     3   2017-04-13      Ind    2016-10-13       7
#> 12 12   1     4   2017-04-13       PA    2016-10-13       1
#> 13 13   2     3   2017-04-13       KA    2016-10-13       2
#> 14 14   1     3   2017-04-13       LA    2016-10-13       1
#> 15 15   1     4   2017-06-29       LA    2016-12-29       2

srini · July 5, 2019, 5:40pm

Thank you so much. Perfect, this is what I was looking for. Sorry, I completely understand your point; that is why I try to give examples. I will be clearer in future.

But when I run your code, I get the same value in every row. Is something wrong with R or with my code? Please guide.

(screenshot: output mistake)

Also, if I would like to add another filter variable called grade, would it look like the code below?

data %>%
    mutate(earliest_date = Date_Created %m-% months(6),
           sum_nos = pmap_dbl(.l = list(earliest_date,
                                        Date_Created,
                                        location,
                                        grade),
                              ~ data %>%
                                  filter(grade == ..4, location == ..3, Date_Created >= ..1 & Date_Created <= ..2) %>%
                                  summarise(sum_nos = sum(nos)) %>%
                                  .$sum_nos
                              ))

andresrcs · July 5, 2019, 5:45pm

Have you tried in a clean R session? Maybe it is a name conflict with other objects you have loaded in memory.

Yes, adding grade as a fourth filter variable works like that.

srini · July 5, 2019, 5:59pm

I cleared the R session and restarted it, but I am still getting the same result. I am not sure why.
One more question: suppose I want to create another variable, "count", which counts the rows that satisfy the above conditions. Would the code look like this?
sai %>%
mutate(earliest_date = Date_Created %m-% months(6),
sum_nos = pmap_dbl(.l = list(earliest_date,
Date_Created,
location),
~ data %>%
filter(location == ..3,Date_Created >= ..1 & Date_Created <= ..2) %>%
Count=n()) %>%
.$sum_nos
))

The for loop below works and creates the count variable, but I was wondering whether the vectorized form would be the same as above.
library(lubridate)
data$Date_Created <- as.Date(data$Date_Created, "%m/%d/%Y") # convert to date
data$less6month <- data$Date_Created %m-% months(6) # subtract 6 months

data$count <- 0 # initialise counter
i <- 1 # for testing
for (i in seq_len(nrow(data))) { # loop through rows
    # compare all rows to row i
    compare <- (data$location == data$location[i]) &
        (data$grade == data$grade[i]) &
        (data$Date_Created >= data$less6month[i]) &
        (data$Date_Created <= data$Date_Created[i])

    # count number of TRUE results (row i itself is included)
    data$count[i] <- sum(compare)
}

andresrcs · July 5, 2019, 6:12pm

I can't help you with this without a proper reproducible example, please read this link and try to make one.

FAQ: How to do a minimal reproducible example (reprex) for beginners (Guides & FAQs)

A minimal reproducible example consists of the following items:

A minimal dataset, necessary to reproduce the issue.
The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages.

Let's quickly go over each one of these with examples.

Minimal Dataset (Sample Data): You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame:

head(iris)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.…

This is wrong syntax. If you want to learn how to use these functions, then read this free ebook.

srini · July 5, 2019, 6:19pm

Please find the data and the reprex below. The output still remains the same.
sai <- data.frame(
    Id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L,
           14L, 15L),
    nos = c(4L, 2L, 3L, 2L, 3L, 2L, 2L, 3L, 1L, 5L, 2L, 1L, 2L, 1L, 1L),
    grade = c(3L, 3L, 3L, 3L, 3L, 2L, 2L, 4L, 4L, 3L, 3L, 4L, 3L, 3L, 4L),
    Date_Created = c("2016-03-01", "2016-03-15", "2016-03-21", "2016-05-25",
                     "2016-07-29", "2016-07-29", "2016-07-29", "2016-08-04",
                     "2016-10-14", "2016-10-31", "2017-04-13", "2017-04-13",
                     "2017-04-13", "2017-04-13", "2017-06-29"),
    location = as.factor(c("Aus", "Aus", "Aus", "Ind", "Ind", "Ind", "ML",
                           "ML", "Aus", "Ind", "Ind", "PA", "KA", "LA",
                           "LA"))
)

library(tidyverse)
library(lubridate)

sai$Date_Created = ymd(sai$Date_Created)

sai %>%
    mutate(earliest_date = Date_Created %m-% months(6),
           sum_nos = map2_dbl(.x = earliest_date,
                              .y = Date_Created,
                              ~ sai %>%
                                  filter(Date_Created >= .x & Date_Created <= .y) %>%
                                  summarise(sum_nos = sum(nos)) %>%
                                  .$sum_nos
           ))

srini · July 5, 2019, 6:21pm

Thank you for the book. I will be sure to read it.

andresrcs · July 5, 2019, 6:22pm

That works normally for me, what happens if you run this code through the reprex package?

reprex::reprex({
    sai <- data.frame(
        Id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L,
               14L, 15L),
        nos = c(4L, 2L, 3L, 2L, 3L, 2L, 2L, 3L, 1L, 5L, 2L, 1L, 2L, 1L, 1L),
        grade = c(3L, 3L, 3L, 3L, 3L, 2L, 2L, 4L, 4L, 3L, 3L, 4L, 3L, 3L, 4L),
        Date_Created = c("2016-03-01", "2016-03-15", "2016-03-21", "2016-05-25",
                         "2016-07-29", "2016-07-29", "2016-07-29", "2016-08-04",
                         "2016-10-14", "2016-10-31", "2017-04-13", "2017-04-13",
                         "2017-04-13", "2017-04-13", "2017-06-29"),
        location = as.factor(c("Aus", "Aus", "Aus", "Ind", "Ind", "Ind", "ML",
                               "ML", "Aus", "Ind", "Ind", "PA", "KA", "LA",
                               "LA"))
    )
    
    library(tidyverse)
    library(lubridate)
    
    sai$Date_Created = ymd(sai$Date_Created)
    
    sai %>%
        mutate(earliest_date = Date_Created %m-% months(6),
               sum_nos = map2_dbl(.x = earliest_date,
                                  .y = Date_Created,
                                  ~ sai %>%
                                      filter(Date_Created >= .x & Date_Created <= .y) %>%
                                      summarise(sum_nos = sum(nos)) %>%
                                      .$sum_nos
               ))
})

srini · July 5, 2019, 6:27pm

It gives the error below if I run it with reprex.

(screenshot: error reprex)

srini · July 5, 2019, 6:29pm

I need one more piece of help: how can I transform this for loop into a vectorized form?

library(lubridate)
data$Date_Created <- as.Date(data$Date_Created, "%m/%d/%Y") # convert to date
data$less6month <- data$Date_Created %m-% months(6) # subtract 6 months

data$count <- 0 # initialise counter
i <- 1 # for testing
for (i in seq_len(nrow(data))) { # loop through rows
    # compare all rows to row i
    compare <- (data$location == data$location[i]) &
        (data$grade == data$grade[i]) &
        (data$Date_Created >= data$less6month[i]) &
        (data$Date_Created <= data$Date_Created[i])

    # count number of TRUE results (row i itself is included)
    data$count[i] <- sum(compare)
}

andresrcs · July 5, 2019, 6:58pm

I'm going to give you the solution, but I want to ask you for something in return: please read and follow these guides before writing your next post/topic.

library(tidyverse)
library(lubridate)

data <- data.frame(
    Id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L,
           14L, 15L),
    nos = c(4L, 2L, 3L, 2L, 3L, 2L, 2L, 3L, 1L, 5L, 2L, 1L, 2L, 1L, 1L),
    grade = c(3L, 3L, 3L, 3L, 3L, 2L, 2L, 4L, 4L, 3L, 3L, 4L, 3L, 3L, 4L),
    Date_Created = c("2016-03-01", "2016-03-15", "2016-03-21", "2016-05-25",
                     "2016-07-29", "2016-07-29", "2016-07-29", "2016-08-04",
                     "2016-10-14", "2016-10-31", "2017-04-13", "2017-04-13",
                     "2017-04-13", "2017-04-13", "2017-06-29"),
    location = as.factor(c("Aus", "Aus", "Aus", "Ind", "Ind", "Ind", "ML",
                           "ML", "Aus", "Ind", "Ind", "PA", "KA", "LA",
                           "LA"))
)

data$Date_Created = ymd(data$Date_Created)

data %>%
    mutate(earliest_date = Date_Created %m-% months(6),
           count = pmap_dbl(.l = list(location,
                                      grade,
                                      earliest_date,
                                      Date_Created),
                            ~ data %>%
                                filter(location == ..1, grade == ..2, Date_Created >= ..3 & Date_Created <= ..4) %>%
                                summarise(count = n()) %>% 
                                .$count
           ),
           sum_nos = pmap_dbl(.l = list(location,
                                        grade,
                                        earliest_date,
                                        Date_Created),
                              ~ data %>%
                                  filter(location == ..1, grade == ..2, Date_Created >= ..3 & Date_Created <= ..4) %>%
                                  summarise(sum_nos = sum(nos)) %>% 
                                  .$sum_nos
                              )
           )
#>    Id nos grade Date_Created location earliest_date count sum_nos
#> 1   1   4     3   2016-03-01      Aus    2015-09-01     1       4
#> 2   2   2     3   2016-03-15      Aus    2015-09-15     2       6
#> 3   3   3     3   2016-03-21      Aus    2015-09-21     3       9
#> 4   4   2     3   2016-05-25      Ind    2015-11-25     1       2
#> 5   5   3     3   2016-07-29      Ind    2016-01-29     2       5
#> 6   6   2     2   2016-07-29      Ind    2016-01-29     1       2
#> 7   7   2     2   2016-07-29       ML    2016-01-29     1       2
#> 8   8   3     4   2016-08-04       ML    2016-02-04     1       3
#> 9   9   1     4   2016-10-14      Aus    2016-04-14     1       1
#> 10 10   5     3   2016-10-31      Ind    2016-04-30     3      10
#> 11 11   2     3   2017-04-13      Ind    2016-10-13     2       7
#> 12 12   1     4   2017-04-13       PA    2016-10-13     1       1
#> 13 13   2     3   2017-04-13       KA    2016-10-13     1       2
#> 14 14   1     3   2017-04-13       LA    2016-10-13     1       1
#> 15 15   1     4   2017-06-29       LA    2016-12-29     1       1
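
For comparison with the original for loop, here is a minimal sketch of the same count and sum without purrr. It is only one possible alternative, not a replacement for the pmap_dbl() solution above; the helper name in_window is made up for illustration, and vapply() still iterates over rows internally.

```r
library(lubridate)

data <- data.frame(
    Id = 1:15,
    nos = c(4L, 2L, 3L, 2L, 3L, 2L, 2L, 3L, 1L, 5L, 2L, 1L, 2L, 1L, 1L),
    grade = c(3L, 3L, 3L, 3L, 3L, 2L, 2L, 4L, 4L, 3L, 3L, 4L, 3L, 3L, 4L),
    Date_Created = c("2016-03-01", "2016-03-15", "2016-03-21", "2016-05-25",
                     "2016-07-29", "2016-07-29", "2016-07-29", "2016-08-04",
                     "2016-10-14", "2016-10-31", "2017-04-13", "2017-04-13",
                     "2017-04-13", "2017-04-13", "2017-06-29"),
    location = c("Aus", "Aus", "Aus", "Ind", "Ind", "Ind", "ML",
                 "ML", "Aus", "Ind", "Ind", "PA", "KA", "LA", "LA")
)

data$Date_Created <- ymd(data$Date_Created)
data$earliest_date <- data$Date_Created %m-% months(6)

# Hypothetical helper: for row i, flag every row with the same location and
# grade whose date falls inside [earliest_date[i], Date_Created[i]].
in_window <- function(i) {
    data$location == data$location[i] &
        data$grade == data$grade[i] &
        data$Date_Created >= data$earliest_date[i] &
        data$Date_Created <= data$Date_Created[i]
}

rows <- seq_len(nrow(data))
# Number of matching rows (row i itself is included)
data$count <- vapply(rows, function(i) sum(in_window(i)), integer(1))
# Sum of nos over the matching rows
data$sum_nos <- vapply(rows, function(i) sum(data$nos[in_window(i)]), integer(1))

data
```

On this sample data the count and sum_nos columns should match the output posted above.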

srini · July 6, 2019, 3:32am

Thank you so much for all the help. This is exactly what I was looking for. Kudos! I will make sure that the next time I post I follow the guidelines.

srini · July 6, 2019, 3:43am

One more request: could you suggest resources to improve my R coding skills? Thank you.

andresrcs · July 6, 2019, 4:35am

These are my favorite books for getting anybody up and running with R (and they are free)

I also recommend this online course

srini · July 6, 2019, 6:42am

Thank you so much!!

srini · July 8, 2019, 8:29am

Thank you for the solution. This was exactly what I was working on. I replicated the exact code and data. Unfortunately, the code is not giving the result you posted. I cleared the R session and ran it again.

srini · July 8, 2019, 10:06am

I am getting the following messages when loading the packages. Could this be a problem?

andresrcs · July 8, 2019, 12:35pm

That is just telling you that another package has a function with the same name, so if you want to use, for example, the plyr version of arrange(), you have to specify the source like this: plyr::arrange()
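
As a small illustration of the same idea, the sketch below uses filter(), which exists in both stats and dplyr. The data frame df is made up for the example; find() and the :: operator are standard R.

```r
library(dplyr)

df <- data.frame(x = 1:5)

# List every attached package that defines a function called "filter"
find("filter")

# The :: prefix picks one version explicitly, regardless of load order
kept <- dplyr::filter(df, x > 3)       # keeps the rows where x > 3
smoothed <- stats::filter(1:5, c(0.5, 0.5))  # linear filter on a numeric series

kept
```

Prefixing the calls in a pipeline with dplyr:: (for example dplyr::mutate, dplyr::filter, dplyr::summarise) is one way to rule out masking as the cause of unexpected results.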

srini · July 8, 2019, 12:53pm

OK, thank you. But I am still not sure why the code is not executing. There are two problems:

The output has the same values across all the rows.
The code is not running.