Hi, I am trying to summarize data using dplyr. My dataset has a few columns. The code has to filter the rows by the Date_created column, check whether the conditions are matched elsewhere in the dataset, and output a new column with the number of rows that match the filtering condition. Here is an example.
I have the columns Date_created, skill and grade. For each row, the code should check whether there are rows that meet the same conditions in the 6 months before it. Each such row adds 1 to the count column.
For example:
Date_created  skill    grade
2016-09-01    maths    4
2016-10-01    physics  5
2016-03-01    maths    5
2016-03-01    maths    4
In this case the new count column should be 2 for the first row. Two rows dated 2016-03-01 fall within the 6 months before 2016-09-01, both have the skill maths, and their grades are 4 and 5 (the current grade 4 and 4 + 1). The expected output is:
Date_created  skill    grade  count
2016-09-01    maths    4      2
2016-10-01    physics  5
2016-03-01    maths    5
2016-03-01    maths    4
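A minimal sketch of the rule as described in the question, written under these assumptions: a matching row must have the same skill, a grade equal to the current grade or the current grade + 1, and a Date_created in the 6 months before the current row (a date exactly 6 months earlier counts; the current row's own date does not). The helper `count_prior_matches` is a hypothetical name. It compares every pair of rows, so it is quadratic and best suited to small or moderate data sets.

```r
library(dplyr)
library(lubridate)

# Example data from the question
df <- data.frame(
  Date_created = as.Date(c("2016-09-01", "2016-10-01", "2016-03-01", "2016-03-01")),
  skill = c("maths", "physics", "maths", "maths"),
  grade = c(4, 5, 5, 4)
)

# Hypothetical helper: for each row i, count rows j that
#   - have the same skill,
#   - have grade equal to grade[i] or grade[i] + 1,
#   - were created in [date[i] - 6 months, date[i]) (assumed window bounds).
count_prior_matches <- function(dates, skills, grades, window = months(6)) {
  vapply(seq_along(dates), function(i) {
    start <- dates[i] %m-% window  # calendar-aware "6 months earlier"
    sum(skills == skills[i] &
        grades %in% c(grades[i], grades[i] + 1) &
        dates >= start & dates < dates[i])
  }, integer(1))
}

result <- df %>%
  mutate(count = count_prior_matches(Date_created, skill, grade))
print(result)
```

Under these assumptions the first row gets a count of 2 and the other three rows get 0, since none of them has a qualifying earlier entry. Changing the window or the grade rule only requires editing the condition inside `sum()`.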
I usually use a helper data frame for such operations, which might be slow with big data sets. Maybe there is a better way to do this in a single mutate() statement. Anyway, this should work (here the date and subject columns are called test_date and subject):
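library(dplyr)      # mutate(), group_by(), summarise(), left_join(), enquo(), quo_name()
library(lubridate)  # %m+% and months()
# Function definition
add_counting <- function(df, date_col, subject_col){
  d <- enquo(date_col)
  s <- enquo(subject_col)
  # Shift every date forward by 6 months, then count entries per (shifted date, subject)
  helper_df <- df %>%
    mutate(!!d := !!d %m+% months(6)) %>%
    group_by(!!d, !!s) %>%
    summarise(count = n(), .groups = "drop")
  # A row gets a count when entries with the same subject lie exactly 6 months earlier;
  # rows without such entries get NA
  df %>% left_join(helper_df, by = c(quo_name(d), quo_name(s)))
}
add_counting(df, test_date, subject)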
Thank you for providing the solution, it's helpful. But when I try to replicate it, I don't get the expected result. Please find below the sample data and the code I am trying.
Actually, the code should run through the whole dataset, check every row and see whether the conditions are matched. For example, it should take Job == 2 and Job == 3, i.e. Job and Job + 1. If they match, the count is assigned.
What do you mean when you say you could not replicate it? Did the code not work with the data I gave you, or does it not work with the data you will actually use?
I am sorry, but I am having a hard time understanding what you want to do. I thought you wanted to check whether there are entries that are exactly 6 months apart and have the same skill, in which case the count of the more recent entry goes up by 1. Now I am not sure what you actually need.
If you want to follow this up, provide your data as code (e.g. data <- data.frame(skill = c(...), ...)) together with the expected output. Then explain clearly under which conditions the count of each row should go up by 1.
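A minimal sketch of such a reproducible example, using the four rows from the question. The object name `data` follows the comment above; storing Date_created as a `Date` is an assumption about the real data. `dput()` prints R code that recreates the object, which can be pasted straight into the question.

```r
# Input data from the question, written as code so others can run it
data <- data.frame(
  Date_created = as.Date(c("2016-09-01", "2016-10-01", "2016-03-01", "2016-03-01")),
  skill = c("maths", "physics", "maths", "maths"),
  grade = c(4, 5, 5, 4)
)

# Prints code that rebuilds `data` exactly (including column types);
# paste that output into the post, followed by the expected count column
dput(data)
```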