Fix duplicate information in R

I'd like to know why my output contains duplicate values. Notice that Wednesday, Thursday and Friday each appear twice. How can I fix this?

Executable code below:

library(dplyr)
library(tidyverse)


df1<-structure(list(Id = c(1, 1, 1, 1, 1, 1), date1 = c("2021-06-28", 
"2021-06-28", "2021-06-28", "2021-06-28", "2021-06-28", "2021-06-28"), 
date2 = c("2021-06-18","2021-06-19", "2021-06-20", "2021-06-25", "2021-06-26", "2021-06-27"), 
Week = c("Wednesday","Thursday", "Friday", "Wednesday","Thursday", "Friday"), 
DT = c(1, NA_character_, NA_character_,1, NA_character_, NA_character_), Category = c("AB","CD", "EF", "AB", "CD", "EF"), 
Time = c(2, 4, 2, 4, 3, 4)), row.names = c(NA, -6L), class = c("tbl_df","tbl", "data.frame"))

f2 <- function(df1,idd,ds,codagr,dt) {
  
  nms <- c('Time|time')
  
  mtime <- df1 %>% mutate(DT = replace_na(DT, "")) %>% 
    filter(Id==idd,Week == ds, Category == codagr,DT == dt) %>% 
    group_by(Id,Week,Category,DT) %>% 
    summarise(across(matches(nms), mean, .names = 'Time',na.rm = TRUE), .groups = 'keep') %>% 
    mutate(Time = format(round(Time, digits = 2), nsmall = 2))
  
  return(mtime)
}


df1 %>% mutate(DT = replace_na(DT, "")) %>% 
  rowwise %>% 
  mutate(f2(df1,Id,Week,Category, DT))%>%
 select(-c(date1, date2))%>%data.frame()

  Id      Week DT Category Time
1  1 Wednesday  1       AB 3.00
2  1  Thursday          CD 3.50
3  1    Friday          EF 3.00
4  1 Wednesday  1       AB 3.00
5  1  Thursday          CD 3.50
6  1    Friday          EF 3.00

Why shouldn't you have repeated data in your output? After all, you have repeated data in your input.

Thanks for the answer, @nirgrahamuk. Did I repeat data in the input?

I have the following df1 database

# A tibble: 6 x 7
     Id date1      date2      Week      DT    Category  Time
  <dbl> <chr>      <chr>      <chr>     <chr> <chr>    <dbl>
1     1 2021-06-28 2021-06-18 Wednesday 1     AB           2
2     1 2021-06-28 2021-06-19 Thursday  NA    CD           4
3     1 2021-06-28 2021-06-20 Friday    NA    EF           2
4     1 2021-06-28 2021-06-25 Wednesday 1     AB           4
5     1 2021-06-28 2021-06-26 Thursday  NA    CD           3
6     1 2021-06-28 2021-06-27 Friday    NA    EF           4

Note that the first row is different from the fourth row: date2 is 2021-06-18 in the first row and 2021-06-25 in the fourth, so they are different days. The code takes the mean of the Time values of these rows, because they are grouped by Id, Week, DT and Category. Averaging these two rows gives (2+4)/2 = 3, which is the result I get in the output for this Id, Week, DT and Category. That part is OK. But why does it generate repeated rows? What do I need to change so that it doesn't?

It's possible you have confused yourself with your choice of names; it's hard to tell.
You use df1 as the table containing the input data.
You also use df1 as a parameter name in f2, but seemingly just as a way to reference the entire original input table.
Finally, you iterate rowwise through all of df1 and pass each row's values as arguments to f2. That makes 6 calls to f2, one per input row, and each call returns one summary row, so rows that belong to the same group return the same summary.

Here is the first call to f2:

df1 %>% slice(1) %>% mutate(DT = replace_na(DT, "")) %>% 
  mutate(f2(df1,Id,Week,Category, DT))%>%
  select(-c(date1, date2))%>%data.frame()

  Id      Week DT Category Time
1  1 Wednesday  1       AB 3.00

And the fourth:

df1 %>% slice(4) %>% mutate(DT = replace_na(DT, "")) %>% 
  mutate(f2(df1,Id,Week,Category, DT))%>%
  select(-c(date1, date2))%>%data.frame()

  Id      Week DT Category Time
1  1 Wednesday  1       AB 3.00
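
One possible way to avoid the repeats, as a minimal sketch: f2 only filters down to a single group and averages Time, so the same numbers can be obtained by grouping the whole table once instead of calling f2 for every row. This assumes you want one row per combination of Id, Week, DT and Category and do not need date1 or date2 in the output. The name result is only for illustration, and df1 is rebuilt here with DT as character so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same values as df1 in the question; DT is written as character from the start.
df1 <- tibble(
  Id       = c(1, 1, 1, 1, 1, 1),
  date1    = rep("2021-06-28", 6),
  date2    = c("2021-06-18", "2021-06-19", "2021-06-20",
               "2021-06-25", "2021-06-26", "2021-06-27"),
  Week     = c("Wednesday", "Thursday", "Friday",
               "Wednesday", "Thursday", "Friday"),
  DT       = c("1", NA, NA, "1", NA, NA),
  Category = c("AB", "CD", "EF", "AB", "CD", "EF"),
  Time     = c(2, 4, 2, 4, 3, 4)
)

# Summarise once per group, so each (Id, Week, DT, Category)
# combination produces exactly one output row.
result <- df1 %>%
  mutate(DT = replace_na(DT, "")) %>%
  group_by(Id, Week, DT, Category) %>%
  summarise(Time = mean(Time, na.rm = TRUE), .groups = "drop") %>%
  mutate(Time = format(round(Time, digits = 2), nsmall = 2))

print(as.data.frame(result))
```

This should give three rows, one each for Wednesday, Thursday and Friday. If f2 has to stay as it is, adding distinct(Id, Week, DT, Category) before rowwise in the original pipeline would likely have a similar effect, because f2 would then be called once per group rather than once per input row.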
