Hi, I have this sample file with errors in Model name.
source <- data.frame(
stringsAsFactors = FALSE,
DealerName = c("aaa","aaa","aaa","aaa",
"aaa","aaa","aaa","aaa","aaa","aaa"),
VSCode = c(8007,8007,8007,8007,8007,
8007,8007,8007,8007,8007),
AccountNumber = c(109, 109, 109, 109, 109, 109, 109, 109, 109, 109),
Model = c("aaa","aaa2","aaa3","fff",
"fff2","bbb","bbb","ccc","ccc","ddd"),
RegNo = c("x32PF","x32PF","x32PF",
"z3PFA","z3PFA","xxx","xxx","yyy","yyy","zzz"),
RegDate = c("2021-03-05","2021-03-05",
"2021-03-05","2021-06-02","2021-06-02","2015-11-20",
"2015-11-20","2017-03-03","2017-03-03","2003-03-28"),
Mileage = c(3645,3645,3645,11164,
15907,15571,15571,20376,20376,35574),
PreviousVisitDate = c(NA,NA,NA,NA,"2022-05-11",
NA,NA,"2021-03-04","2021-03-04",NA),
LastWorkshopDate = c("2021-11-05","2021-11-05",
"2021-11-05","2022-05-11","2022-10-24","2018-01-24",
"2018-01-24","2022-03-03","2022-03-04","2021-05-28"),
CWI = c("W", "W", "W", "C", "C", "C", "C", "C", "C", "W"),
RoTotal = c(293.15,293.15,293.15,162.6,
171.49,435.68,435.68,210.25,33.29,82.44),
LabourCost = c(11,30.24,30.24,94.5,51.28,
294.99,294.99,137.29,0,37.42),
PartsCost = c(50,100,0,15.6,120.21,
140.69,140.69,20.46,0,45.02),
SubletCost = c(100, 0, 0, 0, 0, 0, 0, 0, 24.99, 0),
MOTFlag = c(FALSE,FALSE,FALSE,FALSE,
FALSE,FALSE,FALSE,TRUE,FALSE,FALSE)
)
Is it possible to remove duplicates this way:
- Group by RegNo
- If all non-numerical values apart from Model are the same
a) keep the first model name
b) keep all other non-numerical ones - because they are the same
c) calculate means for numerical values
In this example aaa2 and aaa3 should be removed and aaa details with averages for numerical values should be kept, one bbb should be removed and averages for numerical values should be kept
Is it easy to do without specifying to many variables in the code?