Hi, I have this simple data frame with some duplicates:
data.source <- data.frame(
stringsAsFactors = FALSE,
RegNo = c("ccc","ccc","ddd","ddd",
"aaa","aaa","bbb","bbb","ddd","ddd"),
RegDate = c("2013-10-26 00:00:00",
"2013-10-26 00:00:00","2013-04-30 00:00:00",
"2013-04-30 00:00:00","2007-05-21 00:00:00","2007-05-21 00:00:00",
"2015-03-30 00:00:00","2015-03-30 00:00:00",
"2013-04-30 00:00:00","2013-04-30 00:00:00"),
Mileage = c(37230,37230,38163,38163,
68050,68050,32276,32276,38163,38165),
CWI = c("C", "C", "C", "C", "W", "C", "C", "W", "W", "C"),
RoTotal = c(725.75,725.75,178.71,178.71,
387.66,482.9,239.63,21.43,68.79,322.64),
LabourCost = c(243.19,205.19,110.77,101.37,
157.14,245.74,140.6,21.43,57.14,50),
PartsCost = c(401.34,306.01,19.77,33.34,
230.52,160.16,42.77,0,11.65,272.64),
LastWorkshopDate = c("2018-10-19","2018-10-19",
"2018-04-26","2018-04-26","2018-06-04","2018-06-04",
"2018-03-22","2018-03-22","2018-04-26","2018-06-25"),
SubletCost = c(20, 0, 0, 0, 0, 10, 25, 0, 0, 0)
)
data.source
All variables apart from LabourCost, PartsCost and SubletCost are identical for first two and the second two rows.
I know how to keep unique records only but ...
I need to keep unique records if RegNo RegDate, Mileage, LastWorkshopDate, CWI and RoTotal are the same BUT to have average for three numerical variables which are different. In other words, I want to replace values for duplicated rows 1-2 and 3-4 by means of LabourCost, PartsCost and SubletCost.
I also want the same but with max value for the above (instead of the average).
How can I do that?