I have a problem getting summarize() to summarize by group correctly for a series of summaries of the same data.frame. The following is reproducible, but leads to an incorrect result.
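library(dplyr)

# Simulated data: 100 rows spread across 10 studies
testcase <- data.frame(StudyID = sample(LETTERS[1:10], 100, replace = TRUE),
                       intercept = rnorm(100), stepchg = rnorm(100))

# Four summaries at once: madstepchg comes out as all 0s
testcase %>% group_by(StudyID) %>% summarize(
  Intcpt = median(intercept, na.rm = T), MAD_Intcpt = mad(intercept, na.rm = T),
  stepchg = median(stepchg, na.rm = T), madstepchg = mad(stepchg, na.rm = T))

# The last summary on its own: madstepchg looks correct
testcase %>% group_by(StudyID) %>% summarize(
  madstepchg = mad(stepchg, na.rm = T))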
My results are below. Note that when I ask for four summaries, the first three are fine, but the last one is all 0s. If I ask for the last summary by itself, the result is correct. I am not sure whether I have made a dumb mistake or whether there is a bug in the summarize() code, and I don't want to report a bug until I am sure that the problem is not just a mistake on my part. (The original data has some NAs, so I have to include "na.rm = T", but the result is the same if I leave that out.)
> testcase %>% group_by(StudyID) %>% summarize(
+ Intcpt=median(intercept,na.rm=T), MAD_Intcpt=mad(intercept,na.rm=T),
+ stepchg=median(stepchg, na.rm=T), madstepchg=mad(stepchg, na.rm = T))
# A tibble: 10 x 5
   StudyID  Intcpt MAD_Intcpt  stepchg madstepchg
   <fct>     <dbl>      <dbl>    <dbl>      <dbl>
 1 A       -0.392       0.846 -0.118            0
 2 B        0.0805      1.51   1.22             0
 3 C        0.0362      1.06  -0.585            0
 4 D       -0.0266      0.410  0.263            0
 5 E        0.370       1.66  -0.272            0
 6 F       -0.324       1.27  -0.181            0
 7 G        0.450       0.197  0.240            0
 8 H       -0.390       0.741  0.0800           0
 9 I        0.427       0.536 -0.00189          0
10 J        0.0361      0.637 -0.0393           0
> testcase %>% group_by(StudyID) %>% summarize(
+ madstepchg=mad(stepchg, na.rm = T))
# A tibble: 10 x 2
   StudyID madstepchg
   <fct>        <dbl>
 1 A            0.734
 2 B            0.863
 3 C            0.448
 4 D            1.36
 5 E            0.777
 6 F            0.769
 7 G            0.715
 8 H            0.757
 9 I            0.684
10 J            0.719
Thanks in advance to anyone who can help me fix the above.
Larry Hunsicker
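
A possible explanation, offered as a sketch rather than a confirmed diagnosis of the original data: summarize() evaluates its expressions in order, and a result named stepchg replaces the original stepchg column for every expression that follows it. If that is what happens here, mad(stepchg) in the four-summary call sees only the single per-group median, and the MAD of one value is 0. The toy data frame and the column name med_stepchg below are made up for illustration; they are not from the original post.

```r
library(dplyr)

# Small deterministic stand-in for the simulated data (illustrative values only)
toy <- data.frame(
  StudyID = rep(c("A", "B"), each = 4),
  stepchg = c(1, 2, 4, 8, 3, 5, 9, 17)
)

# Same pattern as in the question: the median is stored under the name of the
# input column, so the later mad(stepchg) is computed on that single value
shadowed <- toy %>%
  group_by(StudyID) %>%
  summarize(stepchg    = median(stepchg, na.rm = TRUE),
            madstepchg = mad(stepchg, na.rm = TRUE))

# Option 1: store the median under a different (hypothetical) name, so the
# original stepchg column is still visible to mad()
renamed <- toy %>%
  group_by(StudyID) %>%
  summarize(med_stepchg = median(stepchg, na.rm = TRUE),
            madstepchg  = mad(stepchg, na.rm = TRUE))

# Option 2: keep the name stepchg but compute the MAD before it is overwritten
reordered <- toy %>%
  group_by(StudyID) %>%
  summarize(madstepchg = mad(stepchg, na.rm = TRUE),
            stepchg    = median(stepchg, na.rm = TRUE))

print(shadowed)   # madstepchg is 0 in every group
print(renamed)    # madstepchg is computed from the raw values
print(reordered)  # same madstepchg as in `renamed`
```

Under this reading, the single-summary call in the question works because nothing has overwritten stepchg before mad() runs. Renaming the output is probably the less fragile of the two options, since it does not depend on the order of the arguments.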