[I asked this on R-help yesterday and got a lovely, terse way to do what I want in base R, neater than I could have coded but which, less well, I could have coded! I am pretty sure I won't get an answer there, and my questions are really about dplyr and the tidyverse, so I'm bringing it here.]
I am sure the answer is "yes", and I'm also sure the question may sound mad. Here's a reprex that I think captures what I'm doing:
library(dplyr)
n <- 500
gender <- sample(c("Man","Woman","Other"), n, replace = TRUE)
GPC_score <- rnorm(n)
scaleMeasures <- runif(n)
bind_cols(gender = gender,
          GPC_score = GPC_score,
          scaleMeasures = scaleMeasures) -> tibUse
### let's have the correlation between the two variables broken down by gender
tibUse %>%
  filter(gender != "Other") %>%
  select(gender, GPC_score, scaleMeasures) %>%
  na.omit() %>%
  group_by(gender) %>%
  summarise(cor = cor(cur_data())[1,2]) -> tmp1
### but I'd also like the correlation for the whole dataset, not by gender
### this is a kludge to achieve that which I am using partly because I can't
### find the equivalent of cur_data() for an ungrouped tibble/df
tibUse %>%
  mutate(gender = "All") %>% # nasty kludge to get all the data!
  select(gender, GPC_score, scaleMeasures) %>%
  na.omit() %>%
  group_by(gender) %>% # ditto!
  summarise(cor = cor(cur_data())[1,2]) -> tmp2
bind_rows(tmp1, tmp2)
### gets me what I want:
# A tibble: 3 x 2
gender cor
<chr> <dbl>
1 Man 0.0225
2 Woman 0.0685
3 All 0.0444
In reality I have some functions that are more complex than cor(...)[1,2] (sorry about that particular kludge) and that take a whole data frame as input, and I'd love to have a simpler way of doing this.
So, two questions:
1. I am sure there is a term/function that works on an ungrouped tibble in dplyr the way cur_data() does on a grouped tibble ... but I can't find it.
2. I suspect someone has automated a way to get the analysis of the complete data after the analyses of the groups within a single dplyr run ... it seems an obvious and common use case, but I can't find that either.
Sorry, I'm over 99% sure I'm being stupid and missing the obvious here ... but that's the recurrent problem I have with my wetware, and searchware doesn't seem to be fixing it!
And for what it's worth, I'm fine with this solution (or with what I'm providing below), though if I understand correctly you specifically want to avoid gender = "All":
(map() returns a list, so we can store the whole of cur_data() in a column)
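A minimal sketch of the list-column idea, assuming the question's simulated data (regenerated here with an arbitrary seed so the block runs on its own). It is an illustration, not a reconstruction of the answer's original code: it wraps each group's cur_data() in list() rather than map(). Either way the column ends up as a list of data frames, and any function that digests a data frame can then be mapped over it with purrr::map_dbl().

```r
library(dplyr)
library(purrr)

# Simulated data with the same shape as the question's tibUse (seed is arbitrary)
set.seed(1)
n <- 500
tibUse <- tibble(
  gender        = sample(c("Man", "Woman", "Other"), n, replace = TRUE),
  GPC_score     = rnorm(n),
  scaleMeasures = runif(n)
)

# One row per gender; `data` holds that group's rows as a tibble.
# cur_data() leaves out the grouping column, so only the two numeric columns remain.
nested <- tibUse %>%
  filter(gender != "Other") %>%
  group_by(gender) %>%
  summarise(data = list(cur_data()), .groups = "drop")

# Any function that takes a whole data frame can now be mapped over `data`
result <- nested %>%
  mutate(cor = map_dbl(data, ~ cor(.x)[1, 2]))
result
```

Keeping the per-group data frames around means several data-frame-consuming functions can be applied later without regrouping.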
The reason your call to cor() fails without grouping is that the grouping variable is excluded from cur_data(). In your example the data frame has 3 columns (gender and 2 numeric ones). With gender as the grouping variable, cur_data() gives you a subset of the data frame containing only the 2 numeric columns, so cor() runs. Without grouping, you pass cor() a data frame with all 3 columns, one of them non-numeric, and cor() doesn't know what to do with it. This can be solved simply with select(-gender).
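A sketch of that fix, assuming the same simulated data as the question (seed chosen arbitrarily). Without grouping, cur_data() inside summarise() sees every column, so the non-numeric gender column is dropped with select() before cor() is called. In dplyr 1.1.0 and later, cur_data() is deprecated in favour of pick(), so cor(pick(GPC_score, scaleMeasures))[1, 2] should do the same job there.

```r
library(dplyr)

set.seed(1)
n <- 500
tibUse <- tibble(
  gender        = sample(c("Man", "Woman", "Other"), n, replace = TRUE),
  GPC_score     = rnorm(n),
  scaleMeasures = runif(n)
)

# Ungrouped: cur_data() would include `gender`, so drop it first
overall <- tibUse %>%
  select(-gender) %>%
  na.omit() %>%
  summarise(cor = cor(cur_data())[1, 2])
overall

# Cross-check against cor() on the two numeric vectors directly
all.equal(overall$cor, cor(tibUse$GPC_score, tibUse$scaleMeasures))
```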
As for question 2, I'm not sure it's possible with the standard dplyr approach (but I'm waiting to be proven wrong [EDIT: and wrong I was, thank you Yarnabrina]): inside a single summarize() you are either using groups or not, and after summarize() the original data has been discarded. You could perhaps leave the data frame ungrouped and use map(c("Man","Woman"), my_function, cur_data()) to do the grouping "manually", but I wouldn't recommend it: it makes your intention less obvious.
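One common idiom for question 2, not necessarily the approach credited in the edit above, is to stack a relabelled copy of the data under itself so that a single group_by()/summarise() produces the per-group rows and the overall row together. The helper with_total() is hypothetical, written only for this sketch; it still uses an "All" label but keeps that kludge in one reusable place. The data is again simulated with an arbitrary seed. Because with_total() runs before filter(), the "All" row includes the "Other" respondents, just as tmp2 does in the question.

```r
library(dplyr)

# Hypothetical helper: append a copy of `data` whose grouping column is set to
# `total_label`, so grouping afterwards yields each group plus an overall group
with_total <- function(data, group_col, total_label = "All") {
  bind_rows(data, mutate(data, {{ group_col }} := total_label))
}

set.seed(1)
n <- 500
tibUse <- tibble(
  gender        = sample(c("Man", "Woman", "Other"), n, replace = TRUE),
  GPC_score     = rnorm(n),
  scaleMeasures = runif(n)
)

res <- tibUse %>%
  with_total(gender) %>%               # before filter(), so "All" keeps "Other"
  filter(gender != "Other") %>%
  select(gender, GPC_score, scaleMeasures) %>%
  na.omit() %>%
  group_by(gender) %>%
  summarise(cor = cor(cur_data())[1, 2], .groups = "drop")
res
```

Two trade-offs of this design: the data is held twice in memory while the pipeline runs, and group_by() sorts the groups, so the "All" row comes out first rather than last.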