There is no dedicated API for accessing grouping information from the C++ side; however, it is all stored as attributes of the data frame.
The attributes used to be messy, but as part of this PR we've made them much cleaner: all the information is now stored in a tibble, e.g.
library(dplyr, warn.conflicts = FALSE)
d <- group_by(iris, Species)
# 1-based indices of rows of each group
group_rows(d)
#> [[1]]
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
#> [24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
#> [47] 47 48 49 50
#>
#> [[2]]
#>  [1]  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67
#> [18]  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84
#> [35]  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
#>
#> [[3]]
#>  [1] 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
#> [18] 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134
#> [35] 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150
# keys, or "representatives", of each group
group_keys(d)
#> # A tibble: 3 x 1
#>   Species
#>   <fct>
#> 1 setosa
#> 2 versicolor
#> 3 virginica
# both
group_data(d)
#> # A tibble: 3 x 2
#>   Species    .rows
#>   <fct>      <list>
#> 1 setosa     <int [50]>
#> 2 versicolor <int [50]>
#> 3 virginica  <int [50]>
# it's all stored in the "groups" attribute
# its last column is a list column of indices
attr(d, "groups", exact = TRUE)
#> # A tibble: 3 x 2
#>   Species    .rows
#>   <fct>      <list>
#> 1 setosa     <int [50]>
#> 2 versicolor <int [50]>
#> 3 virginica  <int [50]>
# we can use that information internally,
# e.g. to get the size of each group
Rcpp::cppFunction('IntegerVector counts(DataFrame df) {
  // the "groups" attribute is a tibble with one row per group
  DataFrame groups(df.attr("groups"));
  // its last column (.rows) is a list of 1-based row indices
  List rows = groups[groups.size() - 1];
  int n = groups.nrow();
  IntegerVector res(n);
  for (int i = 0; i < n; i++) {
    IntegerVector index = rows[i];
    res[i] = index.size();
  }
  return res;
}')
counts(d)
#> [1] 50 50 50
Created on 2019-02-12 by the reprex package (v0.2.1.9000)
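
One thing to watch for when going beyond group sizes: the indices in `.rows` are 1-based, so they need shifting before they are used to index C++ containers. Below is a minimal sketch of that in plain C++, without Rcpp so that it compiles on its own. `GroupRows`, `group_sizes()` and `group_sums()` are hypothetical names used for illustration only, not part of dplyr or Rcpp, and the nested vector is just a stand-in for the `.rows` list column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the `.rows` list column of the "groups" tibble:
// one vector of 1-based row indices per group.
using GroupRows = std::vector<std::vector<int>>;

// Size of each group, mirroring what counts() does in the reprex.
std::vector<int> group_sizes(const GroupRows& rows) {
  std::vector<int> res;
  res.reserve(rows.size());
  for (const auto& index : rows) {
    res.push_back(static_cast<int>(index.size()));
  }
  return res;
}

// Per-group sum of a column. The stored indices are 1-based (R convention),
// so subtract 1 before indexing into a C++ container.
std::vector<double> group_sums(const GroupRows& rows,
                               const std::vector<double>& column) {
  std::vector<double> res;
  res.reserve(rows.size());
  for (const auto& index : rows) {
    double total = 0.0;
    for (int i : index) {
      // Guard against indices outside the column.
      assert(i >= 1 && static_cast<std::size_t>(i) <= column.size());
      total += column[i - 1];
    }
    res.push_back(total);
  }
  return res;
}
```

With Rcpp, the same loop should carry over by reading each element of the `rows` list as an `IntegerVector`, as `counts()` does, and indexing the column with `index[j] - 1`.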