How to add a counter to each group in dplyr

dplyr

#1

Hello everyone,

I would like to add a column to my spark data frame that stores which line in its particular group it is.

for example:

Dataset = 'countries' containing (France, France, France, US, France, US, China)

if I do countries %>% group_by(country) %>% mutate( .... )

As a result of these operations I want for example this as output:

China 1
France 1
France 2
France 3
France 4
US 1
US 2

Does anyone know how to implement this?
kind regards,
Charles


#2

Hi,

I think this StackOverflow thread will get you what you're looking for

In the future it's easier to answer a question if you include self-contained reprex (short for reproducible example). For pointers specific to the community site, check out the reprex FAQ.

Edit: Leaving below for posterity, but definitely use @mishabalyasin's with row_number() (which is also what at least one answer suggests in the SO thread, I think). (>įƒš)

library(tidyverse)
mydata <- data_frame(countries = c("France", "US", "France", "France", "China", "China", "US"),
                     number = 1)

mydata %>%
  group_by(countries) %>%
  mutate(ticker = cumsum(number))
#> # A tibble: 7 x 3
#> # Groups:   countries [3]
#>   countries number ticker
#>   <chr>      <dbl>  <dbl>
#> 1 France         1      1
#> 2 US             1      1
#> 3 France         1      2
#> 4 France         1      3
#> 5 China          1      1
#> 6 China          1      2
#> 7 US             1      2

Created on 2018-08-22 by the reprex package (v0.2.0.9000).


#3

Hi! For people to help you it is always worth the effort to create a reproducible example. Here is more info on how to do it:

For your particular case this can be achieved with row_number function in dplyr:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
iris %>% 
  dplyr::group_by(Species) %>% 
  dplyr::sample_n(3) %>%
  dplyr::mutate(row_number = dplyr::row_number())
#> # A tibble: 9 x 6
#> # Groups:   Species [3]
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species    row_number
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>           <int>
#> 1          4.3         3            1.1         0.1 setosa              1
#> 2          5           3            1.6         0.2 setosa              2
#> 3          5.1         3.3          1.7         0.5 setosa              3
#> 4          5.5         2.4          3.8         1.1 versicolor          1
#> 5          5.9         3            4.2         1.5 versicolor          2
#> 6          5.7         2.6          3.5         1   versicolor          3
#> 7          6.5         3            5.5         1.8 virginica           1
#> 8          7.2         3.6          6.1         2.5 virginica           2
#> 9          6.7         3.3          5.7         2.1 virginica           3

Created on 2018-08-22 by the reprex package (v0.2.0).