Numbering different groups

Hi!

Hopefully someone understands my question... and my English :grimacing:

How can I number different episodes?
When the columns "personID" and "time" are the same in different rows, I want those rows to get the same number in a new variable "episodenumber". The data is huge, with thousands of rows.

Example:
personID time episodenumber
wx 15.6.2020 1
wx 15.6.2020 1
aaa 1.7.2019 2
aaa 20.8.2021 3
aaa 20.8.2021 3
oiy 19.12.2020 4

Thank you!

Saija

Here is one method that numbers the episodes in alphabetical order of the combined personID and time values. It uses the fact that factors are stored as integers.

# the date strings must be wrapped in c(); otherwise as.Date() treats the second string as a format
DF <- data.frame(personID=c("wx","wx","aaa","aaa","aaa","oiy"),
                 time = as.Date(c("2020-06-15", "2020-06-15", "2019-07-01",
                                  "2021-08-20", "2021-08-20", "2020-12-19")))
DF
  personID       time
1       wx 2020-06-15
2       wx 2020-06-15
3      aaa 2019-07-01
4      aaa 2021-08-20
5      aaa 2021-08-20
6      oiy 2020-12-19
library(tidyr)
library(dplyr)  # needed for mutate()
# paste personID and time into one key column, keeping the originals
DF <- DF %>% unite(col =Episode, personID:time, remove = FALSE)
DF
         Episode personID       time
1  wx_2020-06-15       wx 2020-06-15
2  wx_2020-06-15       wx 2020-06-15
3 aaa_2019-07-01      aaa 2019-07-01
4 aaa_2021-08-20      aaa 2021-08-20
5 aaa_2021-08-20      aaa 2021-08-20
6 oiy_2020-12-19      oiy 2020-12-19
# convert the key to a factor, then to its underlying integer codes
DF <- DF %>% mutate(Episode = factor(Episode),
                    Episode = as.numeric(Episode))
DF
  Episode personID       time
1       4       wx 2020-06-15
2       4       wx 2020-06-15
3       1      aaa 2019-07-01
4       2      aaa 2021-08-20
5       2      aaa 2021-08-20
6       3      oiy 2020-12-19
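
If the numbers should instead follow the order in which each episode first appears (as in the example in the question, where wx gets 1 and oiy gets 4), one possible variation of the factor idea is to set the factor levels with unique() rather than letting them be sorted. This is only a sketch in base R; the variable name `key` is my own and not from the original post.

```r
# Same example data as above
DF <- data.frame(
  personID = c("wx", "wx", "aaa", "aaa", "aaa", "oiy"),
  time = as.Date(c("2020-06-15", "2020-06-15", "2019-07-01",
                   "2021-08-20", "2021-08-20", "2020-12-19"))
)

# Combine both columns into one key per row ("key" is a hypothetical helper name)
key <- paste(DF$personID, DF$time, sep = "_")

# unique() keeps first-appearance order, so the integer codes follow that order
DF$episodenumber <- as.integer(factor(key, levels = unique(key)))

DF$episodenumber
#> [1] 1 1 2 3 3 4
```

Rows do not need to be adjacent for this to work: a key that shows up again later in the data gets the number it received the first time.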

And here is another method:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
DF <- data.frame(personID=c("wx","wx","aaa","aaa","aaa","oiy"),
                   time = as.Date(lubridate::parse_date_time(c("2020-06-15", "2020-06-15", "2019-07-01",
                                  "2021-08-20", "2021-08-20", "2020-12-19"),"%Y-%m-%d"))
)
DF
#>   personID       time
#> 1       wx 2020-06-15
#> 2       wx 2020-06-15
#> 3      aaa 2019-07-01
#> 4      aaa 2021-08-20
#> 5      aaa 2021-08-20
#> 6      oiy 2020-12-19
DF %>%
  group_by(personID,time) %>%
  mutate (episodenumber=cur_group_id()) %>%
  ungroup()
#> # A tibble: 6 x 3
#>   personID time       episodenumber
#>   <chr>    <date>             <int>
#> 1 wx       2020-06-15             4
#> 2 wx       2020-06-15             4
#> 3 aaa      2019-07-01             1
#> 4 aaa      2021-08-20             2
#> 5 aaa      2021-08-20             2
#> 6 oiy      2020-12-19             3
Created on 2021-08-16 by the reprex package (v2.0.0)
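
Both answers start from dates typed in ISO format, while the example in the question shows day.month.year strings such as 15.6.2020. A minimal base R sketch, assuming the real "time" column is stored as text in that format, would parse it first and then number the groups. The data frame name `raw` is made up for illustration.

```r
# Example data with dates written as in the question (assumed to be character)
raw <- data.frame(
  personID = c("wx", "wx", "aaa", "aaa", "aaa", "oiy"),
  time = c("15.6.2020", "15.6.2020", "1.7.2019",
           "20.8.2021", "20.8.2021", "19.12.2020"),
  stringsAsFactors = FALSE
)

# Parse day.month.year strings into Date values
raw$time <- as.Date(raw$time, format = "%d.%m.%Y")

# Stop early if any date failed to parse (as.Date returns NA in that case)
stopifnot(!anyNA(raw$time))

# Number each personID/time combination; codes follow the sorted keys
raw$episodenumber <- as.integer(factor(paste(raw$personID, raw$time)))

raw$episodenumber
#> [1] 4 4 1 2 2 3
```

Parsing is not strictly required for matching equal rows, but with real Date values the sorted numbering follows calendar order within each person rather than the text order of the date strings.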

Thanks! This works! :slight_smile:
