Counting how many times coherent numbers appears in variable

Hi everyone,

I have a large data set that represents the month a given person received unemployment help counting months from 1 to 24 (the data set represents two years). The same person can therefore appear 24 times if the person received unemployment help for 24 months (two years). However a person can also have received unemployment help the first six months and then having a job for a year for then agian to return to unemployment help. The person would then appear 12 times but with a break in between. The dataset is ordered so the numbers appear from minimum to maximum.

I would like to make a dummy that gives the person '1' if the person received unemployment help for 12 months in a row during these two years and 0 if the person has received unemployment help for less than that or with breaks in between, so maybe the person received unemployment help during month 1-6 and then again from 22-23.

Is there any way to do that?

Best,
Frederikke

The data set looks like this:

ID-number Month
343439000 1
343439000 2
343439000 3
343439000 4
343439000 5
343439000 6
343439000 7
343439000 8
343439000 9
343439000 10
343439000 11
343439000 12
343439000 13
343439000 14
343439000 15
343439000 16
222256566 6
222256566 7
222256566 14
222256566 15
666445454 23
666445454 24

Hi @frederikke. You can combine rle and diff to find the consecutive number. The diff give out the lagged differences between two numbers and rle count the length of same lagged differences (in case of consecutive number, lagged difference will be 1).

library(tidyverse)

### generate sample data

numPpl <- 100

ids <- sample(1000000:9999999, numPpl)

df <- map_dfr(ids, ~{
  numMonth <- sample(1:24, 1)
  months <- sample(1:24, numMonth) %>%
    .[order(.)]
  data.frame(`ID-number` = .x, Month = months, stringsAsFactors = FALSE)
})

######################################

df %>%
  group_by(ID.number) %>%
  summarise(rle = list(rle(diff(Month)))) %>%
  rowwise() %>%
  mutate(consecutiveMonth = max(rle$lengths[rle$values == 1], -1) + 1) %>%
  mutate(moreThan12Month = ifelse(consecutiveMonth >= 12, 1, 0))
#> Source: local data frame [100 x 4]
#> Groups: <by row>
#> 
#> # A tibble: 100 x 4
#>    ID.number rle    consecutiveMonth moreThan12Month
#>        <int> <list>            <dbl>           <dbl>
#>  1   1110006 <rle>                 6               0
#>  2   1148835 <rle>                 2               0
#>  3   1170768 <rle>                 4               0
#>  4   1213171 <rle>                10               0
#>  5   1285829 <rle>                15               1
#>  6   1329848 <rle>                 3               0
#>  7   1600573 <rle>                 4               0
#>  8   1651562 <rle>                 3               0
#>  9   1779561 <rle>                17               1
#> 10   1818885 <rle>                 6               0
#> # … with 90 more rows

Created on 2019-10-22 by the reprex package (v0.3.0)

2 Likes

Hi @raytong

So sorry that i haven't replied sooner, but I just now had the time to try out your solution - and it worked! So thank you so much, it was reaaly useful!

/Frederikke

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. Here’s how to do it:

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.