how to count months observations based on every person?

I know my question is unclear. So let me state my question clearly. My data set is a panel. I have a lot of person (id). And I also have a month variable (month).

Some people have the complete 1-12 month. However, some people only have part of months. My problem is that I only keep those who have complete 1-12 months.


library(dplyr)

your_data %>% group_by(id) %>% summarise(month_count=n_distinct(month)) %>% filter(month_count==12)

Please be aware that a REPRoducible EXample (reprex) is much more useful than a textual description of your problem and it is also a polite thing to do when asking people to help you with coding questions.

okay, is this the way?

data.frame(
          id = c(1202L,1202L,1202L,1202L,1202L,1202L,
                 1202L,1202L,1202L,1202L,1202L,1202L,1202L,1202L,1202L,
                 1202L,1202L),
   bill.date = as.factor(c("2011-01-01",
                           "2011-02-01","2011-03-04","2011-04-03","2011-05-02",
                           "2011-06-01","2011-07-01","2011-07-31","2011-08-30",
                           "2011-09-30","2011-10-29","2011-11-29","2013-02-01",
                           "2013-03-03","2013-04-01","2013-04-30",
                           "2013-06-02"))
)

Thanks. I upload my data. And I need to keep who (id) have complete 12 months in 2011. In my data set, each id has observations both in 2011 and in 2013. There's no restriction for year 2013, only strict restrictions for year 2011.

I did this to create 2 new variables: month and year. And the next thing I need to do is to keep only those id who have 12 months observations in year 2011. In other words, if id only has 11 months in year 2013, that's okay and should be kept as long as this id has 12 months in year 2011.

newdata <- mutate(newdata, month=month(bill.date))
newdata <- mutate(newdata, year=year(bill.date))
newdata <- data.frame(
  id = c(
    1202L, 1202L, 1202L, 1202L, 1202L, 1202L,
    1202L, 1202L, 1202L, 1202L, 1202L, 1202L, 1202L, 1202L, 1202L,
    1202L, 1202L
  ),
  bill.date = as.factor(c(
    "2011-01-01",
    "2011-02-01", "2011-03-04", "2011-04-03", "2011-05-02",
    "2011-06-01", "2011-07-01", "2011-07-31", "2011-08-30",
    "2011-09-30", "2011-10-29", "2011-11-29", "2013-02-01",
    "2013-03-03", "2013-04-01", "2013-04-30",
    "2013-06-02"
  ))
)
library(tidyverse)
library(lubridate)
newdata <- mutate(newdata, month = month(bill.date))
newdata <- mutate(newdata, year = year(bill.date))

ids_with_12_in_2011 <- filter(
  newdata,
  year == 2011
) %>%
  group_by(id) %>%
  count() %>%
  filter(n == 12) %>%
  select(id)

ids_with_12_in_2011_with_data <- left_join(
  ids_with_12_in_2011,
  newdata
)

Your codes looks wrong. For example, id 1202 doesn't have 2011-12. Thus, 1202 should be dropped.

I think there can be multiple entries per month, which is making it trickier.

ok, thanks for pointing it out, I assumed too much originally.
its a small tweak :>


ids_with_12_in_2011 <- filter(
  newdata,
  year == 2011
) %>%
  select(-bill.date) %>%
  distinct %>%
  group_by(id) %>%
  count() %>%
  filter(n == 12) %>%
  select(id)

ids_with_12_in_2011_with_data <- left_join(
  ids_with_12_in_2011,
  newdata
)
1 Like

You make a good point. But let's assume only one entry per month-year.

In that case, @nirgrahamuk's first solution works already.

why do I still see 1202? id 1202 just have 11 months in year 2011.

what do you mean you still see it ?
the content of ids_with_12_in_2011 is now empty as you require it to be given the example data provided...

Your posted data has July 2011 appear twice.

Aha, you're right. I miss this point.

1 Like

Can you explain your code? I can understand first we filter the data which year == 2011, then what?

Here is split out the multipipe statement into single steps for you to analyse:

library(tidyverse)
library(lubridate)
newdata <- mutate(newdata, month = month(bill.date))
newdata <- mutate(newdata, year = year(bill.date))

a <- filter(
  newdata,
  year == 2011
)
a
b<- a%>%
  select(-bill.date)
b
c <- b %>%
  distinct 
c
d <- c %>%
  group_by(id) 
d
e<- d %>%
  count() 
e
f <- e %>%
  filter(n == 12) 
f
g <-f %>%
  select(id)
g

ids_with_12_in_2011_with_data <- left_join(
  g,
  newdata
)
1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.