How to groupby values in R for the same value?

How do I group the patients with the same patient id to see how many times they have visited and how frequent they visit?

Take a look into group_by() and summarize() functions from dplyr package.

We could give you more specific help if you turn this into a reproducible example, if you never heard of it before, here is how to do it

I did grouby using patient id but didn't get the required result. I want to count the number of values under each patient_id and then track the check_in time to see if the patient visits again within the next seven days.

In order to help you further, we need a reproducible example on your part, or at least some sample data on a copy friendly format check this blog post for instructions on how to do it.

Here is the sample data

Yes, but is not on a copy/paste friendly format, so I can't copy it into my rstudio session and solve your problem.

With non reproducible questions, more often than not, you are going to get very general answers like "use this function from this package".

that's not data. That's a picture of your data.

Here's a group_by with a count in a reprex:


library(tidyverse)
#> Warning: package 'tibble' was built under R version 3.5.2
set.seed(42)
df <- tibble(id = sample(c("a","b","c"), 10, replace=TRUE))

df %>% 
  group_by(id) %>%
  summarize(rec_count = n())
#> # A tibble: 3 x 2
#>   id    rec_count
#>   <chr>     <int>
#> 1 a             2
#> 2 b             3
#> 3 c             5

Created on 2019-01-27 by the reprex package (v0.2.1)

This syntax is presented in the text @andresrcs mentioned: https://r4ds.had.co.nz/transform.html#grouped-summaries-with-summarise

Section 5.6.1

2 Likes

Sorry I'm not able to send a reprex as I have an older version of R. Would be really helpful if you could help.

datapasta works with R versions as older as R (≥ 3.3.0), so I seriously doubt that you can't paste some correctly formatted sample data.

image

What about a csv file?

write.csv(sample_data_frame, "sample_data.csv")

you're running a 3+ year old version of R. You're going to run into all sorts of compatibility problems. Do you have a compelling reason for running an outdated version?

2 Likes

Maybe this example with made up data could help you as a starting point

library(dplyr)

sample_data <- tibble(id = sample(c("a","b","c"), 20, replace=TRUE),
                      date = sample(seq.Date(from = as.Date('2019-01-01'), to = as.Date('2019-12-31'), by = 1), 20, replace = TRUE)
)

sample_data %>% 
    group_by(id) %>% 
    summarise(visit_count = n())
#> # A tibble: 3 x 2
#>   id    visit_count
#>   <chr>       <int>
#> 1 a               7
#> 2 b               6
#> 3 c               7

sample_data %>%
    arrange(id, date) %>%
    group_by(id) %>% 
    mutate(previous_date = lag(date), days_count = date - previous_date) %>% 
    filter(days_count <= 20)
#> # A tibble: 6 x 4
#> # Groups:   id [3]
#>   id    date       previous_date days_count
#>   <chr> <date>     <date>        <time>    
#> 1 a     2019-02-14 2019-02-06     8 days   
#> 2 a     2019-05-06 2019-04-23    13 days   
#> 3 a     2019-09-27 2019-09-20     7 days   
#> 4 b     2019-08-14 2019-08-12     2 days   
#> 5 b     2019-12-22 2019-12-14     8 days   
#> 6 c     2019-05-24 2019-05-20     4 days
1 Like
samp <- tibble::tribble(
           ~FACILITY, ~DEPARTMENT, ~PROVIDER_ID, ~APPOINTMENT_DATE, ~APPOINTMENT_TIME, ~PATIENT_ID, ~APPOINTMENT_TYPE, ~BOOKING_DATE, ~BOOKING_TIME, ~SHOW_CODE, ~CHECKIN_DATE, ~CHECKIN_TIME,
               "SCH",       "MED",     67350046,      "04-12-2009",        "17:30:00",    12210741, "Telephone Visit",  "04-12-2009",    "15:05:00",        "Y",  "04-12-2009",    "17:30:00",
               "SCH",       "MED",     67350046,      "04-12-2009",        "17:40:00",     5491175, "Telephone Visit",  "04-12-2009",    "16:15:00",        "Y",  "04-12-2009",    "17:40:00",
               "SCH",       "MED",     67350046,      "04-12-2009",        "17:50:00",    15740599, "Telephone Visit",  "04-12-2009",    "16:43:00",        "Y",  "04-12-2009",    "17:50:00",
               "SCH",       "MED",     67350046,      "04-12-2009",        "18:00:00",    80420422, "Telephone Visit",  "04-12-2009",    "16:53:00",        "Y",  "04-12-2009",    "18:00:00",
               "SCH",       "MED",     67350046,      "04-12-2009",        "18:10:00",    86030202, "Telephone Visit",  "04-12-2009",    "16:57:00",        "Y",  "04-12-2009",    "18:10:00",
               "SCH",       "MED",     67350046,      "04-12-2009",        "18:20:00",    68651098, "Telephone Visit",  "04-12-2009",    "17:14:00",        "Y",  "04-12-2009",    "18:20:00",
               "SCH",       "MED",     67350046,      "04-12-2009",        "18:30:00",    91221095, "Telephone Visit",  "04-12-2009",    "17:01:00",        "Y",  "04-12-2009",    "18:30:00",
               "SCH",       "MED",     67350046,      "04-12-2009",        "18:50:00",    45771319, "Telephone Visit",  "04-12-2009",    "17:16:00",        "Y",  "04-12-2009",    "18:50:00",
               "SCH",       "MED",     67350046,      "04-12-2009",        "19:00:00",    63531063, "Telephone Visit",  "04-12-2009",    "17:25:00",        "Y",  "04-12-2009",    "19:00:00",
               "SCH",       "MED",     67350046,      "04-12-2009",        "19:10:00",    96011310, "Telephone Visit",  "04-12-2009",    "17:47:00",        "Y",  "04-12-2009",    "19:10:00",
               "SCH",       "MED",     67350046,      "04-12-2009",        "19:20:00",     1071301, "Telephone Visit",  "04-12-2009",    "17:54:00",        "Y",  "04-12-2009",    "19:20:00",
               "SCH",       "MED",     67350046,      "04-12-2009",        "19:30:00",    28841329, "Telephone Visit",  "04-12-2009",    "18:21:00",        "Y",  "04-12-2009",    "19:30:00",
               "SCH",       "MED",     67350046,      "04-12-2009",        "19:40:00",    95001285, "Telephone Visit",  "04-12-2009",    "18:40:00",        "Y",  "04-12-2009",    "19:40:00",
               "SCH",       "MED",     67350046,      "08-12-2009",        "17:30:00",    43170522, "Telephone Visit",  "08-12-2009",    "15:12:00",        "Y",  "08-12-2009",    "17:30:00",
               "SCH",       "MED",     67350046,      "08-12-2009",        "17:40:00",    85241162, "Telephone Visit",  "08-12-2009",    "17:38:00",        "Y",  "08-12-2009",    "17:47:00",
               "SCH",       "MED",     67350046,      "08-12-2009",        "18:00:00",     3340506, "Telephone Visit",  "08-12-2009",    "17:58:00",        "Y",  "08-12-2009",    "18:00:00",
               "SCH",       "MED",     67350046,      "08-12-2009",        "18:20:00",    76670349, "Telephone Visit",  "08-12-2009",    "17:49:00",        "Y",  "08-12-2009",    "18:20:00",
               "SCH",       "MED",     67350046,      "08-12-2009",        "18:30:00",    62731301, "Telephone Visit",  "08-12-2009",    "17:56:00",        "Y",  "08-12-2009",    "18:30:00",
               "SCH",       "MED",     67350046,      "08-12-2009",        "19:00:00",     3731365, "Telephone Visit",  "08-12-2009",    "18:07:00",        "Y",  "08-12-2009",    "19:00:00"
           )
 head(samp)
#> # A tibble: 6 x 12
#>   FACILITY DEPARTMENT PROVIDER_ID APPOINTMENT_DATE APPOINTMENT_TIME
#>   <chr>    <chr>            <dbl> <chr>            <chr>           
#> 1 SCH      MED           67350046 04-12-2009       17:30:00        
#> 2 SCH      MED           67350046 04-12-2009       17:40:00        
#> 3 SCH      MED           67350046 04-12-2009       17:50:00        
#> 4 SCH      MED           67350046 04-12-2009       18:00:00        
#> 5 SCH      MED           67350046 04-12-2009       18:10:00        
#> 6 SCH      MED           67350046 04-12-2009       18:20:00        
#> # ... with 7 more variables: PATIENT_ID <dbl>, APPOINTMENT_TYPE <chr>,
#> #   BOOKING_DATE <chr>, BOOKING_TIME <chr>, SHOW_CODE <chr>,
#> #   CHECKIN_DATE <chr>, CHECKIN_TIME <chr>

Created on 2019-01-27 by the reprex package (v0.2.1)

Here is the reprex for the dataset

1 Like

Here is one way to do what I think you are after:

library(magrittr)

samp <- tibble::tribble(
  ~FACILITY, ~DEPARTMENT, ~PROVIDER_ID, ~APPOINTMENT_DATE, ~APPOINTMENT_TIME, ~PATIENT_ID, ~APPOINTMENT_TYPE, ~BOOKING_DATE, ~BOOKING_TIME, ~SHOW_CODE, ~CHECKIN_DATE, ~CHECKIN_TIME,
  "SCH",       "MED",     67350046,      "04-12-2009",        "17:30:00",    12210741, "Telephone Visit",  "04-12-2009",    "15:05:00",        "Y",  "04-12-2009",    "17:30:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "17:40:00",     5491175, "Telephone Visit",  "04-12-2009",    "16:15:00",        "Y",  "04-12-2009",    "17:40:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "17:50:00",    15740599, "Telephone Visit",  "04-12-2009",    "16:43:00",        "Y",  "04-12-2009",    "17:50:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "18:00:00",    80420422, "Telephone Visit",  "04-12-2009",    "16:53:00",        "Y",  "04-12-2009",    "18:00:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "18:10:00",    86030202, "Telephone Visit",  "04-12-2009",    "16:57:00",        "Y",  "04-12-2009",    "18:10:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "18:20:00",    68651098, "Telephone Visit",  "04-12-2009",    "17:14:00",        "Y",  "04-12-2009",    "18:20:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "18:30:00",    91221095, "Telephone Visit",  "04-12-2009",    "17:01:00",        "Y",  "04-12-2009",    "18:30:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "18:50:00",    45771319, "Telephone Visit",  "04-12-2009",    "17:16:00",        "Y",  "04-12-2009",    "18:50:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "19:00:00",    63531063, "Telephone Visit",  "04-12-2009",    "17:25:00",        "Y",  "04-12-2009",    "19:00:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "19:10:00",    96011310, "Telephone Visit",  "04-12-2009",    "17:47:00",        "Y",  "04-12-2009",    "19:10:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "19:20:00",     1071301, "Telephone Visit",  "04-12-2009",    "17:54:00",        "Y",  "04-12-2009",    "19:20:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "19:30:00",    28841329, "Telephone Visit",  "04-12-2009",    "18:21:00",        "Y",  "04-12-2009",    "19:30:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "19:40:00",    95001285, "Telephone Visit",  "04-12-2009",    "18:40:00",        "Y",  "04-12-2009",    "19:40:00",
  "SCH",       "MED",     67350046,      "08-12-2009",        "17:30:00",    43170522, "Telephone Visit",  "08-12-2009",    "15:12:00",        "Y",  "08-12-2009",    "17:30:00",
  "SCH",       "MED",     67350046,      "08-12-2009",        "17:40:00",    85241162, "Telephone Visit",  "08-12-2009",    "17:38:00",        "Y",  "08-12-2009",    "17:47:00",
  "SCH",       "MED",     67350046,      "08-12-2009",        "18:00:00",     3340506, "Telephone Visit",  "08-12-2009",    "17:58:00",        "Y",  "08-12-2009",    "18:00:00",
  "SCH",       "MED",     67350046,      "08-12-2009",        "18:20:00",    76670349, "Telephone Visit",  "08-12-2009",    "17:49:00",        "Y",  "08-12-2009",    "18:20:00",
  "SCH",       "MED",     67350046,      "08-12-2009",        "18:30:00",    62731301, "Telephone Visit",  "08-12-2009",    "17:56:00",        "Y",  "08-12-2009",    "18:30:00",
  "SCH",       "MED",     67350046,      "08-12-2009",        "19:00:00",     3731365, "Telephone Visit",  "08-12-2009",    "18:07:00",        "Y",  "08-12-2009",    "19:00:00"
)


samp %>%
  dplyr::group_by(PATIENT_ID, APPOINTMENT_DATE) %>%
  dplyr::summarise(n = n())
#> # A tibble: 19 x 3
#> # Groups:   PATIENT_ID [?]
#>    PATIENT_ID APPOINTMENT_DATE     n
#>         <dbl> <chr>            <int>
#>  1    1071301 04-12-2009           1
#>  2    3340506 08-12-2009           1
#>  3    3731365 08-12-2009           1
#>  4    5491175 04-12-2009           1
#>  5   12210741 04-12-2009           1
#>  6   15740599 04-12-2009           1
#>  7   28841329 04-12-2009           1
#>  8   43170522 08-12-2009           1
#>  9   45771319 04-12-2009           1
#> 10   62731301 08-12-2009           1
#> 11   63531063 04-12-2009           1
#> 12   68651098 04-12-2009           1
#> 13   76670349 08-12-2009           1
#> 14   80420422 04-12-2009           1
#> 15   85241162 08-12-2009           1
#> 16   86030202 04-12-2009           1
#> 17   91221095 04-12-2009           1
#> 18   95001285 04-12-2009           1
#> 19   96011310 04-12-2009           1

Created on 2019-01-27 by the reprex package (v0.2.1)
This shows you how many times each patient visited on a given day.

I want to calculate the number of patients that have no follow up appointments within the next seven days.

I did group_by on patient_id but not able to get if they have appointments within the next seven days.

library(dplyr)
library(lubridate)

samp %>% 
    mutate(BOOKING_DATE = dmy(BOOKING_DATE)) %>% 
    arrange(PATIENT_ID, BOOKING_DATE) %>%
    group_by(PATIENT_ID) %>% 
    mutate(previous_date = lag(BOOKING_DATE), days_count = BOOKING_DATE - previous_date) %>% 
    filter(days_count <= 7)
1 Like

Building off andresrcs's example, I tweaked the samp so there are patients with upcoming appointments to drive home how this example works.

library(dplyr)
library(lubridate)


samp <- tibble::tribble(
  ~FACILITY, ~DEPARTMENT, ~PROVIDER_ID, ~APPOINTMENT_DATE, ~APPOINTMENT_TIME, ~PATIENT_ID, ~APPOINTMENT_TYPE, ~BOOKING_DATE, ~BOOKING_TIME, ~SHOW_CODE, ~CHECKIN_DATE, ~CHECKIN_TIME,
  "SCH",       "MED",     67350046,      "04-12-2009",        "17:30:00",    12210741, "Telephone Visit",  "04-12-2009",    "15:05:00",        "Y",  "04-12-2009",    "17:30:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "17:40:00",     5491175, "Telephone Visit",  "04-12-2009",    "16:15:00",        "Y",  "04-12-2009",    "17:40:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "17:50:00",    15740599, "Telephone Visit",  "04-12-2009",    "16:43:00",        "Y",  "04-12-2009",    "17:50:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "18:00:00",    80420422, "Telephone Visit",  "04-12-2009",    "16:53:00",        "Y",  "04-12-2009",    "18:00:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "18:10:00",    86030202, "Telephone Visit",  "04-12-2009",    "16:57:00",        "Y",  "04-12-2009",    "18:10:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "18:20:00",    68651098, "Telephone Visit",  "04-12-2009",    "17:14:00",        "Y",  "04-12-2009",    "18:20:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "18:30:00",    91221095, "Telephone Visit",  "04-12-2009",    "17:01:00",        "Y",  "04-12-2009",    "18:30:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "18:50:00",    45771319, "Telephone Visit",  "04-12-2009",    "17:16:00",        "Y",  "04-12-2009",    "18:50:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "19:00:00",    63531063, "Telephone Visit",  "04-12-2009",    "17:25:00",        "Y",  "04-12-2009",    "19:00:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "19:10:00",    96011310, "Telephone Visit",  "04-12-2009",    "17:47:00",        "Y",  "04-12-2009",    "19:10:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "19:20:00",     1071301, "Telephone Visit",  "04-12-2009",    "17:54:00",        "Y",  "04-12-2009",    "19:20:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "19:30:00",    28841329, "Telephone Visit",  "04-12-2009",    "18:21:00",        "Y",  "04-12-2009",    "19:30:00",
  "SCH",       "MED",     67350046,      "04-12-2009",        "19:40:00",    95001285, "Telephone Visit",  "04-12-2009",    "18:40:00",        "Y",  "04-12-2009",    "19:40:00",
  "SCH",       "MED",     67350046,      "08-12-2009",        "17:30:00",    12210741, "Telephone Visit",  "08-12-2009",    "15:12:00",        "Y",  "08-12-2009",    "17:30:00",
  "SCH",       "MED",     67350046,      "08-12-2009",        "17:40:00",    5491175, "Telephone Visit",  "08-12-2009",    "17:38:00",        "Y",  "08-12-2009",    "17:47:00",
  "SCH",       "MED",     67350046,      "08-12-2009",        "18:00:00",     3340506, "Telephone Visit",  "08-12-2009",    "17:58:00",        "Y",  "08-12-2009",    "18:00:00",
  "SCH",       "MED",     67350046,      "08-12-2009",        "18:20:00",    76670349, "Telephone Visit",  "08-12-2009",    "17:49:00",        "Y",  "08-12-2009",    "18:20:00",
  "SCH",       "MED",     67350046,      "08-12-2009",        "18:30:00",    62731301, "Telephone Visit",  "08-12-2009",    "17:56:00",        "Y",  "08-12-2009",    "18:30:00",
  "SCH",       "MED",     67350046,      "08-12-2009",        "19:00:00",     3731365, "Telephone Visit",  "08-12-2009",    "18:07:00",        "Y",  "08-12-2009",    "19:00:00"
)

samp %>% 
  mutate(BOOKING_DATE = dmy(BOOKING_DATE)) %>% 
  arrange(PATIENT_ID, BOOKING_DATE) %>%
  group_by(PATIENT_ID) %>% 
  mutate(previous_date = lag(BOOKING_DATE), days_count = BOOKING_DATE - previous_date) %>% 
  select(PATIENT_ID, days_count, everything()) %>% 
  filter(days_count <= 7)
#> # A tibble: 2 x 14
#> # Groups:   PATIENT_ID [2]
#>   PATIENT_ID days_count FACILITY DEPARTMENT PROVIDER_ID APPOINTMENT_DATE
#>        <dbl> <time>     <chr>    <chr>            <dbl> <chr>           
#> 1    5491175 4          SCH      MED           67350046 08-12-2009      
#> 2   12210741 4          SCH      MED           67350046 08-12-2009      
#> # ... with 8 more variables: APPOINTMENT_TIME <chr>,
#> #   APPOINTMENT_TYPE <chr>, BOOKING_DATE <date>, BOOKING_TIME <chr>,
#> #   SHOW_CODE <chr>, CHECKIN_DATE <chr>, CHECKIN_TIME <chr>,
#> #   previous_date <date>

Created on 2019-01-28 by the reprex package (v0.2.1)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.