Help with organizing data

Hello everyone,

I have the following dataset with a 100 patients that have received 4 different kinds of treatment:

set.seed(500)
Data1 <- data.frame(
    TXT = sample(1:4,100, replace = TRUE),
    YEAR_OF_DIAGNOSIS = sample(2004:2015,100, replace = TRUE),
FacilityID = sample(letters[1:5], 100, replace = TRUE)
)

Created on 2019-06-17 by the reprex package (v0.3.0)

How can I add a column to the dataframe that reports if a particular facility has treated less than 5 vs 5 or more patients (I'm basically trying to find if a certain patient was treated at a facility with a lot of experience with that particular treatment or not)

From your question, I'm not sure if year of diagnosis is relevant or not to counting the number of treatments. In case it does not matter, here is an approach that groups by TXT and FacilityID and then counts the number of observations in each group. If that count is greater than or equal to 5, over_5 will be TRUE, otherwise FALSE. Just a note, I created an n_procedures variable for extra clarity, but you could just use row_number() >= 5 in the if_else() if you want to skip that step.

library(tidyverse)

set.seed(500)
Data1 <- data.frame(
  TXT = sample(1:4,100, replace = TRUE),
  YEAR_OF_DIAGNOSIS = sample(2004:2015,100, replace = TRUE),
  FacilityID = sample(letters[1:5], 100, replace = TRUE)
)

Data1 %>% 
  group_by(TXT, FacilityID) %>% 
  mutate(
    n_procedures = n(),
    over_5 = if_else(n_procedures >= 5, TRUE, FALSE)
    )
#> # A tibble: 100 x 5
#> # Groups:   TXT, FacilityID [20]
#>      TXT YEAR_OF_DIAGNOSIS FacilityID n_procedures over_5
#>    <int>             <int> <fct>             <int> <lgl> 
#>  1     3              2004 d                     4 FALSE 
#>  2     3              2012 e                     5 TRUE  
#>  3     3              2010 e                     5 TRUE  
#>  4     1              2013 c                     5 TRUE  
#>  5     2              2012 b                     5 TRUE  
#>  6     1              2013 c                     5 TRUE  
#>  7     3              2014 a                     5 TRUE  
#>  8     4              2014 e                     6 TRUE  
#>  9     3              2005 c                     3 FALSE 
#> 10     2              2010 b                     5 TRUE  
#> # … with 90 more rows

Created on 2019-06-17 by the reprex package (v0.3.0)

2 Likes

Thank you so much! Your Solution works

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.