Hello,
I would like to make a new column in one dataframe, that returns a value from another dataframe if a number of conditions are met.
Both dataframes have a grouping variable 'subject'. Each subject uses several medications. The medications could have been prescribed on a planned study-related visit to the doctor or on another moment (unrelated to the study).
I would like the new column, called 'period', to return the planned visit if the medication was started on this date (including 2 days before up to 2 days after the visit). If not, I would like 'period' to return one of 3 possible strings 'before', 'after' or 'in-between'.
I managed to get the right value in 1 patient for 1 medication, but unfortunately did not get further:
# prescribed medication and starting dates in 3 subjects
a <-data.frame(stringsAsFactors=FALSE,
subject = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
medication = c('med1', 'med2', 'med3', 'med4', 'med5', 'med6', 'med7', 'med8', 'med9'),
startdate = c('01-01-2016', '02-02-2016', '10-02-2016', '02-03-2016', '07-03-2015', '10-12-2015', '06-01-2018', '08-03-2018', '11-04-2018'))
a$startdate <- as.Date(a$startdate, format ="%d-%m-%Y")
# doctor visit number and corresponding dates in the same 3 subjects
b <-data.frame(stringsAsFactors=FALSE,
subject = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
visitnumber = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
visitdate = c('02-02-2016', '03-03-2016', '05-04-2016', '04-02-2015', '06-03-2015', '07-04-2015', '06-02-2018', '08-03-2018', '10-04-2018'))
b$visitdate <- as.Date(b$visitdate, format ="%d-%m-%Y")
# I first tried a subset of only subject 1 and 'med2'
c <- subset(a, subject == 1)
d <- subset(b, subject ==1)
d
#> subject visitnumber visitdate
#> 1 1 1 2016-02-02
#> 2 1 2 2016-03-03
#> 3 1 3 2016-04-05
e <- subset(c, medication == 'med2')
e
#> subject medication startdate
#> 2 1 med2 2016-02-02
#function, e$startdate for 'sd', d$visitnumber for 'vn' and d$visitdate for'vd'
overlap <- function(sd, vd, vn){
if (sd >= vd -2 & sd <= vd +2 ) {
return(vn)}
else if (sd < min(vd)) {
return("before")}
else if (sd > max(vd)) {
return("after")}
else {
return("in-between")}
}
test<- overlap(e$startdate, d$visitdate, d$visitnumber)
#> Warning in if (sd >= vd - 2 & sd <= vd + 2) {: the condition has length > 1 and
#> only the first element will be used
test
#> [1] 1 2 3
#I read that error can be removed by vectorizing:
f_vec <- Vectorize(overlap, vectorize.args = c("sd", "vd", "vn"))
test1 <- f_vec(e$startdate, d$visitdate, d$visitnumber)
test1
#> [1] "1" "before" "before"
Created on 2020-05-14 by the reprex package (v0.3.0)
I only need the first output of test1 ("1" here).
Then the function needs to iterate over rows, or maybe grouped by subjects, or another conditional statement needs to be added, to have "period" in dataframe 'a' on all subjects for all prescribed medications.
Any help how to proceed is greatly appreciated!