Create an age variable to account for missing data

Hello,
I am an epidemiologist working on COVID-19 vaccination. I have an Excel file containing each individual's date of birth and the dates on which they received their first, second, or third dose. I want to create a variable giving each individual's age at the time they received each vaccine dose.

So I created the following code:
moderna$age_admin2 <- age_calc(moderna$dob, moderna$datedevaccination2, units = "years", precise = TRUE)

But the following error message appears:
Error in if (any(enddate < dob)) { :
missing value where TRUE/FALSE needed

I understand that it is because I have to take into account missing data, but I don't understand how.

Can anyone help me? :slight_smile:

Marylie

Where does the function age_calc() come from?

I calculate the age of the individual at the time of receiving a dose using the date of birth and the date the individual received the given dose.

When I use this function to calculate the age at the time of receiving the first dose it works because I have no missing data (all individuals in the file have at least received their first dose of vaccine). So this function works when I have no missing data.

But when I want to calculate the age at the time of the second dose, there are some individuals who have not yet received it. So I think that I have to add a part of code that takes into account the missing data for the date of reception of the second dose.

Did you create the function? If so, perhaps post the code. If not, what package does the function come from?

This function comes from the package ‘eeptools’, which was built under R version 4.0.5.

Loading required package: ggplot2

Try

moderna$age_admin2 <- ifelse(is.na(moderna$dob) | is.na(moderna$datedevaccination2),  NA,  age_calc(moderna$dob, moderna$datedevaccination2, units = "years", precise = TRUE))

Thank you so much, but it unfortunately doesn't work.
I don't understand, it shouldn't be that hard, but nothing I try works :confounded:

You might want to post a reprex so folks can see exactly what's happening.

The ID is the identifier of my individual
I have : date of birth
date of first dose (datedevaccination1)
date of second dose (datedevaccination2)

ID  dob        datedevaccination1  datedevaccination2  age_admin1
1   7-27-2001  6-30-2021           10-13-2021          19,9260274
2   10-5-1971  6-18-2021           #N/A                49,70136986
3   4-29-1977  2-24-2021           6-15-2021           43,82465753
4   8-21-2001  9-2-2021            #N/A                20,03287671

My database (moderna) comes from a medico-administrative extraction of vaccinated persons.
Each person on this list has received at least a first dose of vaccine.
I was able to obtain each person's age at the time of the first dose with the following code:
moderna$age_admin1 <- age_calc(moderna$dob, moderna$datedevaccination1, units = "years", precise = TRUE)

This code works because there is no missing data in the variables used to obtain the age at first dose. The same calculation does not work for the second dose, since there is missing data (some people have not yet received their second dose).

I'm not sure what kind of variable is datedevaccination2? Usually in R a missing value appears as NA, not #N/A.
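To illustrate the point about representation, here is a minimal sketch in base R, assuming the column had been imported as text containing the literal string "#N/A". The vector `raw` is made up for illustration. With an explicit `format`, `as.Date()` turns strings it cannot parse into `NA`, so the Excel placeholder ends up as a proper missing value.

```r
# Hypothetical text column as it might look if pasted from Excel
raw <- c("10-13-2021", "#N/A", "6-15-2021", "#N/A")

# With an explicit format, unparseable strings become NA instead of an error
dose2 <- as.Date(raw, format = "%m-%d-%Y")

class(dose2)   # "Date"
is.na(dose2)   # FALSE  TRUE FALSE  TRUE
```

If the file is read with readxl, the `na` argument of `read_excel()` can be used to declare which strings count as missing at import time.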

I haven't hand-checked for leap year artifacts, but this should be usable.

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

# BEGIN create synthetic data
first <- ymd("2020-12-20") # date on which vaccine first became available
second <- Sys.Date() # today
old <- ymd("1930-06-13")
mid <- second - years(37)
yng <- second - years(25)
kid <- second - years(18)
olds <- seq(from = old, to = mid, by = "day")
mids <- seq(from = mid, to = yng, by = "day")
kids <- seq(from = yng, to = kid, by = "day")
set.seed(42)
elders <- sample(olds,45)
set.seed(42)
adults <- sample(mids,40)
set.seed(42)
youth <- sample(kids,15)
dob <- c(elders,adults,youth)
subj <- 1:100
set.seed(42)
indx <- sample(subj,10)
win1 <- seq(from=first,to=second,by="day") # days on which first dose possibly given
set.seed(42)
dose1 <- sample(win1,100)
dose2 <- dose1 + months(1)
dose3 <- dose2 + months(6)
age1 <- rep(NA,100)
age2 <- rep(NA,100)
age3 <- rep(NA,100)


DF <- data.frame(subj = subj,
                 dob = dob,
                 dose1 = dose1,
                 dose2 = dose2,
                 dose3 = dose3,
                 age1 = age1,
                 age2 = age2,
                 age3 = age3)

head(DF)
#>   subj        dob      dose1      dose2      dose3 age1 age2 age3
#> 1    1 1981-10-15 2021-02-06 2021-03-06 2021-09-06   NA   NA   NA
#> 2    2 1955-11-18 2021-11-05 2021-12-05 2022-06-05   NA   NA   NA
#> 3    3 1933-11-15 2021-05-21 2021-06-21 2021-12-21   NA   NA   NA
#> 4    4 1972-11-24 2021-03-03 2021-04-03 2021-10-03   NA   NA   NA
#> 5    5 1954-08-11 2021-08-04 2021-09-04 2022-03-04   NA   NA   NA
#> 6    6 1958-08-13 2021-05-14 2021-06-14 2021-12-14   NA   NA   NA

# create some missing dob
DF[indx,"dob"] <- NA
# show row with NA dob
head(DF,12)
#>    subj        dob      dose1      dose2      dose3 age1 age2 age3
#> 1     1 1981-10-15 2021-02-06 2021-03-06 2021-09-06   NA   NA   NA
#> 2     2 1955-11-18 2021-11-05 2021-12-05 2022-06-05   NA   NA   NA
#> 3     3 1933-11-15 2021-05-21 2021-06-21 2021-12-21   NA   NA   NA
#> 4     4 1972-11-24 2021-03-03 2021-04-03 2021-10-03   NA   NA   NA
#> 5     5 1954-08-11 2021-08-04 2021-09-04 2022-03-04   NA   NA   NA
#> 6     6 1958-08-13 2021-05-14 2021-06-14 2021-12-14   NA   NA   NA
#> 7     7 1967-03-30 2021-04-20 2021-05-20 2021-11-20   NA   NA   NA
#> 8     8 1969-10-05 2021-12-12 2022-01-12 2022-07-12   NA   NA   NA
#> 9     9 1976-04-11 2021-04-26 2021-05-26 2021-11-26   NA   NA   NA
#> 10   10 1951-07-12 2021-10-18 2021-11-18 2022-05-18   NA   NA   NA
#> 11   11 1941-04-09 2021-01-12 2021-02-12 2021-08-12   NA   NA   NA
#> 12   12 1955-05-03 2021-11-11 2021-12-11 2022-06-11   NA   NA   NA

# END create synthetic data

# arbitrary assumptions for age range at dose
imputed <- 65

# identify which records have dob and which are missing
good_dob <- !is.na(DF$dob)
bad_dob <- is.na(DF$dob)

# calculate ages at each dose date for records with dob
DF[good_dob,"age1"] <- year(DF[good_dob,"dose1"]) - year(DF[good_dob,"dob"])
DF[good_dob,"age2"] <- year(DF[good_dob,"dose2"]) - year(DF[good_dob,"dob"])
DF[good_dob,"age3"] <- year(DF[good_dob,"dose3"]) - year(DF[good_dob,"dob"])

# assign ages at each dose date for records without dob
DF[bad_dob,"age1"] <- imputed
DF[bad_dob,"age2"] <- imputed
DF[bad_dob,"age3"] <- imputed
DF
#>     subj        dob      dose1      dose2      dose3 age1 age2 age3
#> 1      1 1981-10-15 2021-02-06 2021-03-06 2021-09-06   40   40   40
#> 2      2 1955-11-18 2021-11-05 2021-12-05 2022-06-05   66   66   67
#> 3      3 1933-11-15 2021-05-21 2021-06-21 2021-12-21   88   88   88
#> 4      4 1972-11-24 2021-03-03 2021-04-03 2021-10-03   49   49   49
#> 5      5 1954-08-11 2021-08-04 2021-09-04 2022-03-04   67   67   68
#> 6      6 1958-08-13 2021-05-14 2021-06-14 2021-12-14   63   63   63
#> 7      7 1967-03-30 2021-04-20 2021-05-20 2021-11-20   54   54   54
#> 8      8 1969-10-05 2021-12-12 2022-01-12 2022-07-12   52   53   53
#> 9      9 1976-04-11 2021-04-26 2021-05-26 2021-11-26   45   45   45
#> 10    10 1951-07-12 2021-10-18 2021-11-18 2022-05-18   70   70   71
#> 11    11 1941-04-09 2021-01-12 2021-02-12 2021-08-12   80   80   80
#> 12    12 1955-05-03 2021-11-11 2021-12-11 2022-06-11   66   66   67
#> 13    13 1945-03-28 2021-03-18 2021-04-18 2021-10-18   76   76   76
#> 14    14 1932-12-30 2021-06-02 2021-07-02 2022-01-02   89   89   90
#> 15    15 1973-01-29 2021-04-08 2021-05-08 2021-11-08   48   48   48
#> 16    16 1963-09-03 2021-01-08 2021-02-08 2021-08-08   58   58   58
#> 17    17 1931-02-26 2021-10-12 2021-11-12 2022-05-12   90   90   91
#> 18    18       <NA> 2021-11-30 2021-12-30 2022-06-30   65   65   65
#> 19    19 1967-09-16 2021-09-28 2021-10-28 2022-04-28   54   54   55
#> 20    20 1950-07-03 2021-04-07 2021-05-07 2021-11-07   71   71   71
#> 21    21 1937-03-01 2020-12-24 2021-01-24 2021-07-24   83   84   84
#> 22    22 1955-07-25 2021-07-19 2021-08-19 2022-02-19   66   66   67
#> 23    23 1951-10-09 2021-09-04 2021-10-04 2022-04-04   70   70   71
#> 24    24       <NA> 2021-10-29 2021-11-29 2022-05-29   65   65   65
#> 25    25       <NA> 2021-10-13 2021-11-13 2022-05-13   65   65   65
#> 26    26 1975-02-02 2021-12-02 2022-01-02 2022-07-02   46   47   47
#> 27    27 1962-04-04 2021-05-26 2021-06-26 2021-12-26   59   59   59
#> 28    28 1961-03-05 2021-10-14 2021-11-14 2022-05-14   60   60   61
#> 29    29 1977-04-17 2021-11-19 2021-12-19 2022-06-19   44   44   45
#> 30    30 1933-01-12 2021-05-04 2021-06-04 2021-12-04   88   88   88
#> 31    31 1961-12-15 2021-10-07 2021-11-07 2022-05-07   60   60   61
#> 32    32 1977-01-06 2021-11-08 2021-12-08 2022-06-08   44   44   45
#> 33    33 1942-05-18 2021-12-07 2022-01-07 2022-07-07   79   80   80
#> 34    34 1964-06-11 2021-11-23 2021-12-23 2022-06-23   57   57   58
#> 35    35 1979-02-15 2021-07-04 2021-08-04 2022-02-04   42   42   43
#> 36    36 1965-02-10 2020-12-23 2021-01-23 2021-07-23   55   56   56
#> 37    37 1973-08-10 2021-08-02 2021-09-02 2022-03-02   48   48   49
#> 38    38 1938-02-28 2021-07-22 2021-08-22 2022-02-22   83   83   84
#> 39    39 1955-08-27 2021-08-21 2021-09-21 2022-03-21   66   66   67
#> 40    40 1969-04-10 2021-04-12 2021-05-12 2021-11-12   52   52   52
#> 41    41 1965-11-13 2021-09-07 2021-10-07 2022-04-07   56   56   57
#> 42    42 1969-10-21 2021-04-28 2021-05-28 2021-11-28   52   52   52
#> 43    43 1931-11-11 2020-12-22 2021-01-22 2021-07-22   89   90   90
#> 44    44 1984-11-13 2021-09-03 2021-10-03 2022-04-03   37   37   38
#> 45    45 1930-09-23 2021-06-23 2021-07-23 2022-01-23   91   91   92
#> 46    46 1992-02-02 2021-05-06 2021-06-06 2021-12-06   29   29   29
#> 47    47       <NA> 2021-01-28 2021-02-28 2021-08-28   65   65   65
#> 48    48 1991-06-07 2021-11-22 2021-12-22 2022-06-22   30   30   31
#> 49    49       <NA> 2021-01-21 2021-02-21 2021-08-21   65   65   65
#> 50    50 1988-05-16 2021-04-01 2021-05-01 2021-11-01   33   33   33
#> 51    51 1986-09-06 2021-12-08 2022-01-08 2022-07-08   35   36   36
#> 52    52 1990-09-08 2021-11-09 2021-12-09 2022-06-09   31   31   32
#> 53    53 1995-08-27 2021-05-25 2021-06-25 2021-12-25   26   26   26
#> 54    54 1985-12-02 2021-03-05 2021-04-05 2021-10-05   36   36   36
#> 55    55 1996-08-11 2021-09-10 2021-10-10 2022-04-10   25   25   26
#> 56    56 1995-10-09 2021-01-23 2021-02-23 2021-08-23   26   26   26
#> 57    57 1990-02-26 2021-07-28 2021-08-28 2022-02-28   31   31   32
#> 58    58 1987-05-29 2021-01-04 2021-02-04 2021-08-04   34   34   34
#> 59    59 1987-07-01 2021-07-27 2021-08-27 2022-02-27   34   34   35
#> 60    60 1987-09-04 2021-08-24 2021-09-24 2022-03-24   34   34   35
#> 61    61 1990-06-18 2021-04-16 2021-05-16 2021-11-16   31   31   31
#> 62    62 1995-11-26 2021-11-01 2021-12-01 2022-06-01   26   26   27
#> 63    63 1987-04-20 2021-03-11 2021-04-11 2021-10-11   34   34   34
#> 64    64 1995-09-29 2021-11-28 2021-12-28 2022-06-28   26   26   27
#> 65    65       <NA> 2021-05-17 2021-06-17 2021-12-17   65   65   65
#> 66    66 1986-04-06 2021-02-14 2021-03-14 2021-09-14   35   35   35
#> 67    67 1990-08-14 2021-03-29 2021-04-29 2021-10-29   31   31   31
#> 68    68 1985-10-06 2021-03-20 2021-04-20 2021-10-20   36   36   36
#> 69    69 1995-11-07 2021-09-14 2021-10-14 2022-04-14   26   26   27
#> 70    70 1991-08-31 2021-06-18 2021-07-18 2022-01-18   30   30   31
#> 71    71       <NA> 2021-02-11 2021-03-11 2021-09-11   65   65   65
#> 72    72 1987-03-27 2021-07-15 2021-08-15 2022-02-15   34   34   35
#> 73    73 1987-08-20 2021-08-22 2021-09-22 2022-03-22   34   34   35
#> 74    74       <NA> 2021-02-17 2021-03-17 2021-09-17   65   65   65
#> 75    75 1991-03-04 2021-04-06 2021-05-06 2021-11-06   30   30   30
#> 76    76 1991-12-07 2021-04-24 2021-05-24 2021-11-24   30   30   30
#> 77    77 1994-04-30 2021-04-10 2021-05-10 2021-11-10   27   27   27
#> 78    78 1993-03-31 2021-03-01 2021-04-01 2021-10-01   28   28   28
#> 79    79 1986-12-08 2020-12-20 2021-01-20 2021-07-20   34   35   35
#> 80    80 1987-07-14 2021-05-09 2021-06-09 2021-12-09   34   34   34
#> 81    81 1994-01-10 2021-07-13 2021-08-13 2022-02-13   27   27   28
#> 82    82 1986-08-29 2021-01-30       <NA>       <NA>   35   NA   NA
#> 83    83 1996-07-07 2021-09-22 2021-10-22 2022-04-22   25   25   26
#> 84    84 1988-10-07 2021-11-15 2021-12-15 2022-06-15   33   33   34
#> 85    85 1992-08-29 2021-08-27 2021-09-27 2022-03-27   29   29   30
#> 86    86 2003-06-07 2021-10-21 2021-11-21 2022-05-21   18   18   19
#> 87    87 2000-03-02 2021-01-13 2021-02-13 2021-08-13   21   21   21
#> 88    88 1999-12-14 2021-06-28 2021-07-28 2022-01-28   22   22   23
#> 89    89       <NA> 2021-01-20 2021-02-20 2021-08-20   65   65   65
#> 90    90 1998-09-06 2021-08-14 2021-09-14 2022-03-14   23   23   24
#> 91    91 2002-09-08 2021-01-02 2021-02-02 2021-08-02   19   19   19
#> 92    92 2000-02-06 2021-04-09 2021-05-09 2021-11-09   21   21   21
#> 93    93 2000-07-30 2021-11-02 2021-12-02 2022-06-02   21   21   22
#> 94    94 2002-08-14 2021-10-02 2021-11-02 2022-05-02   19   19   20
#> 95    95 1997-12-02 2021-10-01 2021-11-01 2022-05-01   24   24   25
#> 96    96 2001-05-24 2021-08-18 2021-09-18 2022-03-18   20   20   21
#> 97    97 1997-05-25 2021-07-31 2021-08-31       <NA>   24   24   NA
#> 98    98 2003-09-04 2021-07-21 2021-08-21 2022-02-21   18   18   19
#> 99    99 2002-02-26 2021-03-24 2021-04-24 2021-10-24   19   19   19
#> 100  100       <NA> 2020-12-25 2021-01-25 2021-07-25   65   65   65
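One caveat about the output above: subtracting calendar years overstates the age by one whenever the dose falls before that year's birthday (subject 1, born 1981-10-15 and dosed 2021-02-06, is shown as 40 but was 39). The sketch below is one possible base R correction, not part of the reprex. The helper name `age_completed_years` is hypothetical, and it returns NA for a missing date rather than an imputed value.

```r
# Hypothetical helper: completed years of age on a given date
age_completed_years <- function(dob, when) {
  # difference in calendar years
  yrs <- as.integer(format(when, "%Y")) - as.integer(format(dob, "%Y"))
  # TRUE when the birthday has not yet occurred in the year of 'when'
  before_birthday <- format(when, "%m%d") < format(dob, "%m%d")
  # NA in either date propagates to NA in the result
  yrs - as.integer(before_birthday)
}

dob   <- as.Date(c("1981-10-15", "1954-08-11", NA))
dose1 <- as.Date(c("2021-02-06", "2021-08-04", "2021-11-30"))

ages <- age_completed_years(dob, dose1)
ages   # 39 66 NA
```

Separately, the NA dates in rows 82 and 97 arise because adding `months()` to a date can land on a day that does not exist (30 or 31 February); lubridate's `%m+%` operator rolls back to the last day of the month instead.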

That is because I copied and pasted the data from my Excel file; in R the dates are correctly coded with as_date :slight_smile:

datedevaccination2 is the date on which the individual received their second dose of vaccine. A missing value means that the person has not yet had a second dose.

The issue isn't the meaning of the variable, it's how the computer is representing it. It would be much better to show us the data from R rather than Excel.

Oh sorry. I'm a beginner with R Studio. Thank you for the information :slight_smile: !

So here is my data copied directly into R. The following data is what I get, in R, after importing my raw data.

#>   ID        dob datedevaccination1 datedevaccination2
#> 1  1 2001-07-27         2021-06-30         2021-03-06
#> 2  2 1971-05-10         2021-06-18                 NA
#> 3  3 1977-04-29         2021-02-24         2021-06-15
#> 4  4 2001-08-21         2021-02-09                 NA

After running the following code:
moderna$age_admin1 <- age_calc(moderna$dob, moderna$datedevaccination1, units = "years", precise = TRUE)

I get this data in R:
#>   ID        dob datedevaccination1 datedevaccination2  age_admin1
#> 1  1 2001-07-27         2021-06-30         2021-03-06  19.9260274
#> 2  2 1971-05-10         2021-06-18                 NA 49.70136986
#> 3  3 1977-04-29         2021-02-24         2021-06-15 43.82465753
#> 4  4 2001-08-21         2021-02-09                 NA 20.03287671

Now I want to do the same thing with datedevaccination2. Since this is only a small sample of my database, I should point out that there is no missing data for the variable dob. The only missing data are in the variable datedevaccination2.

I'm not sure if I'm being clear :sweat_smile:

If there are records lacking a second vaccination date, is the age at that non-event really necessary? Why not just assume the second dose followed the first and calculate the then-current age? It would be useful to have a column indicating that the date was assumed.

This is helpful. The code I posted earlier was an attempt to only calculate the age when you had a valid entry. (I think that's consistent with @technocrat's point.) Can you show us what error message you got when you tried my code?

@technocrat Indeed, if there is no second dose (datedevaccination2 = NA), it is entirely appropriate for the age to be NA; I do not need an age for these individuals. So I only want to calculate the age when there is a valid entry. But it seems that R will not calculate the age of the individuals who did receive a second dose because others have missing data...

Thank you for your help :slight_smile: !

@startz When I use your code, the same error message appears:
Error in if (any(enddate < dob)) { :
missing value where TRUE/FALSE needed

Indeed, if there is no second dose (datedevaccination2 = NA), it is entirely appropriate for the age to be NA; I do not need an age for these individuals. But it is as if R will not calculate the age of the individuals who did receive a second dose because others have missing data.

Thanks to both of you,
Marylie

Perhaps there is something more subtle than I see going on. You might want to use dput() to create text you can copy and paste here for your data and then also post the smallest amount of code that shows the error. That way, someone may be able to reproduce the problem.
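For what it's worth, here is a small self-contained sketch of what may be going on, using the four sample rows retyped by hand. `strict_age` is a hypothetical stand-in, not the eeptools function; it only imitates the kind of up-front check quoted in the error message, `if (any(enddate < dob))`. Because `ifelse()` evaluates its `no` argument on the full vectors, the NA dates still reach that check even when the call is wrapped.

```r
# Sample rows retyped from the post above
moderna <- data.frame(
  ID  = 1:4,
  dob = as.Date(c("2001-07-27", "1971-05-10", "1977-04-29", "2001-08-21")),
  datedevaccination2 = as.Date(c("2021-03-06", NA, "2021-06-15", NA))
)

# Hypothetical stand-in: validates its inputs before computing anything
strict_age <- function(dob, enddate) {
  if (any(enddate < dob)) stop("End date must be after date of birth")
  as.numeric(enddate - dob) / 365.25  # rough age in years
}

# any() over FALSE and NA values is NA, and if() cannot branch on NA
check <- any(moderna$datedevaccination2 < moderna$dob)
print(check)   # NA

# Direct call: fails inside the if() check; capture the message
direct <- tryCatch(
  strict_age(moderna$dob, moderna$datedevaccination2),
  error = function(e) conditionMessage(e)
)

# Wrapped in ifelse(): strict_age() still receives the full vectors, so it fails too
wrapped <- tryCatch(
  ifelse(is.na(moderna$datedevaccination2), NA,
         strict_age(moderna$dob, moderna$datedevaccination2)),
  error = function(e) conditionMessage(e)
)

print(direct)
print(wrapped)
```

If that is indeed the cause, the fix is to pass only the rows with a non-missing date to the function, rather than masking the result afterwards.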

OK, so we have DOB for all subjects and we want to calculate the age at each vaccination, but dates for some vaccinations are missing, probably indicated by NA. For those vaccinations, we want the age to be NA. For everything else, we want the calculation, done however we choose to do it.

Let's call that calculation f(v) where v is the date of a vaccination. If we divide the dataset into two subsets, one in which v has the value of NA and the other in which v has a date value, f(v) will return an age integer or NA appropriately.

Begin by setting DF$age to all NA with

DF$age <- NA

then find the row numbers of all cases where DF$v is not NA

indx <- which(!is.na(DF$v))

indx is a vector of row numbers of the data frame that have completed entries for v. Use it to update DF$age as follows:

DF[indx,"age"] <- YOUR FUNCTION OR EXPRESSION TO CALCULATE AGE HERE
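As a concrete sketch of the steps above, with made-up data: the column names `dob` and `v` follow the notation used here, and the age expression (days divided by 365.25) is only an approximation standing in for whatever calculation you prefer. With eeptools, the same pattern would call age_calc() on `DF$dob[indx]` and `DF$v[indx]` so that it never sees a missing date.

```r
# Illustrative data: every subject has a dob, some vaccination dates are missing
DF <- data.frame(
  dob = as.Date(c("2001-07-27", "1971-05-10", "1977-04-29", "2001-08-21")),
  v   = as.Date(c("2021-10-13", NA, "2021-06-15", NA))
)

# Start with age missing everywhere
DF$age <- NA_real_

# Row numbers with a recorded vaccination date
indx <- which(!is.na(DF$v))

# Fill in the age only for those rows (approximate age in years)
DF[indx, "age"] <- as.numeric(DF$v[indx] - DF$dob[indx]) / 365.25

DF
```

Rows 2 and 4 keep NA for age, and the calculation itself is only ever applied to complete pairs of dates.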