A simple problem needs help: "number of items to replace is not a multiple of replacement length"

Hello community,
I'm a new R user and have run into a problem I'd like help with.
I want to use the function `difftime()` to calculate the age of everyone in my data frame. My code is:
`dat$age <- round(as.numeric((difftime("2017-12-31", dat$BIRTHDAY, units = "days")) / 365.25), digits = 2)`
But the result contains many NAs in `age`, and R warns that "number of items to replace is not a multiple of replacement length".
Here is my data frame:

``` r
data.frame(
id = c(1,2,3,4,5),
birthday = c("1968-10-15","2007-11-16","1988-11-15","2008-11-16","1995-10-20"),
deathstatus = c(1,1,1,0,0),
deathdate = c("2009-11-16", "2008-11-16", "2007-11-16",NA,NA)
)
#>   id   birthday deathstatus  deathdate
#> 1  1 1968-10-15           1 2009-11-16
#> 2  2 2007-11-16           1 2008-11-16
#> 3  3 1988-11-15           1 2007-11-16
#> 4  4 2008-11-16           0       <NA>
#> 5  5 1995-10-20           0       <NA>
```

Created on 2019-03-04 by the reprex package (v0.2.1)

Any suggestion is welcome. Thank you.

Welcome to this community.

To get more helpful answers, please ask questions with a reproducible example. If you don't know how, here's a helpful link:

I generated some random birthdays and calculated ages as of 2019-02-28, and it seems to work correctly without any errors:

set.seed(seed = 24971)

simulated_birthdays <- sample(x = seq(from = as.Date(x = '1991/01/01'),
                                      to = as.Date(x = '2000/12/31'),
                                      by = 11))

head(x = simulated_birthdays)
#> [1] "1992-06-12" "1991-03-19" "1992-04-29" "1999-04-03" "1991-06-04"
#> [6] "1997-03-05"

current_date <- as.Date(x = '2019/02/28')

ages_in_days <- difftime(time1 = current_date,
                         time2 = simulated_birthdays,
                         units = 'days')

head(x = ages_in_days)
#> Time differences in days
#> [1]  9757 10208  9801  7271 10131  8030

approximate_ages_in_years <- round(x = as.numeric(x = (ages_in_days / 365.25)),
                                   digits = 2)

head(x = approximate_ages_in_years)
#> [1] 26.71 27.95 26.83 19.91 27.74 21.98

Created on 2019-02-28 by the reprex package (v0.2.1)

Can you check whether you have better luck with `dat$age <- round(as.numeric((difftime(as.Date("2017-12-31"), dat$BIRTHDAY, units = "days")) / 365.25), digits = 2)`?
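For reference, here is a minimal sketch of that suggestion with explicit `Date` objects on both sides. It assumes the birthdays are stored as `yyyy-mm-dd` strings; the three sample values are taken from the data posted later in this thread, and the names `birthdays`, `reference_date`, and `age_years` are only illustrative.

```r
# Convert both ends to Date first, so difftime() works with calendar dates
birthdays <- as.Date(c("1988-11-15", "1968-10-15", "1990-12-15"))
reference_date <- as.Date("2017-12-31")

# Difference in days, then an approximate age in years (365.25 days per year)
age_days <- as.numeric(difftime(reference_date, birthdays, units = "days"))
age_years <- round(age_days / 365.25, digits = 2)

age_years
#> [1] 29.13 49.21 27.04
```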


Perhaps some of your BIRTHDAY values are incorrectly formatted?
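One way to test that idea is to parse the column with an explicit format and list the entries that are present but fail to parse. This is a sketch; the sample vector below is made up and deliberately contains a day-first date and an empty string.

```r
# Hypothetical raw values as they might arrive from a CSV file
birthday_raw <- c("1988-11-15", "1968-10-15", "15/11/1988", "", NA)

# Parse strictly as yyyy-mm-dd; anything in another layout becomes NA
parsed <- as.Date(birthday_raw, format = "%Y-%m-%d")

# Positions that had a value in the input but could not be parsed
bad_rows <- which(is.na(parsed) & !is.na(birthday_raw))
birthday_raw[bad_rows]  # the day-first date and the empty string
```

Genuinely missing values (`NA` in the input) are excluded, so only real formatting problems are reported.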

Thank you for your patience.
I also suspected there might be some formatting errors in my values. My data was imported from a CSV file, so I pasted all of it into a TXT file and then pasted it back into the CSV. Then I converted the column to the Date class with `as.Date()` in R.
I know that may be a clumsy approach, but it still didn't work.
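The copy-and-paste round trip shouldn't be necessary: `read.csv()` can convert date columns while reading, through its `colClasses` argument. A sketch, assuming the file stores dates as `yyyy-mm-dd` and writes missing dates as `NA`; the inline `csv_text` is a hypothetical stand-in for the real file.

```r
# Two rows in the same layout as the asker's data; in practice, pass the
# path to the real file instead of `text = csv_text`
csv_text <- "name,birthday,deathstatus,deathdate
1,1988-11-15,1,2009-11-16
4,1990-12-15,0,NA"

# Named colClasses: only these two columns are forced to Date,
# the remaining column types are guessed as usual
dat <- read.csv(text = csv_text,
                colClasses = c(birthday = "Date", deathdate = "Date"),
                stringsAsFactors = FALSE)

str(dat)
```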

Thank you for your patience.
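My actual data frame is:

| name | birthday   | deathstatus | deathdate  |
|------|------------|-------------|------------|
| 1    | 1988-11-15 | 1           | 2009-11-16 |
| 2    | 1968-10-15 | 1           | 2007-11-16 |
| 3    | 1988-11-15 | 1           | 2008-11-16 |
| 4    | 1990-12-15 | 0           | NA         |
| 5    | 1988-11-15 | 0           | NA         |
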
I want to calculate each person's age whether they're alive or not. My code is:
`dat$age[dat$deathstatus == 0] <- round(as.numeric((difftime("2017-12-31", dat$birthday, units = "days")) / 365.25), digits = 2)`
`dat$age[dat$deathstatus == 1] <- round(as.numeric((difftime(dat$deathdate, dat$BIRTHDAY, units = "days")) / 365.25), digits = 2)`
In the end, there are many NAs in the age column. I have already checked my code; it seems to work, and I can't find anything the rows with NA have in common.
I really hope you can give me some advice.
Thank you again.

Could you please turn this into a self-contained REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

As Andres has mentioned, providing a reprex is very helpful. Please keep that in mind for your future posts. It helps others to help you.

There's a problem in your code: you don't have a column named `BIRTHDAY` in your dataset, but you used it. Most likely it was a typo, so I corrected it in my example below.
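That typo is easy to miss because `$` on a data frame does not raise an error for a column that doesn't exist; it silently returns `NULL`, and column names are case-sensitive. A small sketch; `get_column()` is a hypothetical helper, not part of base R, that fails loudly instead.

```r
dat <- data.frame(birthday = c("1988-11-15", "1968-10-15"),
                  stringsAsFactors = FALSE)

# Column names are case-sensitive: BIRTHDAY is not birthday
is.null(dat$BIRTHDAY)
#> [1] TRUE

# Hypothetical helper: stop with a clear message if the column is missing
get_column <- function(df, column) {
  if (!column %in% names(df)) {
    stop(sprintf("column '%s' not found; available columns: %s",
                 column, paste(names(df), collapse = ", ")))
  }
  df[[column]]
}

get_column(dat, "birthday")       # returns the birthday column
try(get_column(dat, "BIRTHDAY"))  # error that lists the available columns
```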

Now, let's see what's happening:

# creating the dataset
dat <- data.frame(stringsAsFactors = FALSE,
                  name = c(1, 2, 3, 4, 5),
                  birthday = c("1988-11-15", "1968-10-15", "1988-11-15", "1990-12-15", "1988-11-15"),
                  deathstatus = c(1, 1, 1, 0, 0),
                  deathdate = c("2009-11-16", "2007-11-16", "2008-11-16", NA, NA))

# what you did and why there are problems

## creating a copy of the data
dat_1 <- dat

## the following creates a logical vector of length 5
## but only two of its elements are TRUE, in the last two positions
dat_1$deathstatus == 0
#> [1] FALSE FALSE FALSE  TRUE  TRUE

## while this creates a numeric vector of length 5
## with the age at 2017-12-31 for everyone, although you only need it for people who are alive
round(as.numeric((difftime("2017-12-31",
                           dat_1$birthday,
                           units = "days")) / 365.25),
      digits = 2)
#> [1] 29.13 49.21 29.13 27.04 29.13

## so this line tries to assign 5 numbers to 2 places
## R doesn't stop with an error here; it only shows the warning
## worse, it silently assigns only the first 2 values of the numeric vector
## to the positions where the logical vector is TRUE
dat_1$age[dat_1$deathstatus == 0] <- round(as.numeric((difftime("2017-12-31",
                                                                dat_1$birthday,
                                                                units = "days")) / 365.25),
                                           digits = 2)
#> Warning in dat_1$age[dat_1$deathstatus == 0] <-
#> round(as.numeric((difftime("2017-12-31", : number of items to replace is
#> not a multiple of replacement length

## verify it below
dat_1
#>   name   birthday deathstatus  deathdate   age
#> 1    1 1988-11-15           1 2009-11-16    NA
#> 2    2 1968-10-15           1 2007-11-16    NA
#> 3    3 1988-11-15           1 2008-11-16    NA
#> 4    4 1990-12-15           0       <NA> 29.13
#> 5    5 1988-11-15           0       <NA> 49.21

## same problem will persist in next case
## but it'll be hard to notice

## the first 3 positions of the logical vector (of length 5) are TRUE here
dat_1$deathstatus == 1
#> [1]  TRUE  TRUE  TRUE FALSE FALSE

## generates a numeric vector of length 5
## with the age at death for everyone, so it returns NA for people who are alive
round(as.numeric((difftime(dat_1$deathdate,
                           dat_1$birthday,
                           units = "days")) / 365.25),
      digits = 2)
#> [1] 21.00 39.09 20.00    NA    NA

## you're placing 5 items in 3 slots, and hence get warned again
## but the result will look OK,
## because the values of interest happen to be in the first 3 positions
## the problem is not visible here, but it exists nevertheless
dat_1$age[dat_1$deathstatus == 1] <- round(as.numeric((difftime(dat_1$deathdate,
                                                                dat_1$birthday,
                                                                units = "days")) / 365.25),
                                           digits = 2)
#> Warning in dat_1$age[dat_1$deathstatus == 1] <-
#> round(as.numeric((difftime(dat_1$deathdate, : number of items to replace is
#> not a multiple of replacement length

## check below
dat_1
#>   name   birthday deathstatus  deathdate   age
#> 1    1 1988-11-15           1 2009-11-16 21.00
#> 2    2 1968-10-15           1 2007-11-16 39.09
#> 3    3 1988-11-15           1 2008-11-16 20.00
#> 4    4 1990-12-15           0       <NA> 29.13
#> 5    5 1988-11-15           0       <NA> 49.21

# what I did

## creating another copy
dat_2 <- dat

## doing the same thing you did, but in a slightly different way
(dat_2 <- within(data = dat_2,
                 expr = {
                   birthday <- as.Date(x = birthday)
                   deathdate <- as.Date(x = deathdate)
                   age <- sapply(X = seq_along(name), # loop over row positions, not over the values of name
                                 FUN = function(index)
                                 {
                                   if(deathstatus[index] == 0)
                                   {
                                     in_days <- difftime(time1 = as.Date(x = "2017-12-31"),
                                                         time2 = birthday[index],
                                                         units = 'days')
                                   } else
                                   {
                                     in_days <- difftime(time1 = deathdate[index],
                                                         time2 = birthday[index],
                                                         units = 'days')
                                   }
                                   in_approx_years <- round(x = as.numeric(x = (in_days / 365.25)),
                                                            digits = 2)
                                   return(in_approx_years)
                                 })
                 }))
#>   name   birthday deathstatus  deathdate   age
#> 1    1 1988-11-15           1 2009-11-16 21.00
#> 2    2 1968-10-15           1 2007-11-16 39.09
#> 3    3 1988-11-15           1 2008-11-16 20.00
#> 4    4 1990-12-15           0       <NA> 27.04
#> 5    5 1988-11-15           0       <NA> 29.13

Created on 2019-03-01 by the reprex package (v0.2.1)
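The core of the problem can be reproduced without dates at all. This sketch uses toy vectors (the ages are the five values computed above) to show that assigning five values into two selected slots keeps only the first two and emits exactly the warning from the question; the handler only records the warning text so it can be inspected.

```r
# Toy vectors standing in for dat$age, dat$deathstatus == 0 and the computed ages
age <- rep(NA_real_, 5)
is_alive <- c(FALSE, FALSE, FALSE, TRUE, TRUE)
ages_at_cutoff <- c(29.13, 49.21, 29.13, 27.04, 29.13)  # one value per person

# Left side selects 2 slots, right side has 5 values: R uses the first 2
warning_text <- NULL
invisible(withCallingHandlers(
  age[is_alive] <- ages_at_cutoff,
  warning = function(w) {
    warning_text <<- conditionMessage(w)  # keep the message
    invokeRestart("muffleWarning")        # and suppress printing it
  }
))

age
#> [1]    NA    NA    NA 29.13 49.21
warning_text
#> [1] "number of items to replace is not a multiple of replacement length"
```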

Hope this helps.
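For comparison, the asker's original two-line approach also works once the right-hand side is subset with the same logical index as the left-hand side, so both sides have the same length. A sketch using the data frame from this thread (with `BIRTHDAY` corrected to `birthday`); `alive`, `dead`, `ages_at_cutoff`, and `ages_at_death` are illustrative names.

```r
dat <- data.frame(stringsAsFactors = FALSE,
                  name = c(1, 2, 3, 4, 5),
                  birthday = c("1988-11-15", "1968-10-15", "1988-11-15", "1990-12-15", "1988-11-15"),
                  deathstatus = c(1, 1, 1, 0, 0),
                  deathdate = c("2009-11-16", "2007-11-16", "2008-11-16", NA, NA))

# Convert to Date so difftime() compares calendar dates
dat$birthday <- as.Date(dat$birthday)
dat$deathdate <- as.Date(dat$deathdate)

alive <- dat$deathstatus == 0
dead <- dat$deathstatus == 1

# One value per person for each end point
ages_at_cutoff <- round(as.numeric(difftime(as.Date("2017-12-31"), dat$birthday,
                                            units = "days")) / 365.25, digits = 2)
ages_at_death <- round(as.numeric(difftime(dat$deathdate, dat$birthday,
                                           units = "days")) / 365.25, digits = 2)

# Same index on both sides: 2 values into 2 slots, 3 values into 3 slots
dat$age <- NA_real_
dat$age[alive] <- ages_at_cutoff[alive]
dat$age[dead] <- ages_at_death[dead]

dat$age
#> [1] 21.00 39.09 20.00 27.04 29.13
```

This keeps the vectorised style of the original code instead of looping over rows with `sapply()`.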

PS: If you don't mind, let me point out that the reprex you posted after editing your question is incorrect. You didn't put quotes around the dates, so `1968-10-15` was evaluated as arithmetic and became 1943, etc.
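To illustrate the PS: without quotes, R reads `1968-10-15` as two subtractions rather than as a date.

```r
unquoted <- 1968-10-15   # parsed as 1968 - 10 - 15
quoted <- "1968-10-15"   # a character string that as.Date() can parse

unquoted
#> [1] 1943
as.Date(quoted)
#> [1] "1968-10-15"
```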


Thank you!
I'll correct it and try to turn this into a reprex.

Thank you so much for your help!
It finally solved a problem that had bothered me for several days.
Thank you again!

I made a stupid mistake. :joy:
