Hello community,
I'm new to R and have run into a problem I'd like some help with.
I want to use the function “difftime” to calculate the age of everybody in my data frame. My code is:
dat$age<-round(as.numeric((difftime("2017-12-31", dat$BIRTHDAY, units = "days"))/365.25),digits = 2)
But the result shows many NAs in "age", and R warns that “number of items to replace is not a multiple of replacement length”.
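The warning can be reproduced in isolation. A minimal sketch with made-up numbers: the logical index selects 2 of 5 positions, but 5 replacement values are supplied, so R fills the 2 slots with the first 2 values and warns about the rest.

```r
# a length-5 vector of NAs, and a logical index selecting only the last 2 positions
x <- rep(NA_real_, 5)
selected <- c(FALSE, FALSE, FALSE, TRUE, TRUE)

# 5 values into 2 slots: R warns ("number of items to replace is not a
# multiple of replacement length") and keeps only the first 2 values
x[selected] <- c(10, 20, 30, 40, 50)
print(x)
```

The first three positions stay NA, which matches the pattern of NAs described above.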
Here is my dataframe:
Can you please check whether you have better luck with dat$age<-round(as.numeric((difftime(as.Date("2017-12-31"), dat$BIRTHDAY, units = "days"))/365.25),digits = 2)?
Thank you for your patience.
I also considered that there may be format errors in my values. My data was imported from a CSV file, so I tried pasting all my data into a TXT file and then pasting it back into the CSV. Then I converted the data to Date format with as.Date() in R.
I know that may be a clumsy approach, and it still didn't work.
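Since the dates come from a CSV file, it may be worth checking that as.Date() actually parsed them. A minimal sketch with made-up strings (the real layout of the CSV is not known here): ISO text such as "1988-11-15" needs no format, other layouts need an explicit format, and a format that does not match the text returns NA without any warning.

```r
# ISO 8601 text ("YYYY-MM-DD") is parsed without a format string
iso <- as.Date("1988-11-15")

# other layouts need an explicit format; "%d/%m/%Y" is an assumed example layout
dmy <- as.Date("15/11/1988", format = "%d/%m/%Y")

# a format that does not match the text quietly gives NA
bad <- as.Date("1988-11-15", format = "%d/%m/%Y")

print(c(iso, dmy))
print(bad)
```

On a real column, something like sum(is.na(dat$birthday)) right after conversion shows whether any values failed to parse.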
Thank you for your patience.
In fact, my data frame is:
name birthday deathstatus deathdate
1 1988-11-15 1 2009-11-16
2 1968-10-15 1 2007-11-16
3 1988-11-15 1 2008-11-16
4 1990-12-15 0 NA
5 1988-11-15 0 NA
I want to calculate everyone's age, whether they're alive or not. My code is:
dat$age[dat$deathstatus == 0]<-round(as.numeric((difftime("2017-12-31", dat$birthday,units = "days"))/365.25),digits = 2)
dat$age[dat$deathstatus == 1]<-round(as.numeric((difftime(dat$deathdate, dat$BIRTHDAY,units = "days"))/365.25),digits = 2)
In the end, there are many NAs in the age column. I have already checked my code, it seems to work, and I couldn't find anything in common among the rows with NA.
I really hope you can give me some advice.
Thank you again.
Could you please turn this into a self-contained REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.
If you've never heard of a reprex before, you might want to start by reading this FAQ:
As Andres has mentioned, providing a reprex is very helpful. Please keep that in mind for your future posts. It helps others to help you.
There's a problem in your code. You don't have a column named BIRTHDAY in your dataset, but you used it. Most probably, it was a typo, and hence I corrected it below in my example.
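R column names are case-sensitive, and asking a data frame for a column that isn't there does not raise an error: $ simply returns NULL. A minimal check on a two-row version of the data:

```r
dat <- data.frame(stringsAsFactors = FALSE,
                  birthday = c("1988-11-15", "1968-10-15"))

# BIRTHDAY and birthday are different names
print("BIRTHDAY" %in% names(dat))

# a missing column comes back as NULL rather than an error
print(is.null(dat$BIRTHDAY))
```

Checking names(dat) before using a column is a cheap way to catch this kind of typo.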
Now, let's see what's happening:
# creating the dataset
dat <- data.frame(stringsAsFactors = FALSE,
name = c(1, 2, 3, 4, 5),
birthday = c("1988-11-15", "1968-10-15", "1988-11-15", "1990-12-15", "1988-11-15"),
deathstatus = c(1, 1, 1, 0, 0),
deathdate = c("2009-11-16", "2007-11-16", "2008-11-16", NA, NA))
# what you did and why there are problems
## creating a copy of the data
dat_1 <- dat
## the following creates a logical vector of length 5
## but only two of them are TRUE, and those are the last two positions
dat_1$deathstatus == 0
#> [1] FALSE FALSE FALSE TRUE TRUE
## while this creates a numeric vector of length 5
## calculating age at 2017/12/31 for all people, which you don't actually need
round(as.numeric((difftime("2017-12-31",
dat_1$birthday,
units = "days")) / 365.25),
digits = 2)
#> [1] 29.13 49.21 29.13 27.04 29.13
## so this line tries to assign 5 numbers to 2 places
## R cannot do that cleanly, hence the warning
## not only that, it assigns only the first 2 values of the numeric vector
## to the positions where the logical vector is TRUE
dat_1$age[dat_1$deathstatus == 0] <- round(as.numeric((difftime("2017-12-31",
dat_1$birthday,
units = "days")) / 365.25),
digits = 2)
#> Warning in dat_1$age[dat_1$deathstatus == 0] <-
#> round(as.numeric((difftime("2017-12-31", : number of items to replace is
#> not a multiple of replacement length
## verify it below
dat_1
#> name birthday deathstatus deathdate age
#> 1 1 1988-11-15 1 2009-11-16 NA
#> 2 2 1968-10-15 1 2007-11-16 NA
#> 3 3 1988-11-15 1 2008-11-16 NA
#> 4 4 1990-12-15 0 <NA> 29.13
#> 5 5 1988-11-15 0 <NA> 49.21
## same problem will persist in next case
## but it'll be hard to notice
## the first 3 positions of the logical vector (of length 5) are TRUE here
dat_1$deathstatus == 1
#> [1] TRUE TRUE TRUE FALSE FALSE
## generates a numeric vector of length 5
## computes age at death for everyone, so it returns NA for people who are alive
round(as.numeric((difftime(dat_1$deathdate,
dat_1$birthday,
units = "days")) / 365.25),
digits = 2)
#> [1] 21.00 39.09 20.00 NA NA
## you're placing 5 items into 3 slots, and hence get warned
## but the result will seem OK
## because the numbers of interest are in the first 3 positions
## so the problem is not quite visible here, but it exists nevertheless
dat_1$age[dat_1$deathstatus == 1] <- round(as.numeric((difftime(dat_1$deathdate,
dat_1$birthday,
units = "days")) / 365.25),
digits = 2)
#> Warning in dat_1$age[dat_1$deathstatus == 1] <-
#> round(as.numeric((difftime(dat_1$deathdate, : number of items to replace is
#> not a multiple of replacement length
## check below
dat_1
#> name birthday deathstatus deathdate age
#> 1 1 1988-11-15 1 2009-11-16 21.00
#> 2 2 1968-10-15 1 2007-11-16 39.09
#> 3 3 1988-11-15 1 2008-11-16 20.00
#> 4 4 1990-12-15 0 <NA> 29.13
#> 5 5 1988-11-15 0 <NA> 49.21
# what I did
## creating another copy
dat_2 <- dat
## doing the same thing you did, but in a slightly different way
(dat_2 <- within(data = dat_2,
expr = {
birthday <- as.Date(x = birthday)
deathdate <- as.Date(x = deathdate)
age <- sapply(X = seq_along(name), # row positions (here they happen to equal name)
FUN = function(index)
{
if(deathstatus[index] == 0)
{
in_days <- difftime(time1 = as.Date(x = "2017-12-31"),
time2 = birthday[index],
units = 'days')
} else
{
in_days <- difftime(time1 = deathdate[index],
time2 = birthday[index],
units = 'days')
}
in_approx_years <- round(x = as.numeric(x = (in_days / 365.25)),
digits = 2)
return(in_approx_years)
})
}))
#> name birthday deathstatus deathdate age
#> 1 1 1988-11-15 1 2009-11-16 21.00
#> 2 2 1968-10-15 1 2007-11-16 39.09
#> 3 3 1988-11-15 1 2008-11-16 20.00
#> 4 4 1990-12-15 0 <NA> 27.04
#> 5 5 1988-11-15 0 <NA> 29.13
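A more direct fix for the original two lines is to subset the right-hand side with the same logical vector as the left-hand side, so the number of values always matches the number of slots. A minimal sketch on the reprex data, with the date columns converted to Date up front:

```r
dat <- data.frame(stringsAsFactors = FALSE,
                  name = c(1, 2, 3, 4, 5),
                  birthday = as.Date(c("1988-11-15", "1968-10-15", "1988-11-15",
                                       "1990-12-15", "1988-11-15")),
                  deathstatus = c(1, 1, 1, 0, 0),
                  deathdate = as.Date(c("2009-11-16", "2007-11-16", "2008-11-16", NA, NA)))

alive <- dat$deathstatus == 0
dead  <- dat$deathstatus == 1

# start from an all-NA column of the right length
dat$age <- NA_real_

# left and right sides use the same index, so 2 values go into 2 slots ...
dat$age[alive] <- round(as.numeric(difftime(as.Date("2017-12-31"),
                                            dat$birthday[alive],
                                            units = "days")) / 365.25,
                        digits = 2)
# ... and 3 values go into 3 slots
dat$age[dead] <- round(as.numeric(difftime(dat$deathdate[dead],
                                           dat$birthday[dead],
                                           units = "days")) / 365.25,
                       digits = 2)
print(dat)
```

This avoids the per-row loop and gives the same ages as the sapply() version above.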
PS: If you don't mind, let me point out that the reprex you posted after editing your question is incorrect. You didn't put the dates in quotes, so R treated them as subtractions: "1968-10-15" became 1943, etc.
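Without quotes, R parses a date-like literal as arithmetic. A minimal sketch of the difference:

```r
# unquoted: parsed as 1968 minus 10 minus 15, a plain number
unquoted <- 1968-10-15

# quoted and converted: an actual Date
quoted <- as.Date("1968-10-15")

print(unquoted)
print(class(unquoted))
print(class(quoted))
```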