A simple problem needs help: "number of items to replace is not a multiple of replacement length"

AshTaker · February 28, 2019, 8:23am

Hello community,
I'm new to R. I've run into a problem and would like some help.
I want to use the function "difftime" to calculate the age of everybody in my dataframe. My code is:
dat$age<-round(as.numeric((difftime("2017-12-31", dat$BIRTHDAY, units = "days"))/365.25),digits = 2)
But the result has many NA values in "age", and R warns that "number of items to replace is not a multiple of replacement length".
Here is my dataframe:

``` r
data.frame(
id = c(1,2,3,4,5),
birthday = c("1968-10-15","2007-11-16","1988-11-15","2008-11-16","1995-10-20"),
deathstatus = c(1,1,1,0,0),
deathdate = c("2009-11-16", "2008-11-16", "2007-11-16",NA,NA)
)
#>   id   birthday deathstatus  deathdate
#> 1  1 1968-10-15           1 2009-11-16
#> 2  2 2007-11-16           1 2008-11-16
#> 3  3 1988-11-15           1 2007-11-16
#> 4  4 2008-11-16           0       <NA>
#> 5  5 1995-10-20           0       <NA>
```

Created on 2019-03-04 by the reprex package (v0.2.1)

Any suggestion is welcome. Thank you.

Yarnabrina · February 28, 2019, 12:56pm

Welcome to this community.

To get more helpful answers, please ask questions with a reproducible example. If you don't know how, here's a helpful link:

FAQ: How to do a minimal reproducible example (reprex) for beginners

Now, I generated some birthdays and calculated ages at 2019-02-28. It seems to work correctly without any error. See here:

set.seed(seed = 24971)

simulated_birthdays <- sample(x = seq(from = as.Date(x = '1991/01/01'),
                                      to = as.Date(x = '2000/12/31'),
                                      by = 11))

head(x = simulated_birthdays)
#> [1] "1992-06-12" "1991-03-19" "1992-04-29" "1999-04-03" "1991-06-04"
#> [6] "1997-03-05"

current_date <- as.Date(x = '2019/02/28')

ages_in_days <- difftime(time1 = current_date,
                         time2 = simulated_birthdays,
                         units = 'days')

head(x = ages_in_days)
#> Time differences in days
#> [1]  9757 10208  9801  7271 10131  8030

approximate_ages_in_years <- round(x = as.numeric(x = (ages_in_days / 365.25)),
                                   digits = 2)

head(x = approximate_ages_in_years)
#> [1] 26.71 27.95 26.83 19.91 27.74 21.98

Created on 2019-02-28 by the reprex package (v0.2.1)

Can you please check whether you have better luck with dat$age<-round(as.numeric((difftime(as.Date("2017-12-31"), dat$BIRTHDAY, units = "days"))/365.25),digits = 2)?

woodward · February 28, 2019, 6:24pm

Perhaps some of your BIRTHDAY values are incorrectly formatted?
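
One way to check this, as a rough sketch: parse the column with an explicit format and list the values that fail. The vector `birthday_raw` below is made-up sample data, and the `%Y-%m-%d` format is an assumption about how the dates are written.

```r
# Hypothetical sample values; two are not in YYYY-MM-DD form
birthday_raw <- c("1968-10-15", "15/10/1968", "1988-11-15", "", NA)

# With an explicit format, as.Date() returns NA for values it cannot parse
parsed <- as.Date(birthday_raw, format = "%Y-%m-%d")

# Flag values that were present but failed to parse
bad <- is.na(parsed) & !is.na(birthday_raw)

which(bad)
birthday_raw[bad]
```

If `birthday_raw[bad]` comes back empty, the date format is probably not the cause.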

AshTaker · March 1, 2019, 1:07pm

Thank you for your patience.
I also considered that there may be some format errors in my values. My data was imported from a CSV file, so I tried pasting all my data into a TXT file and then pasting it back into the CSV. Then I converted the data to Date format with "as.Date()" in R.
I know it may be a clumsy approach, and it still didn't work.
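
For readers following along, a minimal sketch of that import-and-convert step might look like the following. The CSV text is a made-up stand-in for the real file, and the `%Y-%m-%d` format is an assumption about how the dates are stored.

```r
# Hypothetical CSV content standing in for the real file
csv_text <- "id,birthday,deathstatus,deathdate
1,1968-10-15,1,2009-11-16
2,2008-11-16,0,NA"

# Read the date columns as plain character strings, not factors
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Convert with an explicit format; unparseable or missing values become NA
dat$birthday  <- as.Date(dat$birthday,  format = "%Y-%m-%d")
dat$deathdate <- as.Date(dat$deathdate, format = "%Y-%m-%d")

# Confirm the column classes after conversion
str(dat)
```

Checking `str(dat)` after the conversion shows whether both columns really ended up as Date.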

AshTaker · March 1, 2019, 1:14pm

Thank you for your patience.
In fact, my dataframe is:
name birthday deathstatus deathdate
1 1988-11-15 1 2009-11-16
2 1968-10-15 1 2007-11-16
3 1988-11-15 1 2008-11-16
4 1990-12-15 0 NA
5 1988-11-15 0 NA
And I want to calculate people's ages whether they're alive or not. My code is:
dat$age[dat$deathstatus == 0]<-round(as.numeric((difftime("2017-12-31", dat$birthday,units = "days"))/365.25),digits = 2)
dat$age[dat$deathstatus == 1]<-round(as.numeric((difftime(dat$deathdate, dat$BIRTHDAY,units = "days"))/365.25),digits = 2)
In the end, there are many NA values in the age column. I have already checked my code; it seems to work, and I couldn't find anything in common among the rows with NA.
I really hope you can give me some advice.
Thank you again.

andresrcs · March 1, 2019, 1:25pm

Could you please turn this into a self-contained REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

FAQ: How to do a minimal reproducible example (reprex) for beginners

Yarnabrina · March 1, 2019, 4:18pm

As Andres has mentioned, providing a reprex is very helpful. Please keep that in mind for your future posts. It helps others to help you.

There's a problem in your code. You don't have a column named BIRTHDAY in your dataset, but you used it. Most probably, it was a typo, and hence I corrected it below in my example.

Now, let's see what's happening:

# creating the dataset
dat <- data.frame(stringsAsFactors = FALSE,
                  name = c(1, 2, 3, 4, 5),
                  birthday = c("1988-11-15", "1968-10-15", "1988-11-15", "1990-12-15", "1988-11-15"),
                  deathstatus = c(1, 1, 1, 0, 0),
                  deathdate = c("2009-11-16", "2007-11-16", "2008-11-16", NA, NA))

# what you did and why there are problems

## creating a copy of the data
dat_1 <- dat

## the following creates a logical vector of length 5
## but only two of them are TRUE, and those are the last two positions
dat_1$deathstatus == 0
#> [1] FALSE FALSE FALSE  TRUE  TRUE

## while this creates a numeric vector of length 5
## calculating age at 2017/12/31 for all people, which you don't actually need
round(as.numeric((difftime("2017-12-31",
                           dat_1$birthday,
                           units = "days")) / 365.25),
      digits = 2)
#> [1] 29.13 49.21 29.13 27.04 29.13

## so this line tries to assign 5 numbers to 2 places
## obviously, it fails and hence shows the warning
## not only that, it assigns only the first 2 values of the numeric vector
## to the positions where the logical vector is TRUE
dat_1$age[dat_1$deathstatus == 0] <- round(as.numeric((difftime("2017-12-31",
                                                                dat_1$birthday,
                                                                units = "days")) / 365.25),
                                           digits = 2)
#> Warning in dat_1$age[dat_1$deathstatus == 0] <-
#> round(as.numeric((difftime("2017-12-31", : number of items to replace is
#> not a multiple of replacement length

## verify it below
dat_1
#>   name   birthday deathstatus  deathdate   age
#> 1    1 1988-11-15           1 2009-11-16    NA
#> 2    2 1968-10-15           1 2007-11-16    NA
#> 3    3 1988-11-15           1 2008-11-16    NA
#> 4    4 1990-12-15           0       <NA> 29.13
#> 5    5 1988-11-15           0       <NA> 49.21

## same problem will persist in next case
## but it'll be hard to notice

## the first 3 positions of the logical vector (of length 5) are TRUE here
dat_1$deathstatus == 1
#> [1]  TRUE  TRUE  TRUE FALSE FALSE

## generates a numeric vector of length 5
## computes age till death of all people, so returns NA for alive people
round(as.numeric((difftime(dat_1$deathdate,
                           dat_1$birthday,
                           units = "days")) / 365.25),
      digits = 2)
#> [1] 21.00 39.09 20.00    NA    NA

## you're placing 5 items in 3 holders, and hence get warned
## but it'll seem to be OK
## that's because the numbers of interest are positioned first
## the problem is not quite visible here, but it exists nevertheless
dat_1$age[dat_1$deathstatus == 1] <- round(as.numeric((difftime(dat_1$deathdate,
                                                                dat_1$birthday,
                                                                units = "days")) / 365.25),
                                           digits = 2)
#> Warning in dat_1$age[dat_1$deathstatus == 1] <-
#> round(as.numeric((difftime(dat_1$deathdate, : number of items to replace is
#> not a multiple of replacement length

## check below
dat_1
#>   name   birthday deathstatus  deathdate   age
#> 1    1 1988-11-15           1 2009-11-16 21.00
#> 2    2 1968-10-15           1 2007-11-16 39.09
#> 3    3 1988-11-15           1 2008-11-16 20.00
#> 4    4 1990-12-15           0       <NA> 29.13
#> 5    5 1988-11-15           0       <NA> 49.21

# what I did

## creating another copy
dat_2 <- dat

## doing the same as you did, but in a slightly different way
## note: this uses `name` as a row index, which works here because name is 1 to 5
(dat_2 <- within(data = dat_2,
                 expr = {
                   birthday <- as.Date(x = birthday)
                   deathdate <- as.Date(x = deathdate)
                   age <- sapply(X = name,
                                 FUN = function(index)
                                 {
                                   if(deathstatus[index] == 0)
                                   {
                                     in_days <- difftime(time1 = as.Date(x = "2017-12-31"),
                                                         time2 = birthday[index],
                                                         units = 'days')
                                   } else
                                   {
                                     in_days <- difftime(time1 = deathdate[index],
                                                         time2 = birthday[index],
                                                         units = 'days')
                                   }
                                   in_approx_years <- round(x = as.numeric(x = (in_days / 365.25)),
                                                            digits = 2)
                                   return(in_approx_years)
                                 })
                 }))
#>   name   birthday deathstatus  deathdate   age
#> 1    1 1988-11-15           1 2009-11-16 21.00
#> 2    2 1968-10-15           1 2007-11-16 39.09
#> 3    3 1988-11-15           1 2008-11-16 20.00
#> 4    4 1990-12-15           0       <NA> 27.04
#> 5    5 1988-11-15           0       <NA> 29.13

Created on 2019-03-01 by the reprex package (v0.2.1)
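
The same ages can also be computed without a loop. The sketch below shows two possible vectorised variants, not the approach used above; `reference_date`, `alive`, `end_date`, `age_a` and `age_b` are names introduced here for illustration. In both variants the right-hand side has either the same length as the set of positions being assigned, or length 1.

```r
dat <- data.frame(
  name = c(1, 2, 3, 4, 5),
  birthday = as.Date(c("1988-11-15", "1968-10-15", "1988-11-15",
                       "1990-12-15", "1988-11-15")),
  deathstatus = c(1, 1, 1, 0, 0),
  deathdate = as.Date(c("2009-11-16", "2007-11-16", "2008-11-16", NA, NA))
)

reference_date <- as.Date("2017-12-31")
alive <- dat$deathstatus == 0

# Variant A: pick the end date per row first, then compute every age at once
end_date <- dat$deathdate
end_date[alive] <- reference_date
age_a <- round(as.numeric(difftime(end_date, dat$birthday, units = "days")) / 365.25,
               digits = 2)

# Variant B: keep two assignments, but subset BOTH sides with the same index
age_b <- rep(NA_real_, nrow(dat))
age_b[alive] <- round(as.numeric(difftime(reference_date, dat$birthday[alive],
                                          units = "days")) / 365.25, digits = 2)
age_b[!alive] <- round(as.numeric(difftime(dat$deathdate[!alive], dat$birthday[!alive],
                                           units = "days")) / 365.25, digits = 2)

dat$age <- age_a
dat
```

Variant B is the smallest change to the original two-line code: the only difference is that `birthday` and `deathdate` are indexed with the same logical vector as `age`.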

Hope this helps.

PS: If you don't mind, let me point out that the reprex you posted after modifying your question is incorrect. You haven't surrounded the dates with quotes, and so "1968-10-15" became 1943, etc.
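
A tiny illustration of that point, with variable names chosen only for this example:

```r
# Without quotes, R treats the hyphens as subtraction: 1968 - 10 - 15
unquoted <- 1968-10-15
unquoted

# With quotes, the value is a string that as.Date() can parse
quoted <- as.Date("1968-10-15")
class(quoted)
format(quoted)
```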

AshTaker · March 3, 2019, 12:36pm

Thank you!
I'll correct it and try to turn this into a reprex.

AshTaker · March 3, 2019, 12:46pm

Thank you so much for your help!
Finally solved a problem that bothered me for several days.
Thank you again!

AshTaker · March 4, 2019, 2:13am

I made a stupid mistake. :joy:

system · March 11, 2019, 2:13am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.