Hello, I have a large data set with over thousands of entries and thousands of dates. There are multiple columns. One of the columns is called "date", and it has thousands of entries with the dates in the format, "1996 - 01- 01", etc. I am trying to separate the date column into three new columns, (Year, Month, Day). How do I do this using tidyverse commands?
Take a look at lubridate
package, specifically lubridate::year
, lubridate::month
, and lubridate::day
.
You can use them along with dplyr::mutate
. Something like this:
x %>%
dplyr::mutate(year = lubridate::year(date),
month = lubridate::month(date),
day = lubridate::day(date))
Thanks!
But is there a way to do this with dplyr commands or tidyr commands?
tidyr
and dplyr
don't have date manipulation functions. lubridate
is the tidyverse
package for date functions (although it's not one of the "core" tidyverse
packages loaded by the tidyverse
library).
Based on the information you provided, it looks like your dates might not be in R Date format. The example below is converts the date
column to Date format and then uses mutate_at
to create separate columns for year, month, and day. This is similar to mishabalyasin's answer, but mutate_at
allows all three functions to be applied to the date
column with a single line of code.
library(tidyverse)
library(lubridate)
# Fake data
x = data.frame(date=c("2018 - 01 - 04", "2018 - 02 - 16"))
x = x %>%
mutate(date = ymd(date)) %>%
mutate_at(vars(date), funs(year, month, day))
x
date year month day
1 2018-01-04 2018 1 4
2 2018-02-16 2018 2 16
Here's a non-lubridate solution. Though I'd probably still use lubridate because it seems to be more robust to variations, for instance if a date accidentally is encoded "19960101"... I think lubridate::ymd() would still parse it.
library(tidyr)
df <- data.frame(date = "1996 - 01- 01", stringsAsFactors = FALSE)
df %>%
separate(date, sep="-", into = c("year", "month", "day"))
Ahh this is what I was looking for!! Thanks!! I just have one more question. Where it says " data.frame(date=c(" 2018 - 01- 04")), instead of using date = c(), could I use date = [1:500] for a long list of data? Otherwise it would take forever to input all of my dates from my data set
Actually I figured it out!!
If your question's been answered, would you mind choosing a solution? (see FAQ below for how) It makes it a bit easier to visually navigate the site and see which questions still need help.
Thanks
2 posts were split to a new topic: help with "Error in as.POSIXlt.character(x, tz = tz(x)) : character string is not in a standard unambiguous format."