Complete Newbie. I need to separate a column that contains date and timestamp data into individual columns

Hello everyone.
As my topic header suggests, I am completely new to R and have a ton of questions that may be basic to many but the beginnings for me.

I am working on a sample dataset that is very large. It has several columns that contain date and timestamp data combined (2/23/2018 14:57:35, for example). Would someone explain to me how to go about separating the date into one column and the timestamp into its own column, and how I can do this for all of the columns that contain these (there are about four columns that contain date and timestamp information, each with close to 10K rows)? I've already loaded Tidyverse, and imported the csv file of data (using the Upload and then Import function). I just don't know what to do next. Again, a very basic question, I know. Any help is greatly appreciated.

Welcome to the community! Below is one approach that uses the separate function to do this for one column. You could repeat this for the other columns as well, specifying new into names.

library(tidyverse)

df = data.frame(timestamp = c('2/23/2018 12:57:32',
                              '12/23/2020 16:57:35',
                              '3/21/2018 11:57:45'))

# drops "timestamp" column
out1 = df %>%
  separate(timestamp, into = c('date', 'time'), sep = ' ')

out1
#>         date     time
#> 1  2/23/2018 12:57:32
#> 2 12/23/2020 16:57:35
#> 3  3/21/2018 11:57:45

# retains "timestamp column
out2 =  df %>%
  separate(timestamp, into = c('date', 'time'), sep = ' ', remove = F)

out2
#>             timestamp       date     time
#> 1  2/23/2018 12:57:32  2/23/2018 12:57:32
#> 2 12/23/2020 16:57:35 12/23/2020 16:57:35
#> 3  3/21/2018 11:57:45  3/21/2018 11:57:45

Created on 2022-11-22 with reprex v2.0.2.9000

1 Like

Thank you for the (very quick) reply! If I'm understanding correctly, I would need to input all the data from the column(s) into the code? Is there a way to reference the entire column (as a whole, with all 10K rows of data), without having to input each value?

For example: my dataset is called Orders_Dataset. One of the columns in the dataset is labeled Order_Approved_At, and it contains the date and timestamp info combined. Like I said, there are close to 10K records. Can I do:

Order_approval<-("Orders_Dataset", "Order_Approved_At") to assign this data to Order_approval AND THEN do:

library(tidyverse)

df = data.frame(Order_approval)

drops "timestamp" column

out1 = df %>%
separate(Order_approval, into = c('date', 'time'), sep = ' ')

out2 = df %>%
separate(Order_approval, into = c('date', 'time'), sep = ' ', remove = F)

And will all of this (or none of it) create two new columns containing this information in my dataset? I'm thinking I would need to use the mutate function (???) somehow? Have I muddled this whole thing in my head?

Since I did not know the structure of your data, I created df as a sample dataset. Using your data, you could do the following code, which will add two new columns to your data frame (date, time) and retain the original column.

Orders_Dataset = Orders_Dataset |>
  separate(Order_Approved_At, into = c('date', 'time'), sep = ' ', remove = F)

Thanks! I'll give it a try now and see what happens.

OK, this is what I did. To explain a bit further (because I changed a few things):

I used purchase_data instead of Orders_dataset as the vector (??) I wanted to assign all of this to. This more closely matched the column of data I wanted to have separated ("order_purchase_timestamp"). The dataset is actually called olist_orders_dataset. So this is what I ended up entering:

purchase_data<-c("olist_orders_dataset", 'order_purchase_timestamp')
purchase_data = purchase_data |>
separate_rows(order_purchase_timestamp, into = c('date', 'time'), sep = ' ', remove = F)

I ended up with the following error:

Error in separate_rows():
! Arguments in ... must be passed by position, not name.
:heavy_multiplication_x: Problematic arguments:
• into = c("date", "time")
• remove = F
Run rlang::last_error() to see where the error occurred.

I ran "rlang::last_error() per the instructions and got a message saying to run rlang::last_trace(), which I did. The following came up:

rlang::last_error()
<error/rlib_error_dots_named>
Error in separate_rows():
! Arguments in ... must be passed by position, not name.
:heavy_multiplication_x: Problematic arguments:
• into = c("date", "time")
• remove = F


Backtrace:

  1. tidyr::separate_rows(...)
    Run rlang::last_trace() to see the full context.

rlang::last_trace()
<error/rlib_error_dots_named>
Error in separate_rows():
! Arguments in ... must be passed by position, not name.
:heavy_multiplication_x: Problematic arguments:
• into = c("date", "time")
• remove = F


Backtrace:

  1. └─tidyr::separate_rows(...)
  2. └─ellipsis::check_dots_unnamed()
  3. └─rlang:::action_dots(...)
    
  4.   ├─base (local) try_dots(...)
    
  5.   └─rlang (local) action(...)
    

Is this saying that instead of putting the column title (order_purchase_timestamp), I should be putting the position number of the column in the dataset (4)? If that is the case, would it look like this:

purchase_data<-c("olist_orders_dataset", 'order_purchase_timestamp')
purchase_data = purchase_data |>
separate_rows(4, into = c('date', 'time'), sep = ' ', remove = F)

My apologies. I accidentally listed separate_rows in the last example (which I have updated) instead of using separate like in my first response. That said, if you look at the documentation for separate, the first argument must be a data frame. The function will error if passed purchase_data because this object is a character vector.

purchase_data <- c("olist_orders_dataset", 'order_purchase_timestamp')
class(purchase_data)
#> [1] "character"

If olist_orders_dataset is a data frame, you could try the following:

olist_orders_dataset = olist_orders_dataset %>%
  separate(order_purchase_timestamp, into = c('date', 'time'), sep = ' ', remove = F)

OK, I'll give that a try. Thanks for your patience.

I tried all of this and luckily I didn't get an error msg this time. However, nothing happened. I'm just going to delete everything and start all over again.

What do you get when you do the following?

dplyr::glimpse(olist_orders_dataset)

I decided to delete everything and start from scratch. I looked up some YouTube videos that may help me, as I am more of a visual learner at times. I'll let you know what I ended up doing (if it works). Thanks again.

Here is what I found on YouTube that worked for me. The YouTube channel is Statistics Globe and the video I watched to split date from timestamp is located here: How to Split a Date-Time Column into Separate Variables in R (Example) | as.Date, format, as.POSIXct - YouTube.

#I began by importing my dataset(s)

brazilcust<-read.csv("olist_customers_dataset.csv", header=TRUE, sep=",")
brazilgeo<-read.csv("olist_geolocation_dataset.csv", header = TRUE, sep = ",")
brazilorders<-read.csv("olist_orders_dataset.csv", header = TRUE, sep = ",")

#In the brazilorders dataset, there are several columns that contain date and timestamp info combined into one column. I began by splitting the date and timestamp data in the order_purchase_timestamp column.

brazilorders$date<-as.Date(brazilorders$order_purchase_timestamp)
brazilorders$time<-format(as.POSIXct(brazilorders$order_purchase_timestamp),
format = "%H:%M:%S")

#It added two new columns to the end of my dataset. One thing I should have done was create unique names for each new column, so I will know what the information relates to (brazilorders$order_purchase_date, for example).

Thanks for all your help and patience. I have a long way to go and much more to learn, so I appreciate you taking the time to help.

Glad you found a workable solution. Keep on learning with the rest of us!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.