I am so stuck trying to split a column in an RStudio dataframe into two columns. In the attached screenshot, I'm trying to split column "Date_Time_UTC" (example: 1/1/2015 11:00) into a separate column for the date (ex. 1/1/2015) and time (ex. 11:00). This could be done by splitting the entire column at the space in between the date and time, but I cannot for the life of me figure out how to do this.
I would also be okay with modifying this column to just delete everything after the space (so that only the date remains), but this is less ideal.
ANY HELP would be so much appreciated!!
Thanks,
Kelly
It might be overkill, but the tidyr package has a very handy separate() function to split data in variables. Assuming your Date_Time_UTC variable is just a character string, you can split it into parts like this:
tidyr::separate(
data = df_temperature,
col = Date_Time_UTC,
sep = " ",
into = c("date", "time")
)
Alternately, tidyr has an extract() function that lets you use a regular expression groups to parse a column:
tidyr::extract(
data = df_temperature,
col = Date_Time_UTC,
into = c("date", "time"),
"(\\S{10}) (\\S{5})"
)
Thank you so much, Robert, for your quick help! I am clearly new to learning R and R Studio I tried the solution that you suggested using the separate function (although I changed the data = field to "data = df_temperature_2015" to match the dataframe name), and received this error:
Error in [.data.frame(x, x_vars) : undefined columns selected
I did originally rename the column names in this dataframe (it was a generic string of characters before I changed to "Date_Time_UTC").
Btw, I received the same error when trying the extract function.
Woah, thank you technocrat! I was not aware of this library and this would probably be useful in translating the character string that is currently in my column "Date_Time_UTC" into a numeric / date value. As an end goal, I am trying to plot these temperature measurements over time but cannot do so currently with the date and time column in the format that it is. Would you be willing to advise how I might apply some of the functions from the lubridate library to the entire "Date_Time_UTC" column, using perhaps mutate() to paste the reformatted data into new columns?
The reprex below illustrates how to add variables to a dataframe that reformat presentation of the date time string, in one case, and convert the hour component to integer on the other. It also shows that ggplot can deal with datetime objects as an axis without the need to convert from datetime to an integer.
Kelly, here's a more complete example of my original thought on splitting columns generally — though @technocrat is right that it makes more sense to familiarize yourself with lubridate for work with dates and times.
df_temperature_2015 <- data.frame(
Date_Time_UTC = c(
as.POSIXct("2020-12-09 13:04:00"),
as.POSIXct("2020-12-10 13:20:00"),
as.POSIXct("2020-12-11 13:30:00"))
)
tidyr::separate(
data = df_temperature_2015,
col = Date_Time_UTC,
sep = " ",
into = c("date", "time")
)
date time
1 2020-12-09 13:04:00
2 2020-12-10 13:20:00
3 2020-12-11 13:30:00
Thank you, Robert!! I was able to wrap up my project analysis last week and I so appreciate your help and quick responses. I couldn't have done it without you! Happy coding