Hi guys,
I have a dataset with 15 columns and 1.82 million rows. The data ranges from 01-01-2003 00:00:01 to 31-12-2015 11:59:59; that is how RStudio displays the date column after I read the info.csv file. However, I only want the data from 01-01-2010 00:00:01 to the end. I've read the file with data <- read.csv('info.csv').
The column name is Date.
Now, how can I copy the rows within that date range into another data frame?
You could copy the pre-2010 records into a new data frame and then remove them from the original data frame. Here's an example using a toy data set. I'm assuming that your date column is already formatted as a datetime; the toy column is called date, so substitute Date when you run this on your own data.
```r
library(dplyr, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)

data <- tibble(date = dmy_hms("01-01-2003 00:00:01",
                              "01-01-2007 00:00:01",
                              "31-12-2009 11:59:59",
                              "01-01-2010 00:00:01",
                              "31-12-2015 11:59:59"))

# Copy records before 2010 to a new tibble.
pre_2010 <- filter(data, date < dmy_hms("01-01-2010 00:00:01"))
print(pre_2010)
#> # A tibble: 3 x 1
#>   date
#>   <dttm>
#> 1 2003-01-01 00:00:01
#> 2 2007-01-01 00:00:01
#> 3 2009-12-31 11:59:59

# Filter the original tibble to remove the pre-2010 entries.
data <- filter(data, date >= dmy_hms("01-01-2010 00:00:01"))
print(data)
#> # A tibble: 2 x 1
#>   date
#>   <dttm>
#> 1 2010-01-01 00:00:01
#> 2 2015-12-31 11:59:59
```
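The toy example builds the datetimes directly. With the real file, read.csv() does not parse dates, so Date comes in as character (or as a factor in R versions before 4.0.0), and comparisons against a datetime cutoff won't work as intended until it is converted. A minimal sketch, assuming the column is named Date and uses the day-month-year layout from the question; the inline CSV and its Value column are made-up stand-ins for info.csv:

```r
library(dplyr, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)

# Hypothetical stand-in for read.csv('info.csv'): a tiny inline CSV
# with the same Date layout. The Value column is made up for illustration.
data <- read.csv(text = "Date,Value
01-01-2003 00:00:01,1
31-12-2009 11:59:59,2
01-01-2010 00:00:01,3
31-12-2015 11:59:59,4")

class(data$Date)  # "character" in R >= 4.0.0 (a factor in older versions)

# Parse the day-month-year strings into POSIXct datetimes (UTC by default).
data <- mutate(data, Date = dmy_hms(Date))

# Copy the rows from the cutoff onward into a new data frame.
data_2010 <- filter(data, Date >= dmy_hms("01-01-2010 00:00:01"))
print(data_2010)
```

If you'd rather not use lubridate for this step, base R's as.POSIXct(data$Date, format = "%d-%m-%Y %H:%M:%S", tz = "UTC") should parse the same layout.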
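Also, note the boundary: with this code, a record timestamped exactly 01-01-2010 00:00:01 stays in `data` (because of `>=`) and is not copied to `pre_2010`. If you want that record in `pre_2010` instead, replace `<` with `<=` and `>=` with `>`. Records between midnight and the cutoff, such as 01-01-2010 00:00:00, count as pre-2010; if you want everything from the start of 2010, use `dmy_hms("01-01-2010 00:00:00")` as the cutoff in both filters.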
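A quick way to check where boundary timestamps land is to compare a few of them with the cutoff directly. A minimal sketch using the same lubridate parser and cutoff as the example above; the year() comparison at the end is one possible alternative if you simply want every record from 2010 onward:

```r
library(lubridate, warn.conflicts = FALSE)

cutoff <- dmy_hms("01-01-2010 00:00:01")

# A few timestamps right around the cutoff.
stamps <- dmy_hms(c("31-12-2009 11:59:59",
                    "01-01-2010 00:00:00",
                    "01-01-2010 00:00:01"))

# TRUE means the row would be copied to pre_2010 (date < cutoff).
stamps < cutoff
#> [1]  TRUE  TRUE FALSE

# Alternative: keep everything from midnight on 1 January 2010 onward.
year(stamps) >= 2010
#> [1] FALSE  TRUE  TRUE
```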