I have a dataframe with dates and observations. The observations from 01.09.2020 until 27.10.2020 are not usable, and I need to cut these dates and their observations out of the dataframe. How do I do it?
First: Install the caret package:
install.packages("caret")
library(caret)
Second: Create a partition of the percentage that you want, in this case 50% or 0.5, with the createDataPartition() function:
new <- createDataPartition(y = data$column_data, times = 1, p = 0.5, list = FALSE)
Now new holds the row indices for 50% of your data; use data[new, ] to pull out those rows.
I hope this helps!
OK, for example, to keep 01.01.2020 to 01.09.2020 (assuming date_colum is of class Date):
library(dplyr)
data1 <- data %>%
  filter(date_colum >= as.Date("2020-01-01"), date_colum <= as.Date("2020-09-01"))
That's all!
And if you need a sample of 500 or 1000, as in this case, you can pipe the result into the sample_n function:
data1 <- data %>%
  filter(date_colum >= as.Date("2020-01-01"), date_colum <= as.Date("2020-09-01")) %>%
  sample_n(1000)
Sorry, I didn't see that you wanted to exclude the dates through 27.10.2020.
Yes, my problem is that there is data before and after the section I need to exclude. Is there an opposite to the %in% operator?
Are you looking for the ! operator?
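For what it's worth, here is a minimal sketch of negating %in% on a date range. It assumes the date column is already of class Date; the data frame obs and its column names are made up for illustration, so swap in your own.

```r
# Hypothetical example data: one observation per day for all of 2020
obs <- data.frame(
  date  = seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day"),
  value = seq_len(366)
)

# Every date in the unusable window, 01.09.2020 to 27.10.2020 inclusive
bad_days <- seq(as.Date("2020-09-01"), as.Date("2020-10-27"), by = "day")

# ! negates the logical vector that %in% returns, so this keeps all other rows
obs_clean <- obs[!(obs$date %in% bad_days), ]
```

The parentheses around the %in% test are not strictly required, but they make it obvious that the whole test is being negated.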
If you're only looking to drop a couple of dates, maybe something like...
mydf <- data.frame("month"=c("jan","feb","mar","apr","jun","july"))
mydf
month
1 jan
2 feb
3 mar
4 apr
5 jun
6 july
#drop feb and mar
mydf2 <- data.frame("month"=mydf[!mydf$month=="feb" &
!mydf$month=="mar",])
mydf2
month
1 jan
2 apr
3 jun
4 july
Assuming your data are dates:
library(lubridate)
city <- structure(list(name = c("Abilene", "Akron", "Albany", "Albuquerque",
"Alexandria", "Allentown", "Amarillo", "Anaheim", "Anchorage",
"Ann Arbor", "Arden-Arcade", "Arlington", "Arlington", "Arvada",
"Athens-Clarke County", "Atlanta", "Augusta-Richmond County",
"Aurora", "Aurora", "Austin"), pop = c(115930L, 217074L, 93994L,
448607L, 128283L, 106632L, 173627L, 328014L, 260283L, 114024L,
92040L, 332969L, 174838L, 102153L, 101489L, 416474L, 199775L,
276393L, 142990L, 656562L), ddate = structure(c(18262, 18263,
18264, 18265, 18266, 18267, 18268, 18269, 18270, 18271, 18272,
18273, 18274, 18275, 18276, 18277, 18278, 18279, 18280, 18281
), class = "Date")), row.names = c(NA, 20L), class = "data.frame")
library(lubridate)
city$ddate <- seq(ymd("2020/01/01"), ymd("2020-01-20"), by = "day")
dat1 <- subset(city, ddate < ymd("2020/01/15") | ddate > ymd("2020/01/18"))
dat1
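If the dates are stored as text like 01.09.2020 rather than as Date objects, they would need converting first. A rough base R sketch of the same idea for the range in the original question, assuming day.month.year strings in a column I've called date_chr (a hypothetical name, as is the obs data frame):

```r
# Hypothetical data with dates stored as dd.mm.yyyy text
obs <- data.frame(
  date_chr = c("31.08.2020", "01.09.2020", "15.10.2020", "27.10.2020", "28.10.2020"),
  value    = c(1.2, 3.4, 5.6, 7.8, 9.0),
  stringsAsFactors = FALSE
)

# Parse the text into Date objects; %d.%m.%Y matches day.month.year
obs$date <- as.Date(obs$date_chr, format = "%d.%m.%Y")

start <- as.Date("2020-09-01")
end   <- as.Date("2020-10-27")

# Keep rows strictly before the window or strictly after it,
# so both boundary dates are dropped along with everything in between
obs_clean <- subset(obs, date < start | date > end)
```

Using < and > (rather than <= and >=) is what makes the excluded window inclusive of both end dates.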