Time Series Subsetting by NA and length

Thought I'd share what I ended up doing for anyone who is working on a similar problem:

#convert the variable column to a logical vector based upon whether data is missing or present

V1_seqs <- df$V1
V1_seqs <- !is.na(V1_seqs)

#compute the runs of consecutive entries using the run length encoding function rle() and turn the result into a dataframe

runs <- rle(V1_seqs)
runs <- as.data.frame(unclass(runs))

#return the indices of all items in the 'runs' dataframe which have the desired length and contain data (values==TRUE)

seqIndices <- which(runs$lengths >= 600000 & runs$values)

#This gives us the indices of the items in the runs list, but not their positions in the original dataframe. Those are worked out next:

dfStartIndex <- c()
dfSeqLength <- c()

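for (i in seqIndices) {
  #a run starts one position after the end of all earlier runs; seq_len(0) is empty, so the first run starts at 1
  dfStartIndex[i] <- sum(runs$lengths[seq_len(i - 1)]) + 1
  dfSeqLength[i] <- runs$lengths[i]
}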

#bind the vectors into a dataframe containing the starting index and lengths of the sequences and remove NAs

dfExtracted <- na.omit(data.frame(dfStartIndex,dfSeqLength))
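For what it's worth, the same start indices and lengths can probably be obtained without a loop by using cumsum() on the run lengths. This is only a rough sketch on a made-up toy vector: the names V1, minLength, runStarts and dfExtractedVec are placeholders of mine, and the threshold of 2 is just for illustration.

```r
# Hypothetical toy column standing in for df$V1
V1 <- c(NA, 1, 2, 3, NA, NA, 4, 5, NA)
# Hypothetical minimum run length (the real data would use a much larger value)
minLength <- 2

# Run length encoding of the "data present" flags
runs <- rle(!is.na(V1))

# The last position of each run is the cumulative sum of the lengths;
# the first position is that end minus the run length, plus one
runEnds <- cumsum(runs$lengths)
runStarts <- runEnds - runs$lengths + 1

# Keep only runs that contain data and are long enough
keep <- runs$values & runs$lengths >= minLength

dfExtractedVec <- data.frame(
  dfStartIndex = runStarts[keep],
  dfSeqLength  = runs$lengths[keep]
)
```

Because only the kept runs are written into the data frame, no NA rows are created and na.omit() is not needed in this variant.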

From there, the indices and lengths can be used to subset the original dataframe and extract the sequences.
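In case it helps, here is a minimal sketch of that last subsetting step on a made-up toy dataframe. The toy df and the hard-coded dfExtracted values are assumptions for illustration only, and the name sequences is just a placeholder.

```r
# Hypothetical toy data: two runs of data separated by NAs
df <- data.frame(V1 = c(NA, 1, 2, 3, NA, NA, 4, 5, NA))

# Hypothetical start indices and lengths, as the earlier steps would produce
# for this toy data with a minimum run length of 2
dfExtracted <- data.frame(dfStartIndex = c(2, 7), dfSeqLength = c(3, 2))

# For each start/length pair, pull rows start .. start + length - 1
# out of the original dataframe; drop = FALSE keeps each piece a dataframe
sequences <- Map(
  function(start, len) df[seq(start, length.out = len), , drop = FALSE],
  dfExtracted$dfStartIndex,
  dfExtracted$dfSeqLength
)
```

The result is a list with one dataframe per sequence, so sequences[[1]] holds the first run and so on.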

Thanks again for your help guys!
