I have the following problem. The data frames I am using contain hourly observations, but occasionally values are missing due to faults in the observing station. In these cases, the rows were skipped entirely instead of being filled with NAs. For example, the data looks like this:
time code value
201901010000 3.5
201901010100 4.2
201901010200 3.2
201901010300 1.5
201901010400 5.3
201901010800 3.8
201901010900 2.9
201901011000 4.6
What I need is to fix this so that it looks like this:
time code value
201901010000 3.5
201901010100 4.2
201901010200 3.2
201901010300 1.5
201901010400 5.3
201901010500 NA
201901010600 NA
201901010700 NA
201901010800 3.8
201901010900 2.9
201901011000 4.6
It is okay for me to have NAs, I just need full samples without gaps. I found that you can achieve this with the "complete" function of tidyr, but I don't have dates or years etc. How can I solve this using this hourly sequence?
Thank you in advance! If there are any questions, feel free to ask at any time.
EDIT: I didn't consider multiple dates in my approach. See Yarnabrina's post below for a more robust solution.
You rightly identified tidyr::complete() as the most appropriate function for the job. It works for all types of values, not just dates. All you need to do is supply it with the full sequence of expected time codes (increments of 100 in this case, which corresponds to one step per hour within a single day).
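A minimal sketch of that idea, using the sample data from the question. The column names `timecode` and `value` are assumptions, since the original header does not make them explicit, and the time codes are assumed to be stored as numbers:

```r
library(tidyr)

# Sample data from the question (column names are assumed)
df <- data.frame(
  timecode = c(201901010000, 201901010100, 201901010200, 201901010300,
               201901010400, 201901010800, 201901010900, 201901011000),
  value    = c(3.5, 4.2, 3.2, 1.5, 5.3, 3.8, 2.9, 4.6)
)

# Build the full hourly sequence in steps of 100 and let complete()
# add a row with NA in `value` for every time code that is missing
filled <- complete(df, timecode = seq(min(timecode), max(timecode), by = 100))

print(filled)
```

Note that stepping by 100 is only valid within one day: past hour 23 it would produce codes such as 201901012400, which are not real times.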
Hi @mihefra! I saw your following comment (#6), but I'm not sure I understood the context, so I can't help. I hope others will be able to guide you. Good luck!
Thanks a lot, I'll try it! Apart from that, there are also correct, "clean" samples that have all the values and no gaps. My first idea was to use the time code vector of a correct sample as a kind of reference sequence. Could that be useful or beneficial? I tried a double for loop that compares the defective sequences with the correct ones, but it takes too much computation and R freezes.
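For what it's worth, the reference-sequence idea can be sketched without any loops by joining the defective sample onto the time codes of a clean one. The data frames `clean` and `defective` and the column names below are hypothetical stand-ins, and the time codes are assumed to be numeric:

```r
# Hypothetical clean sample: every hour from 00:00 to 10:00 is present
clean <- data.frame(
  timecode = 201901010000 + 100 * 0:10,
  value    = c(3.1, 4.0, 3.3, 1.9, 5.0, 4.4, 4.1, 3.9, 3.6, 2.7, 4.8)
)

# Defective sample from the question: hours 05:00 to 07:00 are missing
defective <- data.frame(
  timecode = c(201901010000, 201901010100, 201901010200, 201901010300,
               201901010400, 201901010800, 201901010900, 201901011000),
  value    = c(3.5, 4.2, 3.2, 1.5, 5.3, 3.8, 2.9, 4.6)
)

# Keep every reference time code (all.x = TRUE); time codes that are absent
# from the defective sample get NA in `value`
reference <- data.frame(timecode = clean$timecode)
filled <- merge(reference, defective, by = "timecode", all.x = TRUE)

print(filled)
```

A join like this matches rows in a single vectorised step, so it should scale far better than comparing the sequences element by element in nested loops.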
Hi again! It seems to work so far, thanks again. But how can I set fixed values for the start and end of the sequence? Some of my samples start at a time later than the start of the whole observation period. If I put the starting date (which is 201601010000) in place of "min(timecode)", I get an error message. I'm not sure where the problem is.
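Since the error message itself isn't shown, the following doesn't diagnose it; it is only one possible way to use a fixed start and end. The sketch converts the time codes to real date-times first, so the hourly sequence stays valid across days. The column names, the helper functions `to_time()` and `to_code()`, and the start and end values are all hypothetical, and the time codes are assumed to be numeric:

```r
library(tidyr)

# Hypothetical helpers: convert between numeric time codes and POSIXct (UTC)
to_time <- function(code) {
  as.POSIXct(sprintf("%.0f", code), format = "%Y%m%d%H%M", tz = "UTC")
}
to_code <- function(time) {
  as.numeric(format(time, "%Y%m%d%H%M", tz = "UTC"))
}

# Sample data from the question (column names are assumed)
df <- data.frame(
  timecode = c(201901010000, 201901010100, 201901010200, 201901010300,
               201901010400, 201901010800, 201901010900, 201901011000),
  value    = c(3.5, 4.2, 3.2, 1.5, 5.3, 3.8, 2.9, 4.6)
)
df$time <- to_time(df$timecode)

# Fixed boundaries (example values): the start lies before the first
# observation and on the previous day
start <- to_time(201812312200)
end   <- to_time(201901011000)

# Complete against a true hourly sequence, then rebuild the time codes
filled <- complete(df, time = seq(start, end, by = "hour"))
filled$timecode <- to_code(filled$time)

print(filled)
```

Because the sequence is built from actual date-times rather than increments of 100, it rolls over correctly at midnight and at month ends.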