Consider using readxl as an alternative to xlsx
and you can avoid having to deal with rJava
.
Renaming is easier, so let's start there. (BTW: this is a simple enough problem that it doesn't require a reproducible example, called a reprex, but when in doubt, it's always a good idea to include one.
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(lubridate))
# wouldn't ordinarily give full pathname, but reprex doesn't play nice with here()
df.xlsx1 <- read.csv("~/projects/demo/input.csv", header = FALSE)
colnames(df.xlsx1) <- c("Year", "Month", "Day", "Value (total/day)")
df.xlsx1
#> Year Month Day Value (total/day)
#> 1 1985 1 1 10.0
#> 2 1985 1 2 12.0
#> 3 1985 1 5 11.8
#> 4 1985 1 6 15.0
#> 5 1985 1 7 21.0
colnames(df.xlsx1) <- c("Year", "Month", "Day", "Data")
df.xlsx1
#> Year Month Day Data
#> 1 1985 1 1 10.0
#> 2 1985 1 2 12.0
#> 3 1985 1 5 11.8
#> 4 1985 1 6 15.0
#> 5 1985 1 7 21.0
df.xlsx2 <- df.xlsx1 %>% mutate(Date = make_date(year = Year, month = Month, day = Day)) %>% select(Date, Data)
df.xlsx2
#> Date Data
#> 1 1985-01-01 10.0
#> 2 1985-01-02 12.0
#> 3 1985-01-05 11.8
#> 4 1985-01-06 15.0
#> 5 1985-01-07 21.0
Created on 2019-11-23 by the reprex package (v0.3.0)
The lubridate
package has a concise way of creating date sequences
time_span <- df.xlsx2$Date[[1]] + days(0:6) %>% tibble::enframe(time_span) %>% transmute(Date = value)
time_span
# A tibble: 7 x 1
Date
<date>
1 1985-01-01
2 1985-01-02
3 1985-01-03
4 1985-01-04
5 1985-01-05
6 1985-01-06
7 1985-01-07
And, to find the missing dates
time_span %>% anti_join(df.xlsx2)
Joining, by = "Date"
# A tibble: 2 x 1
Date
<date>
1 1985-01-03
2 1985-01-04