This is pollen count data that I read in from a health department website.
Note that the pollen variety names are in the third row of each tibble, but they can
occasionally change. The total number of varieties remains constant, but sometimes
one variety is substituted for another. In this example, "Black Gum" in the first
tibble has been replaced by "Alnus(Alder)" in the second.
The actual data consists of 75 files, each with 28-31 observations of 47 variables.
library(tidyverse)
df1 <- tribble(
~..1, ~..2, ~..3, ~..4, ~..5,
"Month: JANUARY", NA, NA, NA, NA,
"YEAR: 2013", NA, NA, NA, NA,
"DATE", "Ash", "Ashe Juniper / Bald Cypress", "Black Gum", "Black Walnut",
"1", NA, NA, NA, NA,
"2", NA, "6", NA, NA,
"3", NA, "2", NA, NA,
"4", NA, NA, NA, NA,
"5", NA, NA, NA, NA,
"Total", NA, "8", NA, NA
)
df2 <- tribble(
~..1, ~..2, ~..3, ~..4, ~..5,
"Month: DECEMBER", NA, NA, NA, NA,
"YEAR: 2013", NA, NA, NA, NA,
"DATE", "Ash", "Ashe Juniper / Bald Cypress", "Alnus(Alder)", "Black Walnut",
"1", NA, NA, NA, NA,
"2", NA, NA, NA, NA,
"3", NA, NA, NA, NA,
"4", NA, "6", NA, NA,
"5", NA, "8", NA, NA,
"Total", NA, "14", NA, NA
)
Desired output
The final tibble should have pollen variety as a variable name (with suitable
cleanup), as shown below (column order not important)
dfout <- tribble(
~Date, ~Ash, ~Ashe_JuniperOrBald_Cypress, ~Black_Gum, ~Black_Walnut, ~Alnus,
"01/01/2013", NA, NA, NA, NA, NA,
"01/02/2013", NA, "6", NA, NA, NA,
"01/03/2013", NA, "2", NA, NA, NA,
"01/04/2013", NA, NA, NA, NA, NA,
"01/05/2013", NA, NA, NA, NA, NA,
"12/01/2013", NA, NA, NA, NA, NA,
"12/02/2013", NA, NA, NA, NA, NA,
"12/03/2013", NA, NA, NA, NA, NA,
"12/04/2013", NA, "6", NA, NA, NA,
"12/05/2013", NA, "8", NA, NA, NA
)
I can handle creating the Date values, but I am drawing a blank on how to use the values stored
in the tibble to create variable names.
translate <- tribble(
~from, ~to,
"Ash", "Ash",
"Ashe Juniper / Bald Cypress", "Ashe_JuniperOrBald_Cypress",
"Black Gum", "Black_Gum",
"Black Walnut", "Black_Walnut",
"Alnus(Alder)", "Alnus"
)
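For what it's worth, here is a minimal sketch of how a lookup table like this might be applied. It assumes the header row has already been pulled out as a character vector; `raw_names` is a made-up stand-in for that row, and `lookup` is just a name I chose. `deframe()` (from tibble) turns a two-column tibble into a named vector, which can then be indexed by the raw names.

```r
library(tidyverse)

# Lookup table from the question, repeated so this runs on its own
translate <- tribble(
  ~from, ~to,
  "Ash", "Ash",
  "Ashe Juniper / Bald Cypress", "Ashe_JuniperOrBald_Cypress",
  "Black Gum", "Black_Gum",
  "Black Walnut", "Black_Walnut",
  "Alnus(Alder)", "Alnus"
)

# Named character vector: names are the raw labels, values are the clean ones
lookup <- deframe(translate)

# Hypothetical header values, standing in for row 3 of one file
raw_names <- c("Ash", "Alnus(Alder)", "Black Walnut")

# Index the lookup by raw name; unname() drops the raw labels again
clean_names <- unname(lookup[raw_names])
clean_names
# "Ash" "Alnus" "Black_Walnut"
```

A raw name that is missing from the table comes back as `NA`, so any new substitution in a later file would show up immediately.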
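# Do the date (that's easy)
library(lubridate)  # for mdy(); only attached by library(tidyverse) from tidyverse 2.0.0 on
mon <- str_remove(df1[1,]$..1, "Month:\\s*")
yr <- str_remove(df1[2,]$..1, "YEAR:\\s*")  # the year is in row 2
dates <- df1[,1] %>%
tail(-3) %>%  # drop the month, year and header rows
head(-1) %>%  # drop the "Total" row
unlist() %>%
paste(mon, ., yr) %>%
mdy()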
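To make the part I am stuck on concrete, here is a rough sketch of one possible approach. It rests on several assumptions: the variety names are always in row 3, the last row is always "Total", and English month names parse with `mdy()`. The names `tidy_month`, `raw1`, `raw2` and `lookup` are made up for illustration, and the two small tibbles are cut-down versions of the examples above. The idea is to read row 3 as a character vector, translate it, assign it as the column names, and let `bind_rows()` fill in `NA` for varieties that are missing from a given month.

```r
library(tidyverse)
library(lubridate)

# Cut-down versions of the example tibbles, so the sketch runs on its own
raw1 <- tribble(
  ~..1, ~..2, ~..3,
  "Month: JANUARY", NA, NA,
  "YEAR: 2013", NA, NA,
  "DATE", "Ashe Juniper / Bald Cypress", "Black Gum",
  "1", NA, NA,
  "2", "6", NA,
  "Total", "6", NA
)
raw2 <- tribble(
  ~..1, ~..2, ~..3,
  "Month: DECEMBER", NA, NA,
  "YEAR: 2013", NA, NA,
  "DATE", "Ashe Juniper / Bald Cypress", "Alnus(Alder)",
  "1", NA, NA,
  "2", "8", NA,
  "Total", "8", NA
)

# Raw label -> clean column name (same idea as the translate table)
lookup <- c(
  "Ashe Juniper / Bald Cypress" = "Ashe_JuniperOrBald_Cypress",
  "Black Gum" = "Black_Gum",
  "Alnus(Alder)" = "Alnus"
)

# Hypothetical helper: tidy one month's raw tibble
tidy_month <- function(raw, lookup) {
  mon <- str_remove(raw[[1]][1], "Month:\\s*")
  yr <- str_remove(raw[[1]][2], "YEAR:\\s*")
  header <- unlist(raw[3, ], use.names = FALSE)  # row 3 holds the variety names
  # Translate the names, keeping the raw label when it is not in the lookup
  new_names <- coalesce(unname(lookup[header]), header)
  new_names[1] <- "Date"
  raw %>%
    slice(-(1:3)) %>%           # drop the month, year and header rows
    set_names(new_names) %>%    # stored values become the column names
    filter(Date != "Total") %>%
    mutate(Date = mdy(paste(mon, Date, yr)))
}

# Columns missing from one month are filled with NA
out <- bind_rows(tidy_month(raw1, lookup), tidy_month(raw2, lookup))
```

For the real data, the same helper could presumably be mapped over the list of 75 tibbles before binding, though I have not tried that here.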