What is the best way to wrangle this data with multiple headers?

matthanc · July 8, 2021, 4:33am

Hi all,

I'm working with an untidy Excel file that has multiple header rows, and I would love to learn how to parse the data with pivot_longer, if possible.

The data looks something like this:

At first I thought I could load the data and skip the first five rows with read_excel("data.xlsx", skip = 5), but that did not work because skipping them drops the Range eff. information, which is needed by functions elsewhere. I've used pivot_longer before for ggplot2 purposes, but I cannot wrap my head around how to use it here, especially since the sheet has merged cells.

The solution I came up with can be found below, but it would be great to know if there is a simpler, tidier approach. Any insight would be greatly appreciated.

library(readxl)     # read_excel()
library(dplyr)      # bind_rows(), bind_cols(), filter(), %>%
library(lubridate)  # as_date(), today()

rangeratefunction <- function(nextcoladate) {
  # nextcoladate: date of the next cola, in YYYY-MM-DD format
  nextcoladate <- as_date(nextcoladate)

  # Columns A:H, read as two row blocks and stacked
  rangerates <- bind_rows(
    read_excel("hourlyrates.xlsx", range = "Ranges!A6:H24"),
    read_excel("hourlyrates.xlsx", range = "Ranges!A27:H74")
  )

  # Columns I:N and O:T hold the two alternative sets of cola columns
  rangecola1 <- bind_rows(
    read_excel("hourlyrates.xlsx", range = "Ranges!I6:N24"),
    read_excel("hourlyrates.xlsx", range = "Ranges!I27:N74")
  )
  rangecola2 <- bind_rows(
    read_excel("hourlyrates.xlsx", range = "Ranges!O6:T24"),
    read_excel("hourlyrates.xlsx", range = "Ranges!O27:T74")
  )

  # Use the first set until the next cola date, the second set from then on
  rangecola <- if (today() < nextcoladate) rangecola1 else rangecola2

  # Attach the chosen set and keep only the COMMN rows
  bind_cols(rangerates, rangecola) %>%
    filter(SetID == "COMMN")
}
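
For comparison, here is a minimal sketch of how pivot_longer could handle two header levels once they have been pasted into single column names. It is not based on the actual workbook: the toy tibble stands in for what read_excel(..., col_names = FALSE) might return, and the labels "Period 1", "Period 2", "Min", and "Max" are hypothetical. Only SetID and "COMMN" come from the function above. It assumes a merged header cell arrives as a value in its first column and NA in the columns it spans.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy stand-in for a sheet read without column names:
# row 1 = top header (merged cells -> NA after the first column),
# row 2 = second header, rows 3+ = data. All labels are hypothetical.
raw <- tibble(
  X1 = c(NA, "SetID", "COMMN", "OTHER"),
  X2 = c("Period 1", "Min", "10", "12"),
  X3 = c(NA, "Max", "20", "24"),
  X4 = c("Period 2", "Min", "11", "13"),
  X5 = c(NA, "Max", "22", "26")
)

# Carry each top-level label rightwards across the columns it was merged over.
top <- tibble(label = unlist(raw[1, ], use.names = FALSE)) %>%
  fill(label, .direction = "down") %>%
  pull(label)

bottom <- unlist(raw[2, ], use.names = FALSE)

# Combine both levels with "__"; columns with no top label keep the bottom one.
col_names <- ifelse(is.na(top), bottom, paste(top, bottom, sep = "__"))

# Drop the two header rows and apply the combined names.
body <- raw[-(1:2), ]
names(body) <- col_names

# Split the combined names back into one column per header level.
long <- body %>%
  pivot_longer(
    cols = -SetID,
    names_to = c("period", "bound"),
    names_sep = "__",
    values_to = "rate",
    values_transform = list(rate = as.numeric)
  )

commn <- long %>% filter(SetID == "COMMN")
```

With the data in this long shape, picking one period becomes a filter on the period column rather than a choice between separately read column ranges.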

gueyenono · July 8, 2021, 5:16am

Hi,

Without a toy dataset to play with, it would be fairly difficult to provide help. I recommend you make your question reproducible:

FAQ: How to do a minimal reproducible example (reprex) for beginners

martin.R · July 8, 2021, 10:01am

tidyxl may help when reading the data:

mara · July 8, 2021, 12:50pm

I highly recommend the free, online book by tidyxl's author, Spreadsheet Munging Strategies:

As well as the worked examples here:
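
As a rough illustration of the style those resources teach, here is a minimal sketch using unpivotr, the companion package by the same author. It makes several assumptions: the data frame is a hypothetical stand-in for a sheet with two header rows (with a real workbook the cells would come from tidyxl::xlsx_cells() instead of as_cells()), the labels "Period 1", "Period 2", "Min", and "Max" are invented, and the direction names "up-left", "up", and "left" are those used by recent unpivotr releases (older releases used compass names).

```r
library(dplyr)
library(unpivotr)

# Hypothetical stand-in for a sheet with a merged top header row.
raw <- data.frame(
  X1 = c(NA, "SetID", "COMMN", "OTHER"),
  X2 = c("Period 1", "Min", "10", "12"),
  X3 = c(NA, "Max", "20", "24"),
  X4 = c("Period 2", "Min", "11", "13"),
  X5 = c(NA, "Max", "22", "26"),
  stringsAsFactors = FALSE
)

tidy <- as_cells(raw) %>%          # one row per cell, with row/col positions
  behead("up-left", "period") %>%  # merged header: nearest label above and to the left
  behead("up", "bound") %>%        # second header row: label directly above
  behead("left", "SetID") %>%      # first column: row label
  transmute(SetID, period, bound, rate = as.numeric(chr))
```

Each behead() call strips one layer of headers and attaches it to the remaining cells as a column, so merged cells do not need to be filled in by hand first.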

matthanc · July 8, 2021, 4:08pm

Thank you, Martin! I'll check out tidyxl.

matthanc · July 8, 2021, 4:09pm

Thank you, Mara! I'll check out the examples and documentation.
