Just to be clear: the names listed in coln would apply to the first file in the directory, but the second would be missing "2019", and the third could have an additional "2020", is that right? And the files always have a header with all column names except the last? It would be easier with some sample data.
In that case, I think the easiest solution is to read the first line (the header) to get the column names and their number, then read the rest.
# Create example file
file_contents <- "A\tB\tC
1\t2\t3\t4
5\t6\t7\t8"
writeLines(file_contents, "test.txt")
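# Read the first line (the header) as character values
first_line <- read.table("test.txt", sep = "\t", header = FALSE, nrows = 1, colClasses = "character")
# Read the rest, adding a placeholder name ("xxx") for the unnamed last column
content <- read.table("test.txt", sep = "\t", skip = 1, col.names = c(unname(unlist(first_line)), "xxx"))
# Drop the placeholder column
content[, -ncol(content)]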
# A B C
# 1 1 2 3
# 2 5 6 7
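If you need this for several files, the two-step read can be wrapped in a small helper. This is only a sketch: the function name read_short_header and the placeholder column name "extra" are made up for illustration, and it assumes tab-separated files whose header is exactly one name short.

```r
# Hypothetical helper: read a tab-separated file whose header lacks a name
# for the last column, then drop that unnamed column.
read_short_header <- function(path) {
  # Step 1: read only the header line, keeping every value as character
  header <- read.table(path, sep = "\t", header = FALSE, nrows = 1,
                       colClasses = "character")
  col_names <- unname(unlist(header))
  # Step 2: read the data rows, naming the extra column with a placeholder
  content <- read.table(path, sep = "\t", header = FALSE, skip = 1,
                        col.names = c(col_names, "extra"))
  # Remove the placeholder column, keeping a data frame even if one column is left
  content[, -ncol(content), drop = FALSE]
}

# Small example file, written to a temporary location
example_path <- tempfile(fileext = ".txt")
writeLines("A\tB\tC\n1\t2\t3\t4\n5\t6\t7\t8", example_path)
result <- read_short_header(example_path)
```

You could then call it on every file with lapply(list.files("mydir", full.names = TRUE), read_short_header) to get a list of data frames.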
But if each data frame has a different number of columns, you can't bind them together directly; you'll need to fill some cells with NA, which may or may not make sense in your context.
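To illustrate the NA filling, here is a minimal sketch using dplyr::bind_rows, which matches columns by name and fills the missing ones with NA. The three data frames are invented stand-ins for your files (one with "2018" and "2019", one missing "2019", one with an additional "2020"), not your actual data.

```r
library(dplyr)

# Hypothetical data frames standing in for three files with different year columns
df1 <- data.frame(country = "a", `2018` = 1, `2019` = 2, check.names = FALSE)
df2 <- data.frame(country = "b", `2018` = 3, check.names = FALSE)
df3 <- data.frame(country = "c", `2018` = 4, `2019` = 5, `2020` = 6,
                  check.names = FALSE)

# Columns are matched by name; cells with no source value become NA
combined <- bind_rows(df1, df2, df3)
```

Whether those NA values are meaningful (e.g. "no data for that year") is something only you can judge from the context.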
If I misunderstood and each file has the same number of columns (e.g. the ones in coln), then it's even easier: you can do it with readr:
library(readr)
# "ccc-" reads three character columns and skips the fourth (unnamed) one
read_delim("test.txt",
           col_types = "ccc-",
           col_names = c("A", "B", "C"),
           delim = "\t",
           skip = 1)
And the looping comes naturally; something like this should work:
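library(tidyverse)  # provides readr, purrr, tibble and %>%
# full.names = TRUE returns paths that can be read from any working directory
list.files("mydir", full.names = TRUE) %>%
  map_dfr(~ read_delim(.x,
                       col_types = "ccc-",
                       col_names = c("A", "B", "C"),
                       delim = "\t",
                       skip = 1) %>%
            # record which file each row came from
            add_column(country = basename(.x)))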