Read a .txt file with read.table without the last column

walter1 · August 13, 2020, 3:13pm

Hi community,

I'm trying to read .txt files that have different numbers of columns. The problem is that each file has an empty column at the end. I wonder how I can read the file without the last column, or name the last column "last col".

I tried the code below and others, but they all require the data frames to have the same number of columns, which is not the case here. What can I do?

data <- read.csv(fileCSV)[,(ncol(data)-1)]

AlexisW · August 14, 2020, 8:56pm

That code won't work since when you write ncol(data), the variable data has not been created yet (or contains the result of the previous dataframe). You can do it in two steps:

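data_tmp <- read.csv(fileCSV)
# A negative index drops the last column (ncol(data_tmp) - 1 would select only the second-to-last one)
data <- data_tmp[, -ncol(data_tmp)]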
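
To see what the column index does, here is a small sketch on a made-up data frame (the names `df`, `a`, `b` and `empty` are only for illustration): a positive index of `ncol(df) - 1` selects a single column, whereas a negative index of `-ncol(df)` drops the last one.

```r
# Made-up data frame standing in for a file whose last column is empty
df <- data.frame(a = 1:2, b = 3:4, empty = c(NA, NA))

# Positive index: only the second-to-last column, returned as a plain vector
second_to_last <- df[, ncol(df) - 1]

# Negative index: every column except the last
without_last <- df[, -ncol(df)]

names(without_last)
```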

You can also consider using the function read_csv from the readr package, whose option col_types allows you to exclude a column when reading.
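
A minimal sketch of the readr route, assuming readr is installed and using a made-up three-column file written to a temporary path. In the compact `col_types` string, each character describes one column, and `-` skips that column.

```r
library(readr)

# Made-up CSV with three columns; the last one is not wanted
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,x", "3,4,y"), tmp)

# "d" = double, "-" = skip the column entirely
data <- read_csv(tmp, col_types = "dd-")
names(data)
```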

walter1 · August 15, 2020, 7:37pm

Hi,

I have some issues with the code. The first is that all the files are in .txt format, so I can't use the CSV code. The second is that the data frames have different numbers of columns, and the last column doesn't have a column name.

How can I read the file and at the same time tell R that the last column doesn't matter, or give it some other name?

coln<-c("Importers","2015","2016","2017","2018","2019","xxxx")
l<-list.files("./y")

my_files_contents <- vector("list", length(l))
for (ff in seq_along(l)) {
  a <- str_extract(paste0("./y/", l[ff]), "[by].*\\.")

  my_files_contents[[ff]] <- read.table(paste0("./y/", l[ff]), sep = "\t", skip = 1, col.names = coln, header = FALSE) %>% mutate(country = a)
}

all_files_contents1 <- bind_rows(my_files_contents)

AlexisW · August 15, 2020, 8:36pm

Just to be clear, the names listed in coln would apply to the first file in the directory, but the second would be missing "2019", and the third could have an additional "2020", is that right? And the files always have a header with all column names except the last? It would be easier with some sample data.

In that case, the easiest solution I think is to first read the first line (header), get the number of columns and their names, then read the rest.

# Create example file
file_contents <- "A\tB\tC
1\t2\t3\t4
5\t6\t7\t8"
writeLines(file_contents, "test.txt")

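# Read the first line (header) as text, so names such as "2015" stay character
first_line <- read.table("test.txt", sep = "\t", header = FALSE, nrows = 1, colClasses = "character")
# Read the rest, adding a placeholder name for the unnamed last column
content <- read.table("test.txt", sep = "\t", col.names = c(unlist(first_line), "xxx"), skip = 1)
# Drop the last column
content[, -ncol(content)]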
#   A B C
# 1 1 2 3
# 2 5 6 7
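
If it is unclear how many columns a given file really has, base R's `count.fields()` reports the number of fields on each line. A small sketch on the same kind of made-up file, written to a temporary path:

```r
# Made-up file: 3 names in the header, 4 fields in each data row
path <- tempfile(fileext = ".txt")
writeLines(c("A\tB\tC", "1\t2\t3\t4", "5\t6\t7\t8"), path)

# One count per line; a header that is shorter than the data rows stands out
n_fields <- count.fields(path, sep = "\t")
n_fields
```

A header count that is one less than the data rows is the situation handled by the code above.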

But if each dataframe has a different number of columns, you can't bind them together, or you'll need to fill some cells with NA, which may or may not make sense in your context.
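
A rough sketch of the NA-filling option in base R, assuming every file has a header that names all columns except the trailing empty one. The helper `read_without_last()` and the two temporary files are made up for illustration.

```r
# Hypothetical helper: read the header first, then the data,
# and drop the unnamed last column
read_without_last <- function(path) {
  header <- scan(path, what = "", sep = "\t", nlines = 1, quiet = TRUE)
  content <- read.table(path, sep = "\t", skip = 1,
                        col.names = c(header, "xxx"), check.names = FALSE)
  content[, -ncol(content), drop = FALSE]
}

# Two made-up files with a different number of year columns;
# each data row ends with a tab, i.e. an empty last column
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(c("Importers\t2018\t2019", "A\t1\t2\t", "B\t3\t4\t"), f1)
writeLines(c("Importers\t2019", "C\t5\t"), f2)

dfs <- lapply(c(f1, f2), read_without_last)

# Add every column a data frame lacks as NA, then stack the rows
all_cols <- unique(unlist(lapply(dfs, names)))
filled <- lapply(dfs, function(d) {
  for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
  d[all_cols]
})
combined <- do.call(rbind, filled)
combined
```

`dplyr::bind_rows()`, used earlier in the thread, does the same NA filling automatically when the data frames have different columns.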

If I misunderstood and each file has the same number of columns (e.g. the ones in coln), then it's even easier, and you can do it with readr:

read_delim("test.txt",
           col_types = "ccc-",
           col_names = c("A","B","C"),
           delim = "\t",
           skip = 1)

And the looping comes naturally; something like this should work:

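# Needs readr, purrr and tibble (all loaded by library(tidyverse))
# full.names = TRUE returns full paths, so the files are found from any working directory
list.files("mydir", full.names = TRUE) %>%
  map_dfr(~ read_delim(.x,
           col_types = "ccc-",
           col_names = c("A","B","C"),
           delim = "\t",
           skip = 1) %>%
     add_column(country = basename(.x)))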
