read CSV file Until...

Doudou · October 7, 2019, 3:58pm

Hi All,
I'm using read_csv2 to read CSV files that all have the same structure, but I would like to read each one only up to a certain row (which varies from file to file):
row_2 <- 24
row_3 <- 21
info2 <- read_csv2(data2,
                   n_max = row_2,
                   skip = 2)[, c(1:2, 12:14)]  # select columns 1 to 2 and 12 to 14

info3 <- read_csv2(data3,
                   n_max = row_3,
                   skip = 2)[, c(1:2, 12:14)]

Here I would like to do 2 things:

Create a macro to automate reading my CSV files.
And create a macro variable that holds the row at which to stop reading each CSV file.

But I don't know how to catch the right row number automatically.

Basically, the appropriate row would be the first row whose 1st column contains the word 'Total'. Another way would be to select all the rows up to the next empty row.
Does anyone know how I can find that row number?

FJCC · October 7, 2019, 4:57pm

Unless the files are very large, I would read in the entire file and make a subset afterward.

DF <- read.csv2("/home/fjcc/R/Play/Dummy.csv")
DF
#>    Name Value
#> 1     A     2
#> 2     B     4
#> 3     C     6
#> 4 Total    12
#> 5     F     4
#> 6     g     6
#> 7     q    23
#> 8 Total    33
Totals <- which(DF$Name == "Total")
DF2 <- DF[1:Totals[1], ]
DF2
#>    Name Value
#> 1     A     2
#> 2     B     4
#> 3     C     6
#> 4 Total    12

Created on 2019-10-07 by the reprex package (v0.2.1)
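
To reuse the read-then-subset approach for several files, it could be wrapped in a small function. This is only a sketch: the name `read_until_total`, its arguments, and the made-up sample file are assumptions for illustration, and it uses base `read.csv2` as in the example above.

```r
# Hypothetical helper: read a semicolon-separated file, then keep the rows
# up to and including the first one whose first column equals `marker`.
read_until_total <- function(path, skip = 0, marker = "Total") {
  df <- read.csv2(path, skip = skip, stringsAsFactors = FALSE)
  hits <- which(df[[1]] == marker)
  if (length(hits) == 0) {
    return(df)  # no marker found: keep the whole file
  }
  df[seq_len(hits[1]), , drop = FALSE]
}

# Made-up sample file, only to try the function out
tmp <- tempfile(fileext = ".csv")
writeLines(c("Name;Value", "A;2", "B;4", "C;6", "Total;12", "F;4", "Total;10"), tmp)

res <- read_until_total(tmp)
print(res)
```

For files like those in the question, the call would presumably look like `read_until_total(data2, skip = 2)[, c(1:2, 12:14)]`, so the row counts no longer need to be typed in by hand.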

pieterjanvc · October 7, 2019, 7:06pm

Hi,

I agree that @FJCC's option is the one you should implement if the files are small, given that it guarantees the best performance.

If, however, the files are too large to read all at once, you can use something like this:

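library(magrittr)  # provides the %>% pipe used below

filePath = "data.csv"
separator = ","
hasHeader = TRUE
rowStart = 1
rowEnd = 20

# sep = "\n" makes scan() return one string per line rather than splitting on whitespace
myData = scan(filePath, "character", sep = "\n", skip = rowStart + hasHeader - 1, nlines = rowEnd - rowStart + 1)
# Split each line on the separator and stack the pieces into a data frame
myData = purrr::map_df(1:length(myData), function(x){
  data.frame(t(unlist(strsplit(myData[x], split = separator))), stringsAsFactors = FALSE)
}) %>% readr::type_convert()

# Take the column names from the first line of the file
if(hasHeader){
  colnames(myData) = unlist(strsplit(scan(filePath, "character", sep = "\n", nlines = 1), split = separator))
}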

The scan function reads the lines of any file. The skip and nlines arguments let you decide which lines to read in.
Since the lines are read as strings, you need to split each string on the separator the file uses (a comma in this example).
After splitting, you merge everything into a data frame and use type_convert to guess the column classes.
If the file has a header, you assign the column names at the end.

The final result is the first 20 lines of the data file, in this case with header (so 21 lines in total were read)
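
The example above still needs rowEnd to be known in advance. One way to find it without loading the whole file might be to scan the file in chunks until the first line that starts with 'Total'. The sketch below makes several assumptions: the helper name `find_marker_row`, the chunk size, and the sample file are made up, and the first field is assumed to be unquoted and followed by the separator.

```r
# Hypothetical helper: return the file line number (header included) of the
# first line whose first field is `marker`, reading `chunk` lines at a time.
find_marker_row <- function(path, marker = "Total", sep = ";", chunk = 1000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  offset <- 0
  repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0) return(NA_integer_)  # end of file, no marker
    hit <- which(startsWith(lines, paste0(marker, sep)))
    if (length(hit) > 0) return(offset + hit[1])
    offset <- offset + length(lines)
  }
}

# Made-up sample file, only to try the function out
tmp <- tempfile(fileext = ".csv")
writeLines(c("Name;Value", "A;2", "B;4", "C;6", "Total;12", "F;4", "Total;10"), tmp)

stop_line <- find_marker_row(tmp, chunk = 2)
# The header sits on line 1, so the number of data rows is stop_line - 1
part <- read.csv2(tmp, nrows = stop_line - 1, stringsAsFactors = FALSE)
print(part)
```

With lines skipped before the header, as in the original question (skip = 2), the row count would presumably become `stop_line - skip - 1`.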

Hope this helps,
PJ

Doudou · October 8, 2019, 1:48pm

Hello,

It works TipTop

Thank you
