How to import a text file with special characters

nchan08 · November 16, 2021, 12:51am

I am new to RStudio. I am trying to import the following text into RStudio. The file type is .txt. I would like to remove the first 6 rows (all the text).

AAA: true
BBB: false
CCC: 0
DDD: Wavelengths
EEE: 3648
>>>>>Begin Decimal Data<<<<<
623.821 -216.03
623.881 -216.03
623.941 -216.03

Then I would like to create a data table with the values separated into two columns, Col A and Col B, as follows.

Col A  Col B
623.821 -216.03
623.881 -216.03
623.941 -216.03

I have a series of .txt files, so I would like to read them all, delete the header rows, and separate the columns. I coded it as follows:


files <- dir(".", pattern = ".txt$")
for (i in 1:length(files)) {
  obj_name <- files %>% str_sub(start = -2)
  assign(obj_name[i], read_table(files[i], skip = 13))
  names(obj_name[i]) <- c("Col_A", "Col_B")
}

But it shows the following parsing failures:

-- Column specification -------------------------------------------------------------
cols(
  `>>>>>Begin` = col_double(),
  Spectral = col_double(),
  `Data<<<<<` = col_character()
)

Warning: 3648 parsing failures.
row col  expected    actual                 file
  1  -- 3 columns 2 columns 'L1_0401_1758_C.txt'
  2  -- 3 columns 2 columns 'L1_0401_1758_C.txt'
  3  -- 3 columns 2 columns 'L1_0401_1758_C.txt'
  4  -- 3 columns 2 columns 'L1_0401_1758_C.txt'
  5  -- 3 columns 2 columns 'L1_0401_1758_C.txt'
... ... ......... ......... ....................
See problems(...) for more details.

williaml · November 16, 2021, 2:02am

Perhaps something like read_tsv()? It has a skip argument for skipping rows.
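
For reference, a minimal sketch of skipping the header block in a single file. It uses base R's read.table() rather than read_tsv(), because read.table() splits on any whitespace by default and so works whether the values are separated by spaces or tabs (the thread does not settle which). Instead of hard-coding the number of rows to skip, it looks for the ">>>>>Begin" marker line from the sample in the question. The helper name read_spectrum and the temporary demo file are assumptions made for illustration.

```r
# Hypothetical helper: read one file, skipping everything up to and
# including the ">>>>>Begin ... Data<<<<<" marker line.
read_spectrum <- function(path) {
  lines <- readLines(path)
  marker <- grep("^>>>>>Begin", lines)
  if (length(marker) == 0) stop("no '>>>>>Begin' marker found in ", path)
  read.table(
    path,
    skip = marker[1],                  # number of lines before the data
    header = FALSE,
    col.names = c("Col_A", "Col_B")
  )
}

# Demo on a temporary file built from the sample in the question.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "AAA: true", "BBB: false", "CCC: 0", "DDD: Wavelengths", "EEE: 3648",
  ">>>>>Begin Decimal Data<<<<<",
  "623.821 -216.03", "623.881 -216.03", "623.941 -216.03"
), tmp)

spectrum <- read_spectrum(tmp)
print(spectrum)
```

Locating the marker avoids guessing between skip = 6, 13, or 14 when the header length differs from file to file.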

nchan08 · November 16, 2021, 2:16am

Thanks, @williaml! I tried using read_tsv() as well. It reads the text file, but I can't get the expected result; in particular, the skip argument doesn't seem to work.
All the header text is still included in the result, and everything ends up in a single column.

Data.from.HR4D5021_005237.txt.Node
1 AAA: true
2 BBB: false
3 CCC: 0
4 DDD: Wavelengths
5 EEE: 3648
6 >>>>>Begin Decimal Data<<<<<
7 623.821
8 -216.03
9 623.881
10 -216.03
11 623.941
12 -216.03

nchan08 · November 16, 2021, 2:54am

When I run it on a single file, read.delim() seems to work:

Test1111 <- read.delim("C:/L1_0401_0603_B.txt", header = FALSE, skip = 14)

But when I run it on a series of files, it fails again.

files <- dir(".", pattern = ".txt$")
for (i in 1:length(files)) {
  obj_name <- files %>% str_sub(start = -2)
  assign(obj_name[i], read_delim(files[i]))
  read_delim(files[i], header = FALSE, skip=14)
}

Error:

Rows: 3660 Columns: 4
-- Column specification -------------------------------------------------------------
Delimiter: " "
chr (4): Data, from, HR4D5021_005094.txt, Node

i Use spec() to retrieve the full column specification for this data.
i Specify the column types or set show_col_types = FALSE to quiet this message.
Error in read_delim(files[i], header = FALSE, skip = 14) :
unused argument (header = FALSE)

What's wrong with this code?

haripadakoley · November 16, 2021, 5:44am

Hi @nchan08, you can try the following code. Hopefully this will help you.

This code is for "|" (pipe) separated .txt files; you can use any other separator by setting delim = "<separator>".

# Load the packages that provide ldply() and read_delim()
library(plyr)
library(readr)

# Path where files are located (only the .txt files)
paths <- dir(getwd(), pattern = "\\.txt$")

# Names
names(paths) <- basename(paths)

# Read and combine all the files
DataSet <- ldply(paths, read_delim, delim = "|", escape_double = FALSE,
                 col_names = c("Col1", "Col2", "Col3", "Col4", "Col5"),
                 col_types = cols(Col1 = col_character(),
                                  Col2 = col_date(format = "%d/%m/%Y")),
                 trim_ws = TRUE, skip = 6)  # skip = 6 skips the first 6 rows
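
For the two-column numeric files in the original question, the same read-and-combine idea might look like the sketch below. It is written in base R only, so it is not the code from the reply above. Note that readr's read_delim() takes col_names rather than header, which is why the earlier attempt failed with "unused argument (header = FALSE)"; base R's read.table() does accept header. The helper name read_all_spectra, the added file column, and the demo directory are assumptions made for illustration.

```r
# Hypothetical helper: read every .txt file in `dir_path` and stack the rows,
# keeping the source file name in a `file` column.
read_all_spectra <- function(dir_path = ".") {
  files <- list.files(dir_path, pattern = "\\.txt$", full.names = TRUE)
  tables <- lapply(files, function(path) {
    lines <- readLines(path)
    marker <- grep("^>>>>>Begin", lines)
    if (length(marker) == 0) stop("no '>>>>>Begin' marker found in ", path)
    dat <- read.table(path, skip = marker[1], header = FALSE,
                      col.names = c("Col_A", "Col_B"))
    dat$file <- basename(path)
    dat
  })
  do.call(rbind, tables)
}

# Demo: two small files in a temporary directory.
demo_dir <- file.path(tempdir(), "spectra_demo")
dir.create(demo_dir, showWarnings = FALSE)
header <- c("AAA: true", "BBB: false", "CCC: 0", "DDD: Wavelengths",
            "EEE: 3648", ">>>>>Begin Decimal Data<<<<<")
writeLines(c(header, "623.821 -216.03", "623.881 -216.03"),
           file.path(demo_dir, "a.txt"))
writeLines(c(header, "623.941 -216.03"),
           file.path(demo_dir, "b.txt"))

combined <- read_all_spectra(demo_dir)
print(combined)
```

Returning one combined data frame (or a named list, if the files should stay separate) tends to be easier to work with than creating one variable per file with assign().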
