RGIS
December 11, 2019, 7:43pm
1
I have a file that looks like this where someone in the original database added a key incorrectly with a return symbol. How can I read file in R that has a regular repeating LF or other line break in the wrong place? For the sake clarity, suppose I have the following reproduceable example:
example.txt =
ID|Event Date|Code Meaning|Code|Price \n
10020|2017-01-17|scotch tape|01|1.00 \n
10021|2017-02-17|rubber \n duck |02|3.99
10022|2018-03-17|rusty, nail 9000|03|5.99 \n
10023|2016-04-17|rubber \n duck |02|1.99 \n
10024|2015-02-17|scotch tape|01|2.00 \n
where every \n or character code "0x0A" is a line break. I would like to get a file that is like this....
ID| Date|Name|Code|Price \n
10020|2017-01-17|scotch tape|01|1.00 \n
10021|2017-01-17|rubber duck |02|3.99
10022|2017-01-17|rusty, nail 9000|03|5.99 \n
10023|2017-02-17|rubber duck |02| 1.99 \n
10024|2017-02-17|scotch tape|01|2.00 \n
so I can load in the data using something like tidyverse or readr so looks like this...
ID =c(10020,10021,10022,10023,10024)
Date=c("2017-01-17","2017-01-17","2017-01-17","2018-02-17","2017-01-17")
Name=c("scotch tape","rubber duck",rusty, nail 9000","rubber duck","scotch tape")
Price=c(1.00,3.99,5.99,1.99,2.00)
mydata <- data.frame(ID,Date,Name,Price)
View(mydata)
How can I do this in R?
FJCC
December 11, 2019, 9:04pm
2
There is probably some cool function out there that does this but I wrote a little code to do it. It might be painfully slow for a large data set.
library(stringr)
DF <- readLines("/home/fjcc/R/Play/Dummy.csv")
j <- 1
i <- 1
OutVec <- vector("character", length = length(DF))
tmp <- ""
while (DF[i] != "") {
TestStr <- paste0(tmp, DF[i])
if (str_count(TestStr, "\\|") == 4) {
OutVec[j] <- TestStr
tmp <- ""
j <- j + 1
} else {
tmp <- TestStr
}
i <- i + 1
}
OutVec <- OutVec[OutVec != ""]
writeLines(OutVec, "/home/fjcc/R/Play/FixedFile.txt")
Fixed <- read.table("/home/fjcc/R/Play/FixedFile.txt", sep ="|", header = TRUE)
Fixed
#> ID Event.Date Code.Meaning Code Price
#> 1 10020 2017-01-17 scotch tape 1 1.00
#> 2 10021 2017-02-17 rubber duck 2 3.99
#> 3 10022 2018-03-17 rusty, nail 9000 3 5.99
#> 4 10023 2016-04-17 rubber duck 2 1.99
#> 5 10024 2015-02-17 scotch tape 1 2.00
Created on 2019-12-11 by the reprex package (v0.2.1)
2 Likes
Another approach could be to read the file as text and remove the unwanted characters with a regular expression, see this example (the regex might need fine-tuning for your actual application)
library(tidyverse)
raw_text <- read_file("example.txt")
raw_text <- str_remove_all(raw_text, "\\s\\\\n(?=\\s)")
df <- read_delim(raw_text, delim = "|")
df
#> # A tibble: 5 x 5
#> ID `Event Date` `Code Meaning` Code Price
#> <dbl> <date> <chr> <chr> <dbl>
#> 1 10020 2017-01-17 scotch tape 01 1
#> 2 10021 2017-02-17 rubber duck 02 3.99
#> 3 10022 2018-03-17 rusty, nail 9000 03 5.99
#> 4 10023 2016-04-17 rubber duck 02 1.99
#> 5 10024 2015-02-17 scotch tape 01 2
2 Likes
system
Closed
January 1, 2020, 9:52pm
4
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.