How to extract blocks of data with different structures

Rozgar · September 17, 2020, 2:02pm

How can I extract different blocks of data (each block header together with its items) from Source.txt and paste each block into Target.txt right after the matching place_for_blockN: line?

Source.txt looks like:
block1
item 1
item2

block2
item 1
item2
item3
item 4

block3
item 4

Target.txt looks like:

text text text text
text text
place_for_block1:

text text text text
text texttext text text text
text text
place_for_block2:

text text text text
place_for_block3:
text text text text

Thanks for your time

ChrisL · September 19, 2020, 10:02am

Here is a starting point. The idea is simple: find the block headers in Source.txt and the matching place_for_blockN: lines in Target.txt, then stitch the pieces together pairwise.

library(magrittr) # for the pipe %>% 
# for a self-contained demo the files are inlined as strings;
# with the real files use readLines("Source.txt") and readLines("Target.txt")
sourceTxt <- "block1
b1 item 1
b1 item2

block2
b2 item 1
b2 item2
b2 item3
b2 item 4

block3
b3 item 4" %>% strsplit("\n") %>% .[[1]]

targetTxt <- "0text text text text
0text text
place_for_block1:

1text text text text
1text texttext text text text
1text text
place_for_block2:

2text text text text
place_for_block3:
3text text text text" %>% strsplit("\n") %>% .[[1]]

# line numbers of the block headers in the source and of the placeholders in the target
blockStart <- grep("^block[0-9]+$", sourceTxt)
targetLocation <- grep("^place_for_block[0-9]+:$", targetTxt)

# sanity check: the placeholders must name the same blocks, in the same order
stopifnot(identical(paste0("place_for_", sourceTxt[blockStart], ":"), targetTxt[targetLocation]))

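# add sentinels: a virtual placeholder at line 0 and a virtual block header
# just past the end of sourceTxt, so every fragment has a start and an end
targetLocation <- c(0, targetLocation)
blockStart <- c(blockStart, length(sourceTxt) + 1)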

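# stitch the fragments together pairwise: each target fragment (up to and
# including its placeholder) is followed by the items of the matching block
final <- purrr::pmap(list(
  head(targetLocation, -1) + 1, # start and end of each target fragment
  tail(targetLocation, -1),
  head(blockStart, -1),         # header and last line of each source block
  tail(blockStart, -1) - 1),
  function(t1, t2, s1, s2){
    # seq_len() keeps a block with no items from producing a
    # reversed index such as 6:5
    c(targetTxt[t1:t2], sourceTxt[s1 + seq_len(s2 - s1)])
    }
  ) %>% unlist()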

# add the remainder of targetTxt if there's something after place_for_last_block
if(tail(targetLocation, 1) != length(targetTxt)){
  final <- c(final, targetTxt[(tail(targetLocation, 1) + 1):length(targetTxt)])
}

cat(final, sep="\n")

Results:

0text text text text
0text text
place_for_block1:
b1 item 1
b1 item2


1text text text text
1text texttext text text text
1text text
place_for_block2:
b2 item 1
b2 item2
b2 item3
b2 item 4


2text text text text
place_for_block3:
b3 item 4
3text text text text
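The code above pairs blocks and placeholders by position, which is why the sanity check requires them to appear in the same order. If the placeholders in Target.txt might come in a different order, one possible variation is to look each block up by name instead. This is a base-R sketch, not part of the original answer: fill_blocks is a hypothetical helper name, and it assumes headers look exactly like blockN and placeholders exactly like place_for_blockN:, as in the example.

```r
# Hypothetical helper: fill each "place_for_blockN:" line in targetTxt with the
# items of the source block whose header is "blockN", matching by name.
fill_blocks <- function(sourceTxt, targetTxt) {
  isHeader <- grepl("^block[0-9]+$", sourceTxt)
  # cumsum() tags every line with the number of the header it belongs to;
  # lines before the first header get 0 and are dropped
  groupId <- cumsum(isHeader)
  keep <- groupId > 0
  blocks <- split(sourceTxt[keep], groupId[keep])
  # name each block after its header line, then drop the header itself
  names(blocks) <- vapply(blocks, `[`, character(1), 1)
  blocks <- lapply(blocks, `[`, -1)

  out <- character(0)
  for (line in targetTxt) {
    out <- c(out, line)
    if (grepl("^place_for_block[0-9]+:$", line)) {
      key <- sub("^place_for_(block[0-9]+):$", "\\1", line)
      if (!key %in% names(blocks)) stop("no source block for ", line)
      out <- c(out, blocks[[key]])
    }
  }
  out
}

# Usage with the real files (the output file name is only an example):
# sourceTxt <- readLines("Source.txt")
# targetTxt <- readLines("Target.txt")
# writeLines(fill_blocks(sourceTxt, targetTxt), "Target_filled.txt")
```

On the example data this should give the same lines as the result above, because blank separator lines in Source.txt stay attached to the preceding block just as in the positional version. It stops with an error if a placeholder has no matching block, while blocks without a placeholder are silently skipped. Growing out with c() inside the loop copies the vector each time, which is fine for small files but slow for very large ones.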

Rozgar · September 21, 2020, 1:07pm

Thanks ChrisL, great help.

Best
