Link to the input PDFs:
https://drive.google.com/drive/folders/1dcgDpfiVjMTGmYSRGnQA65YjZzv0AwXL?usp=sharing
The code goes through all the PDF files in the working directory, builds a corpus, and splits the text of each file into lines. It then checks those lines against a given search list and reports whether each search term is present in each PDF (a <- sapply(unlist(Table_search), grepl, x = tablelines)).
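library(tm)        # Corpus, URISource, readPDF, tm_map, content_transformer, DublinCore
library(pdftools)  # pdf_text
library(stringr)   # str_split, %>%
setwd("D:")
# List the PDF files in the working directory
tables <- list.files(pattern = 'pdf')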
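# Build a corpus from the PDF files
tablecorpus <- Corpus(URISource(tables),
                      readerControl = list(reader = readPDF))
# Replace carriage returns with spaces
tospace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
tablecorpus <- tm_map(tablecorpus, tospace, "\r")
Table_Filenames <- DublinCore(tablecorpus, "id")
# Split each PDF into lines; note that [[1]] keeps only the first page of each file
tablelines <- lapply(tables, function(x) strsplit(pdf_text(x), "\n")[[1]])
tablelist <- unlist(tablelines) %>% str_split("\n")
Table_search <- list("Table 14", "Source Data:", "VERSION")
# TRUE where a search term occurs in the lines kept for a PDF (rows: PDFs, columns: terms)
a <- sapply(unlist(Table_search), grepl, x = tablelines)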
I want the code to print the actual line wherever it finds a keyword in a PDF file, as shown in image 2.
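
To make the desired output concrete, here is a rough sketch of the kind of result I am after. It uses base R only and a small made-up list in place of the real PDF lines. The file names, the sample lines, and the names all_lines, hits and result are placeholders for illustration, not part of my actual script. It also assumes the keywords should be matched literally (fixed = TRUE) rather than as regular expressions.

```r
# Hypothetical stand-in for the lines extracted from each PDF;
# in the real script these would come from pdf_text() and strsplit().
tablelines <- list(
  "a.pdf" = c("Table 14 Summary of Results",
              "Some other text",
              "Source Data: Listing 16.2"),
  "b.pdf" = c("Nothing relevant here",
              "VERSION 1.0")
)
Table_search <- c("Table 14", "Source Data:", "VERSION")

# Flatten to one row per line, remembering which file each line came from.
all_lines <- data.frame(
  file = rep(names(tablelines), lengths(tablelines)),
  line = unlist(tablelines, use.names = FALSE),
  stringsAsFactors = FALSE
)

# For each keyword, keep the lines that contain it (literal match).
hits <- lapply(Table_search, function(k) {
  m <- all_lines[grepl(k, all_lines$line, fixed = TRUE), ]
  data.frame(keyword = rep(k, nrow(m)), m, stringsAsFactors = FALSE)
})

# Stack the per-keyword results into one table and print it.
result <- do.call(rbind, hits)
rownames(result) <- NULL
print(result)
```

Each row of result holds a keyword, the file it was found in, and the full line that contains it. A keyword that is not found anywhere simply contributes no rows.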