Hey there,
is there any way to use htmlParse with multiple HTML files? I have a folder with a bunch of files, but as soon as I pass my list of files to it, I just get the file names back. The code works fine on a single file.
library(XML)
library(stringi)
files <- list.files(path = ".", recursive = TRUE, pattern = "\\.html$", full.names = TRUE)
doc <- htmlParse(files, asText = TRUE)
plain.text <- xpathSApply(doc, "//p", xmlValue)
plain.text <- gsub("Â|\n", "", plain.text)
plain.text <- stri_remove_empty(plain.text, na_empty = FALSE)
cat(paste(plain.text, collapse = "\n"))
Result:
C:/Users/richard dobler/OneDrive/Desktop/38079/38079_10-Q_2006-05-10_0001104659-06-033149.htmlC:/Users/richard dobler/OneDrive/Desktop/38079/38079_10-Q_2006-08-09_0001104659-06-053129.htmlC:/Users/richard dobler/OneDrive/Desktop/38079/38079_10-Q_2006-11-09_0001104659-06-073607.html
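For reference, here is a minimal sketch of one way this might be handled, not necessarily the only one. It relies on two things: htmlParse parses one document per call, and asText = TRUE tells it to treat its argument as HTML content rather than as a file name, which would explain why the paths themselves come back as text. The helper name extract_paragraphs is made up for illustration, and base R's nzchar is swapped in for stri_remove_empty so that only the XML package is needed.

```r
library(XML)

# Hypothetical helper (name invented for this sketch): return the text of
# every <p> node in a single HTML file.
extract_paragraphs <- function(path) {
  doc <- htmlParse(path)                      # no asText, so `path` is read as a file
  txt <- xpathSApply(doc, "//p", xmlValue)    # text content of each <p> node
  txt <- gsub("Â|\n", "", as.character(txt))  # same cleanup as in the question
  txt[nzchar(txt)]                            # drop empty strings
}

files <- list.files(path = ".", recursive = TRUE,
                    pattern = "\\.html$", full.names = TRUE)

# Parse the files one at a time; the result is a list with one character
# vector per file, named by file path.
texts <- lapply(files, extract_paragraphs)
names(texts) <- files

# Print each file's paragraphs under its path.
for (f in names(texts)) {
  cat(f, "\n", paste(texts[[f]], collapse = "\n"), "\n\n", sep = "")
}
```

Keeping the per-file logic in one function means the single-file code that already works stays unchanged, and lapply only takes care of repeating it for each path.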