Hey Guys!
I'm trying to extract the parts of a text that sit between different headings. The headings all start with "Item N. Title", where N runs from 1 to 15.
I started by finding a pattern that matches the "Item" part: str_extract_all(a, "(Item\\s\\d+\\.[:blank:])").
I just can't get it to extract the whole text between those headings.
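One way to get the text between headings is to locate every heading first and then cut the string at those positions. The following is only a minimal sketch in base R (so nothing extra needs installing); stringr's str_locate_all and str_sub could play the same role. The sample text in `a`, the variable names, and the assumption that each heading starts at the beginning of its own line are all made up for illustration and may not match the real file.

```r
# Hypothetical sample text; the real input would be the converted file.
a <- paste(
  "Item 1. Business",
  "We make widgets.",
  "Item 2. Properties",
  "We own one factory.",
  "Item 15. Exhibits",
  "See attached list.",
  sep = "\n"
)

# Assumed heading shape: line start, "Item", whitespace, digits, a dot,
# blanks, then the rest of the line as the title.
heading_pattern <- "(?m)^Item\\s+\\d+\\.[ \\t]+[^\\n]*"

# Find all headings and their character positions.
m <- gregexpr(heading_pattern, a, perl = TRUE)
headings <- regmatches(a, m)[[1]]
heading_start <- as.integer(m[[1]])
heading_end <- heading_start + attr(m[[1]], "match.length") - 1L

# Each body runs from just after its heading to just before the next
# heading, or to the end of the text for the last one.
body_start <- heading_end + 1L
body_end <- c(heading_start[-1] - 1L, nchar(a))
bodies <- trimws(substring(a, body_start, body_end))

# Named character vector: names are the headings, values are the text blocks.
sections <- setNames(bodies, headings)
```

With the blocks stored under their headings, sorting by title afterwards could be as simple as `sections[order(names(sections))]`. Note that the sketch does not handle the case where no heading is found at all.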
Is this HTML? The text of interest seems to be in a different font style from the rest, so I would probably use the associated tags to get at the content rather than treating it as one long string to cut up with regular expressions.
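To illustrate the tag-based idea, here is a rough sketch using the xml2 package. The markup is invented: it assumes every heading is a <p> containing a <b> element and every other <p> is body text, which may well not be how the real file is structured. The variable names are hypothetical too, and heading titles are assumed to be unique.

```r
library(xml2)

# Hypothetical markup; the real document will likely use different tags.
html <- paste0(
  "<html><body>",
  "<p><b>Item 1. Business</b></p>",
  "<p>We make widgets.</p>",
  "<p>We sell them too.</p>",
  "<p><b>Item 2. Properties</b></p>",
  "<p>We own one factory.</p>",
  "</body></html>"
)

doc <- read_html(html)
paras <- xml_find_all(doc, "//p")
txt <- xml_text(paras, trim = TRUE)

# Assumption: a paragraph is a heading if it has a <b> child.
is_heading <- vapply(
  paras,
  function(p) length(xml_find_all(p, "./b")) > 0,
  logical(1)
)

# Running counter: which heading each paragraph falls under.
section_id <- cumsum(is_heading)
headings <- txt[is_heading]

# Keep body paragraphs that follow some heading, grouped by that heading.
keep <- !is_heading & section_id > 0
sections <- split(
  txt[keep],
  factor(headings[section_id[keep]], levels = unique(headings))
)
```

The result is a list with one element per heading, each holding the paragraphs below it, so no regular expression over the flattened text is needed.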
Yes, it's an HTML file. I converted it to plain text using htm2txt in R. I'm now trying to separate all the text blocks and then sort them by title. I guess I still need to extract the text blocks first.