Hey there!
I'm trying to extract a specific part of the text from a 10-K report using rm_between, but I can't get the pattern right. The problem is that the title of the part is also mentioned inside other parts of the text, so rm_between extracts the wrong data. The edgar package does have a command for this, but I'd like to use rm_between.
Example:
Item 7. Management s Discussion and Analysis of Financial Condition and Results of Operations
Text I want to extract
Item 7A. Quantitative and Qualitative Disclosures About Market Risk
To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide to see how to create one:
Short Version
You can share your data in a forum friendly way by passing the data to share to the dput() function.
If your data is too large you can use standard methods to reduce it before sending to dput().
When you come to share the dput() text that represents your data, please be sure to format your post with triple backticks on the line before your code begins to format it appropriately.
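As a rough illustration of the steps above, here is a minimal sketch. The vector `report_lines` is a hypothetical stand-in for your own data, not something taken from your files.

```r
# Hypothetical sample: a few lines standing in for a much larger report
report_lines <- c(
  "Item 7. Management's Discussion and Analysis",
  "Text I want to extract",
  "Item 7A. Quantitative and Qualitative Disclosures About Market Risk"
)

# Reduce the data first (here: the first two lines), then print
# R code that recreates exactly that reduced object
dput(head(report_lines, 2))
```

The text that dput() prints is what you would paste into your post between the triple backticks.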
Sorry for the inconvenience; I thought it was obvious from the example. The link below lets you download a complete report, since showing only an excerpt would be confusing. I'm trying to extract the Management Discussion specifically. The problem is that the texts often contain references to this section, so rm_between with the boundaries "Item 7." and "Item 8." extracts multiple text passages. My code so far and the pattern used are here: https://seafile.zfn.uni-bremen.de/d/4c589adfd818423a930f/
Even taking your initial example as you presented it, I don't see conceptually how you could split out the part you want, except perhaps if the structure can be read from the line breaks, like this:
library(stringr)
somexampletext <- "Item 7. Management s Discussion and Analysis of Financial Condition and Results of Operations
Text I want to extract
Item 7A. Quantitative and Qualitative Disclosures About Market Risk"
# Split on line breaks and take the second line, i.e. the text between the two headings
str_split(somexampletext, "\n")[[1]][2]
I glanced at one of your .txt files, and it seems entirely unstructured though.
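If the section headings do at least start at the beginning of a line, one possible workaround is to anchor the pattern to line starts and keep only the longest candidate, on the assumption that table-of-contents entries and in-text references produce short or empty matches. This is only a sketch under those assumptions: it uses stringr rather than rm_between, and the `filing` string is a made-up stand-in, not taken from your files.

```r
library(stringr)

# Hypothetical stand-in for a filing: the heading also appears in a
# table of contents and in a cross-reference inside another section
filing <- paste(
  "Item 7. Management's Discussion and Analysis 25",
  "Item 7A. Quantitative and Qualitative Disclosures About Market Risk 40",
  "See Item 7. Management's Discussion and Analysis for details.",
  "Item 7. Management's Discussion and Analysis of Financial Condition and Results of Operations",
  "Text I want to extract",
  "More text I want to extract",
  "Item 7A. Quantitative and Qualitative Disclosures About Market Risk",
  "Market risk text",
  sep = "\n"
)

# (?m): ^ matches at every line start; (?s): . also matches line breaks;
# (?i): ignore case. The heading line itself is skipped, then the lazy
# group captures everything up to the next "Item 7A." at a line start.
pattern <- "(?msi)^\\s*Item\\s+7\\.[^\\n]*\\n(.*?)^\\s*Item\\s+7A\\."

# Column 2 of the match matrix holds the captured text of each candidate
candidates <- str_match_all(filing, pattern)[[1]][, 2]

# Assumption: the real section is the longest candidate
mdna <- str_trim(candidates[which.max(nchar(candidates))])
mdna
```

Requiring the heading at a line start skips the "See Item 7." reference, and picking the longest candidate skips the table-of-contents entry. Whether either assumption holds for your reports is something you would have to check against the actual files.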