Is there an easy way to extract a specific part from file names?
1800_10-Q_2012-05-08_0001104659-12-034444
I'm trying to extract everything after the last underscore (0001104659-12-034444).
My code so far: str_extract(df$document, pattern = "(?<=\\-)\\d+(?=\\.)")
Result: 034444
I can't manage to get it to extract the whole number at the end. If I use "_" as the starting point, it just returns NA.
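A likely reason for the NA, as a minimal sketch: \\d+ only matches digits, so it cannot cross the hyphens inside the number, and no underscore in the name is followed by digits and then a dot. The sample string and its ".txt" extension are assumptions based on the rest of the thread, and the names x, na_result and full_id are only for illustration.

```r
library(stringr)

# Hypothetical sample; the ".txt" extension is assumed, it is not shown in the question
x <- "1800_10-Q_2012-05-08_0001104659-12-034444.txt"

# Digits only: every run of digits after an underscore is followed by "-", never by "."
na_result <- str_extract(x, "(?<=_)\\d+(?=\\.)")
na_result
# NA

# Allowing hyphens in the character class lets the match run all the way to the dot
full_id <- str_extract(x, "(?<=_)[0-9\\-]+(?=\\.)")
full_id
# "0001104659-12-034444"
```

The earlier underscores do not match either, because the text after them is followed by "Q" or another "_" rather than a dot.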
Managed to get it to work:
str_extract(df$document, '([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][-][0-9][0-9][-])\\d+(?=\\.txt)')
Might not be the most elegant, but at least it's working.
Thanks a lot,
actually, your regex gives me a result that includes .html (which I forgot to mention in my first post). I'll try to adjust it for my purpose.
38079_10-Q_2006-05-10_0001104659-06-033149.html <- that's the whole file name, sorry, my bad.
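One possible adjustment, sketched under the assumption that the part you want is always the text between the last underscore and the file extension: match anything that is neither "_" nor ".", followed by a single extension at the end of the name. The vector files and the name ids are made up for illustration, and the ".txt" variant of the first name is an assumption.

```r
library(stringr)

# Hypothetical file names, one per extension mentioned in the thread
files <- c(
  "1800_10-Q_2012-05-08_0001104659-12-034444.txt",
  "38079_10-Q_2006-05-10_0001104659-06-033149.html"
)

# (?<=_)        start right after an underscore
# [^_.]+        take characters that are neither "_" nor "."
# (?=\\.[^.]+$) and require one extension (".txt", ".html", ...) at the very end
ids <- str_extract(files, "(?<=_)[^_.]+(?=\\.[^.]+$)")
ids
# "0001104659-12-034444" "0001104659-06-033149"
```

Because the extension is only checked in the lookahead, it is never part of the result, so the same pattern should work for both .txt and .html files.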