Extract strings

Peter_Griffin · October 29, 2019, 2:19am

strings <- c("/run/media/bb/cc/GA/DrRao/JOBS/Edgar filings_full text/Form S-1/10/10_S-1_2013-11-20_0001104659-13-086087.txt", "/run/media/bb/cc/GA/DrRao/JOBS/Edgar filings_full text/Form S-1/1001172/1001172_S-1_2013-01-20_0001104659-13-086087.txt")

I need to extract the number that follows "Form S-1/", i.e. the directory name between the slash after S-1 and the next slash. For these two paths that is 10 and 1001172. How could I achieve this? Thanks!

FJCC · October 29, 2019, 2:37am

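You can use a regular expression with a look-behind assertion, which has the form (?<=...). It means "match only text that is immediately preceded by whatever is written in place of the three dots"; the preceding text itself is not included in the match. Here, (?<=S-1/)\\d+ matches the run of digits that comes right after S-1/.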

library(stringr)
#> Warning: package 'stringr' was built under R version 3.5.3
strings <- c("/run/media/bb/cc/GA/DrRao/JOBS/Edgar filings_full text/Form S-1/10/10_S-1_2013-11-20_0001104659-13-086087.txt", "/run/media/bb/cc/GA/DrRao/JOBS/Edgar filings_full text/Form S-1/1001172/1001172_S-1_2013-01-20_0001104659-13-086087.txt")
numbers <- str_extract(strings,"(?<=S-1/)\\d+")
numbers
#> [1] "10"      "1001172"

Created on 2019-10-28 by the reprex package (v0.3.0.9000)
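For comparison, here is a rough sketch of the same extraction in base R, without stringr. It assumes every path contains exactly one "S-1/" followed by digits; the shortened paths and the names `paths`, `ids` and `ids2` are made up for illustration. In base R, look-behind only works with perl = TRUE.

```r
# Shortened, hypothetical paths with the same "Form S-1/<number>/" layout
paths <- c("/data/Form S-1/10/10_S-1_2013-11-20.txt",
           "/data/Form S-1/1001172/1001172_S-1_2013-01-20.txt")

# Option 1: look-behind; base R needs perl = TRUE for (?<=...)
ids <- regmatches(paths, regexpr("(?<=S-1/)\\d+", paths, perl = TRUE))

# Option 2: capture group; keep only the digits between "S-1/" and the next "/"
ids2 <- sub(".*S-1/(\\d+)/.*", "\\1", paths)

ids
ids2
```

One difference to keep in mind: regmatches() silently drops elements that do not match, while sub() returns a non-matching string unchanged, so str_extract() (which returns NA) is probably the safer choice when some paths might not follow the pattern.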

Peter_Griffin · October 29, 2019, 2:44am

Are there any tutorials for understanding regular expressions?

andresrcs · October 29, 2019, 2:46am

There are lots of resources online, but if you are looking for a book, I would recommend this one:

https://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124

Peter_Griffin · October 29, 2019, 2:47am

Thanks. I think I will stick with the stringr cheat sheet.

andresrcs · October 29, 2019, 2:48am

This article is more specific in that regard, but it is not meant to be a regex tutorial:

https://stringr.tidyverse.org/articles/regular-expressions.html

nwerth · October 29, 2019, 1:45pm

You can get in some practice with Regex Golf. It's a common game (the link is just one person's version) with the goal of writing a regular expression to match everything in one list and nothing in another. It's scored by the number of characters in the expression, so it's like golf in that lower scores are better.
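To make the rules concrete, here is a minimal sketch of how one round could be checked in R. The word lists and the candidate pattern are invented for illustration, and actual scoring rules vary between versions of the game.

```r
# Hypothetical word lists for one regex golf round
must_match     <- c("foo", "food", "foot")
must_not_match <- c("bar", "boo", "of")

# A candidate solution: fewer characters means a better score
pattern <- "fo"

# Valid only if it matches every word in the first list and none in the second
valid <- all(grepl(pattern, must_match)) && !any(grepl(pattern, must_not_match))
score <- nchar(pattern)

valid
score
```

With these lists the shorter pattern "f" would not be valid, because it also matches "of" in the second list.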
