Is it possible to avoid generating the empty rows (3, 5, 8) in the call to separate_rows below? It's easy to clean them up afterwards with filter(), but I'd prefer not to get these rows in the first place. I could imagine solving this with a tidyr::extract_rows(), but no such function exists yet. Is there any way to solve it with the current tidy API?
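A minimal sketch of the "extract instead of split" idea using functions that do exist. It assumes a hypothetical toy data frame `df` with a column `x`, because the original reprex is not reproduced here. `stringi::stri_extract_all_regex()` pulls out the non-empty runs between separators, and `tidyr::unnest()` turns the resulting list column into rows. Nothing is split, so no empty strings are produced. This only emulates the idea; there is no `tidyr::extract_rows()`.

```r
library(tibble)
library(tidyr)
library(stringi)

# Hypothetical toy data: every value ends with a trailing ";"
df <- tibble(id = 1:2, x = c("a;b;", "c;"))

# Extract the non-empty runs between separators instead of splitting,
# so empty strings are never created
df$x <- stri_extract_all_regex(df$x, "[^;]+")

# One row per extracted token
out <- unnest(df, x)
print(out)
```

The pattern `[^;]+` describes the tokens rather than the separator, which is why a trailing `;` has no effect.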
I think this is just a consequence of matching the regex and then splitting at each match. Because you split on ; and every string ends with one, the last match is followed by nothing, so the split produces an empty string.
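A minimal illustration, calling `stringi::stri_split_regex()` directly on a hypothetical string rather than going through separate_rows(). The last `;` is itself a match, and the empty text after it becomes the final piece.

```r
library(stringi)

# Hypothetical input with a trailing separator
x <- "a;b;"

# Every ";" is a split point; the text after the final ";" is "",
# so it comes back as a trailing empty string
pieces <- stri_split_regex(x, ";")[[1]]
print(pieces)
```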
The internal split function behind separate_rows is stringi::stri_split_regex, which has an omit_empty argument that defaults to FALSE.
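A sketch of what `omit_empty` changes, calling `stringi::stri_split_regex()` directly on hypothetical values. As far as I can tell, separate_rows() itself only takes `sep` and `convert`, so there is no way to forward `omit_empty` through it. Treat that as an assumption to check against your tidyr version.

```r
library(stringi)

# Hypothetical values, each ending with the separator
x <- c("a;b;", "c;")

# Default omit_empty = FALSE: the trailing empty strings are kept
kept <- stri_split_regex(x, ";")

# omit_empty = TRUE drops the empty pieces
dropped <- stri_split_regex(x, ";", omit_empty = TRUE)

print(kept)
print(dropped)
```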
Yeah, I guess I should have mentioned that I also tried (\\(contact\\);$|\\(contact\\);|;$|;), hoping to swallow the end of the line and avoid that empty string after the split, but that does not seem to work. It surprises me that there is still an empty string after the split even when the separator includes $.
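A likely explanation: `$` does not consume the end of the string, it only asserts that the match sits there. The match still ends at the final `;`, and the split still emits whatever follows it, which is the empty string. The sketch below uses `stringi::stri_split_regex()` on a hypothetical string. It also shows one alternative, stripping the trailing separator before splitting. That alternative may or may not suit the real data.

```r
library(stringi)

x <- "a;b;"

# ";$" still matches only the final ";" itself; the (empty) remainder
# after that match becomes the last piece, exactly as with plain ";"
with_anchor <- stri_split_regex(x, ";$|;")[[1]]

# Removing the trailing separator before splitting avoids the empty piece
trimmed <- stri_replace_all_regex(x, ";$", "")
no_empty <- stri_split_regex(trimmed, ";")[[1]]

print(with_anchor)
print(no_empty)
```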
Thanks for pointing me to the underlying function call. When I used F2 to navigate to the tidyr::separate_rows implementation, I only got UseMethod("separate_rows"), so I gave up on code navigation and asked here instead.
Unfortunately, none of the three proposed workarounds is satisfactory, so I guess I'll file some upstream issues.
You could add an optional space to the regex: \\s?
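A sketch of what the optional whitespace does, using `stringi::stri_split_regex()` on a hypothetical string.

```r
library(stringi)

x <- "a; b;"

# With plain ";" the space stays attached to the next token
plain <- stri_split_regex(x, ";")[[1]]

# ";\\s?" also consumes one optional whitespace character after ";"
spaced <- stri_split_regex(x, ";\\s?")[[1]]

print(plain)
print(spaced)
```

The trailing empty string remains in both cases. This pattern only changes what is stripped around each separator.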
Indeed, I do swallow whitespace in my real code, but I simplified the reprex to focus on the trailing-separator issue.