Got myself into a situation where I need to read in some text files that have NUL characters embedded in them. My favorite readr function, `read_file()`, treats NUL as end-of-file and stops reading at the first NUL.
These are my source data files, and I prefer to leave them exactly as they are (NULs and all), so I can't just strip the NULs out, save the files, and then process them (plus, there are a lot of them).
Since I can't use `readr::read_file()`, I turned to its lesser-known cousin, `read_file_raw()`. It ALMOST works...
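A minimal sketch of why `read_file_raw()` gets further: it hands back every byte of the file as a raw vector, NULs included. The temp file and its byte contents below are made up for illustration; they are not one of the real log files.

```r
library(readr)

# Write a small test file with a NUL byte in the middle: "abc\0def\n"
path <- tempfile(fileext = ".log")
writeBin(as.raw(c(0x61, 0x62, 0x63, 0x00, 0x64, 0x65, 0x66, 0x0a)), path)

# read_file_raw() returns the whole file as a raw vector,
# including the embedded NUL (0x00)
bytes <- read_file_raw(path)
print(bytes)
length(bytes)            # 8, the same as the file size
any(bytes == as.raw(0))  # TRUE: the NUL survived the read
```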
I have a tibble, `logs`, whose `file` column holds the full path to each file. I want to put the contents of each file into a `content` column as a string.
```r
library(tidyverse)

logs <- logs |>
  mutate(content = map(file, ~read_file_raw(.))) |>
  unnest(cols = c('content')) |>
  mutate(content = rawToChar(content))
```
This appears to read in the WHOLE file. That's great. The bad news is that I don't see how to strip out the NUL characters. I had hoped that `rawToChar()` would just do it, or at least provide an option to strip out NULs, but no:
```
Error in `mutate()`:
ℹ In argument: `content = rawToChar(content)`.
Caused by error in `rawToChar()`:
! embedded nul in string:
```
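A small base R sketch of what the error is about: `rawToChar()` rejects a NUL in the middle of the bytes, silently drops trailing NULs, and succeeds once the NULs are filtered out first. The byte values here are made up for illustration.

```r
# "abc\0def": a NUL byte in the middle of otherwise ordinary text
bytes <- as.raw(c(0x61, 0x62, 0x63, 0x00, 0x64, 0x65, 0x66))

# An embedded NUL makes rawToChar() throw an error
# (caught here so the script keeps going)
failed <- tryCatch({ rawToChar(bytes); FALSE }, error = function(e) TRUE)
failed  # TRUE

# Trailing NULs are dropped without complaint
rawToChar(as.raw(c(0x61, 0x62, 0x00, 0x00)))  # "ab"

# Keeping only the non-NUL bytes first avoids the error
rawToChar(bytes[bytes != as.raw(0)])  # "abcdef"
```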
Googling around, I see some base R approaches that take the raw bytes and strip out or replace the NULs (`\0`).
Unfortunately, I've deliberately forgotten most of base R and only want to use the tidyverse. The base R examples are things like `r[r != as.raw(0)]`, and I don't see how to incorporate that into my dplyr stanza. Is there an "easy button" to fix this?
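One possible sketch of fitting the base R idiom into a dplyr pipeline: wrap it in a small function and call that per file with `purrr::map_chr()`, which returns one string per file. The helper name `read_file_sans_nul()` is hypothetical (it is not part of readr), and the two temp files stand in for the real `logs` tibble. As far as I can tell, `unnest()` on a list of raw vectors puts each byte in its own row, so returning one string per file also lets the `unnest()` step go away.

```r
library(tidyverse)

# Hypothetical helper: read a file as raw bytes, drop every NUL (0x00),
# and return the remaining bytes as a single string
read_file_sans_nul <- function(path) {
  bytes <- read_file_raw(path)
  rawToChar(bytes[bytes != as.raw(0)])
}

# Hypothetical stand-in for the real `logs` tibble:
# two temp files, one of which contains embedded NULs
f1 <- tempfile(fileext = ".log")
f2 <- tempfile(fileext = ".log")
writeBin(as.raw(c(0x6f, 0x6b, 0x0a)), f1)              # "ok\n"
writeBin(as.raw(c(0x61, 0x00, 0x62, 0x00, 0x0a)), f2)  # "a\0b\0\n"
logs <- tibble(file = c(f1, f2))

# One string per file; the files on disk are left untouched
logs <- logs |>
  mutate(content = map_chr(file, read_file_sans_nul))

logs$content  # "ok\n" "ab\n"
```

If replacing the NULs is preferable to dropping them, the subsetting line inside the helper could instead overwrite them, e.g. `bytes[bytes == as.raw(0)] <- as.raw(0x20)` (a space) before calling `rawToChar(bytes)`.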