I have a simple `{targets}` pipeline where I read the content of a webpage:
```r
list(
  targets::tar_target(
    html_content,
    rvest::read_html("https://www.worldatlas.com/na/us/area-codes.html")
  )
)
```
When I run the pipeline (i.e. `targets::tar_make()`) and check the content of the `html_content` target, this is what I get:
```r
> tar_read(html_content)
$node
<pointer: (nil)>
$doc
<pointer: (nil)>
attr(,"class")
[1] "xml_document" "xml_node"
```
Odd! However, I am able to read the webpage when I run the `rvest::read_html(...)` code in the console. I dug around a little and learned that the function does not do well in "saved environments" (source).
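One quick way to check whether the null pointers come from reloading a saved object, rather than from the Nix setup, is to reproduce the save/load step outside `{targets}`. The sketch below assumes the pipeline uses the default `"rds"` storage format, which writes each target with `saveRDS()` and reads it back with `readRDS()`. It parses a small inline HTML string (a stand-in for the worldatlas.com page, so no network access is needed) and round-trips it through a temporary RDS file.

```r
library(xml2)

# Stand-in for the real page: read_html() also accepts a literal HTML string
doc <- read_html("<html><body><p>Area codes</p></body></html>")

# Assumption: the pipeline stores targets in the default "rds" format,
# so saveRDS()/readRDS() mimics what tar_make() and tar_read() do
path <- tempfile(fileext = ".rds")
saveRDS(doc, path)
restored <- readRDS(path)

# The class attribute survives the round trip...
class(restored)

# ...but the external pointers into libxml2's memory are not serialized,
# so they come back as null pointers, like the tar_read() output
restored$doc
```

If `restored$doc` prints a null pointer here too, the problem can be reproduced with plain `saveRDS()`/`readRDS()`, independently of how R itself was installed.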
Now, speaking of saved environments, my entire Linux (Pop!_OS) configuration is managed by the reproducibility powerhouse Nix. Nix is not very well known in useRs' circles, but it is becoming better known thanks to the efforts of Bruno Rodrigues and co. with their great `{rix}` package (an R package that creates Nix-based local reproducible environments for R projects). I no longer use the `{rix}` package; instead, I spent some time learning how Nix works and now use it directly. I suspect that this setup constitutes a "saved environment", which causes problems for the `rvest::read_html()` function.
Does anyone know how I can resolve this issue?
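If the cause turns out to be serialization (an `xml_document` holds external pointers into libxml2's memory, and `saveRDS()` does not save what those pointers refer to), one possible workaround to try is to store only plain R data in the targets: save the page as a character string and re-parse it in the downstream target that needs it. This is a sketch, not tested against this exact page; the helpers `fetch_html()` and `parse_tables()`, the target `area_code_tables`, and the choice of `rvest::html_table()` as the downstream step are all hypothetical.

```r
# _targets.R (sketch)

# Hypothetical helper: parse the page and return it as a plain character
# string, which survives saveRDS()/readRDS() unlike an xml_document
fetch_html <- function(url) {
  as.character(rvest::read_html(url))
}

# Hypothetical helper: re-parse the stored string inside the target that
# uses it, so the xml_document is never written to the store
parse_tables <- function(html_string) {
  rvest::html_table(rvest::read_html(html_string))
}

list(
  targets::tar_target(
    html_content,
    fetch_html("https://www.worldatlas.com/na/us/area-codes.html")
  ),
  targets::tar_target(area_code_tables, parse_tables(html_content))
)
```

Because `read_html()` also accepts a literal HTML string, both helpers can be tried in the console on a small snippet before running the pipeline.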
Also, @wlandau, apologies for tagging you.
```r
> sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Pop!_OS 22.04 LTS
> packageVersion("rvest")
[1] '1.0.4'
> packageVersion("targets")
[1] '1.7.1'
```