Knitr intermittently fails to execute readLines on URLs, throws Error in file(con, "r") : cannot open the connection to <URL>

de1pher · January 22, 2022, 10:51pm

Hi all,

I ran into a strange issue where readLines intermittently throws an error on a URL when called from knitr. The line below works in the console every time, but fails roughly 4 out of 5 times when I knit the document:

readLines("https://www.reddit.com/r/sports/top.json?t=month&limit=100", warn = FALSE)

And here is the full knitr file:

---
title: "testing"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```


```{r run, include=TRUE}
x <- readLines("https://www.reddit.com/r/sports/top.json?t=month&limit=100", warn = FALSE)
```

To add to the mystery, some URLs do better than others. For example, https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv (the iris dataset) seems to work every time, whereas the reddit URL does not. Perhaps it has something to do with the response time from the server.

Has anyone encountered this problem? Any ideas how to fix this?

Many thanks in advance

technocrat · January 23, 2022, 7:17am

Some sites (I don't know if reddit is one) throttle requests to, say, 10/second. But that's unlikely to be the cause with a single knitted document. Other websites can be picky about user agents.
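
To check the user-agent idea, one option is to compare the agent string R sends from the console with the one it sends while knitting, and then retry the request with an explicit agent. The sketch below relies on the base R option HTTPUserAgent, which url() and download.file() use; the helper name with_user_agent and the agent string are placeholders, not anything from the original post, and it is only an assumption that the two environments differ.

```r
# Hypothetical helper: run `fn` with a temporary HTTP user agent,
# then restore whatever value was set before.
with_user_agent <- function(agent, fn) {
  old <- options(HTTPUserAgent = agent)  # returns the previous setting
  on.exit(options(old), add = TRUE)      # restore it even if fn() fails
  fn()
}

# Show the current agent. Run this once in the console and once inside
# a chunk to see whether the two environments differ.
print(getOption("HTTPUserAgent"))

# Usage (needs network access, so it is left commented out):
# x <- with_user_agent(
#   "my-report/0.1",  # placeholder agent string
#   function() readLines(
#     "https://www.reddit.com/r/sports/top.json?t=month&limit=100",
#     warn = FALSE
#   )
# )
```

If the printed values differ between the console and the knitted output, that would be one candidate explanation for why the same call behaves differently in the two places.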

That said, rather than readLines, you would be far better off with

library(jsonlite)
x <- read_json("https://www.reddit.com/r/sports/top.json?t=month&limit=100")

de1pher · January 23, 2022, 8:34am

Thanks for the suggestion! I'm aware of throttling rules, but I don't think they matter here because I'm only making a single call with knitr. As for jsonlite, I haven't tested it. I'm calling readLines indirectly through a package, and I've narrowed the issue down to readLines specifically.
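
When the failing call sits inside a package and cannot be swapped out, a possible workaround is to retry it a few times with a pause in between. This is a minimal sketch that assumes the failure is transient; the function name retry and its arguments times and wait are made up for illustration. It would not help if the server rejects every request from the knitting process.

```r
# Hypothetical helper: call `fn` up to `times` times, waiting a little
# longer after each failure, and re-raise the last error if all fail.
retry <- function(fn, times = 3, wait = 1) {
  stopifnot(times >= 1)
  for (i in seq_len(times)) {
    # On failure, tryCatch returns the error object instead of stopping.
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (i < times) Sys.sleep(wait * i)  # linear backoff between attempts
  }
  stop(result)  # every attempt failed: signal the last error
}

# Usage (needs network access, so it is left commented out):
# x <- retry(function() readLines(
#   "https://www.reddit.com/r/sports/top.json?t=month&limit=100",
#   warn = FALSE
# ))
```

The same wrapper can go around the package function that calls readLines internally, since it only needs a zero-argument function to call.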

cderv · January 25, 2022, 10:25am

Is it narrowed down to readLines(), and therefore to the url() function in base R, or has it also been shown to be related to knitr? Does it happen only when knitting?

I don't quite see how knitr is related to this issue. It will only call the function in the chunk.
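
One way to gather more evidence is to record the warnings that come with the error, since R often reports the underlying reason for a failed connection (such as an HTTP status) as a warning rather than in the error message itself. The helper below is a sketch; capture_conditions is a hypothetical name, and what the warnings contain for this particular URL is not known from the thread.

```r
# Hypothetical helper: call `fn` and collect its warnings and error
# message (if any) instead of stopping.
capture_conditions <- function(fn) {
  warns <- character()
  err <- NULL
  value <- withCallingHandlers(
    tryCatch(fn(), error = function(e) {
      err <<- conditionMessage(e)  # remember the error text
      NULL                         # and return NULL as the value
    }),
    warning = function(w) {
      warns <<- c(warns, conditionMessage(w))
      invokeRestart("muffleWarning")  # keep going after recording it
    }
  )
  list(value = value, error = err, warnings = warns)
}

# Usage inside a chunk (needs network access, so it is left commented out):
# res <- capture_conditions(function() readLines(
#   "https://www.reddit.com/r/sports/top.json?t=month&limit=100",
#   warn = FALSE
# ))
# str(res[c("error", "warnings")])
```

Running this both in the console and in a knitted chunk would show whether the two environments fail for the same reason.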

de1pher · January 25, 2022, 7:49pm

It appears to be related to knitr, which puzzles me. As I explained, running the same readLines command in the console works fine, whereas when I knit a document I get an error most of the time (but not always).

cderv · January 27, 2022, 9:09am

Unfortunately, I can't reproduce this, and I don't see how knitr could be related.

I'll remember this if I encounter it and we'll see if others have the same issue.

de1pher · January 27, 2022, 10:20pm

This issue does not seem to be limited to my machine. In fact, it was raised by a user of my package (more info here). But if it works for you, then I'm not entirely sure what to make of it. To be honest, I also don't see how it could possibly be related to knitr, but that's the behaviour I'm observing.
