I ran into a strange issue where knitr seems to throw errors intermittently when running readLines on a URL. The line below works in the console every time, but it fails roughly 4 out of 5 times when I knit the document:
To add to the mystery, I noticed that some URLs do better than others. For example, if I use the following URL: https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv (iris dataset), it seems to work every time, whereas reddit does not. Perhaps it has something to do with the response time from the server.
Has anyone encountered this problem? Any ideas how to fix this?
Some sites (I don't know if reddit is one) throttle requests to, say, 10/second. But that's unlikely to be the case with a knitted document. Other websites can be picky about user agents.
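If the user agent is the culprit, one way to test that is to set the agent explicitly for the duration of the call. This is only a sketch under that assumption: with_user_agent is a made-up helper name, and the agent string is a placeholder, not something reddit is known to require. HTTPUserAgent is the base R option that url() and download.file() consult.

```r
# Hypothetical helper: temporarily set the HTTP user agent that base R
# connections send, evaluate an expression, then restore the old value.
with_user_agent <- function(agent, expr) {
  old <- options(HTTPUserAgent = agent)
  on.exit(options(old), add = TRUE)  # restore even if expr errors
  force(expr)                        # expr is evaluated lazily, i.e. after the option is set
}

# Check that the option is visible inside the call.
seen <- with_user_agent("my-report/0.1 (placeholder)", getOption("HTTPUserAgent"))
print(seen)
```

In a knitr chunk this could wrap the failing call, e.g. with_user_agent("my-report/0.1 (placeholder)", readLines(some_url)), where some_url stands for the URL that fails. Because the option is global, it also applies when readLines is called indirectly through a package.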
That said, rather than readLines, you would be far better off with jsonlite:
library(jsonlite)
x <- read_json("https://www.reddit.com/r/sports/top.json?t=month&limit=100")
Thanks for the suggestion! I'm aware of throttling rules, but I don't think they matter here because I'm only making a single call with knitr. As for jsonlite, I haven't tested it, but I'm using readLines indirectly through a package. I've simply narrowed the issue down to readLines specifically.
It appears to be related to knitr, which puzzles me. As I explained, running the same readLines command in the console works fine, whereas when I'm knitting a document, I get an error most of the time (but not always).
This issue does not seem to be limited to my machine. In fact, it was raised by a user of my package (more info here). But if it works for you, then I'm not entirely sure what to make of this. To be honest, I also don't see how it could possibly be related to knitr, but that's the behaviour I'm observing.
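To pin down what the intermittent error actually says when knitting, a small retry wrapper that logs each failure can help. This is a diagnostic sketch, not a fix: retry is a hypothetical helper, and the number of attempts and the wait time are arbitrary assumptions.

```r
# Hypothetical helper: call f() up to `times` times, printing the error
# message of every failed attempt, and re-raise the last error if all fail.
retry <- function(f, times = 3, wait = 1) {
  stopifnot(times >= 1)
  for (i in seq_len(times)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("attempt ", i, " failed: ", conditionMessage(result))
    if (i < times) Sys.sleep(wait)  # pause before the next attempt
  }
  stop(result)  # re-signal the last captured error
}

# Offline demonstration with a function that fails twice, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("simulated connection failure")
  "ok"
}
out <- retry(flaky, times = 3, wait = 0)
```

In the document itself, the call would look like retry(function() readLines(some_url)), with some_url standing for the failing URL. The logged messages (for example an HTTP status code) should show whether the server is rejecting the request or the connection is timing out.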