Hi all.
Hoping someone here has some experience with "Service Unavailable/503" error codes.
I'm fairly sure it's related to my attempts to scrape text from Glassdoor. I got a similar error when I tried to scrape some company reviews manually, and now I get it again using gdscrapeR, a Glassdoor-specific scraping package from the git repos.
Here is my code:
#install.packages("devtools")
#devtools::install_github("mguideng/gdscrapeR")
library(gdscrapeR)
library(rvest)
library(dplyr)
#gdscrapeR has a single function: get_reviews()
?get_reviews
gd_appcast <- get_reviews(companyNum = "E1085934")
#Glassdoor's review page IDs are the characters between "Reviews-" and ".htm" in the URL (usually an 'E' followed by up to seven digits).
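For what it's worth, here's how I pull that ID out of a review-page URL (just a quick sketch; the company name in the example URL is made up):

#example review-page URL (the company-name part of the path is a placeholder)
url <- "https://www.glassdoor.com/Reviews/SomeCompany-Reviews-E1085934.htm"
#keep only the "E" + digits chunk between "Reviews-" and ".htm"
companyNum <- sub(".*-Reviews-(E\\d+)\\.htm.*", "\\1", url)
companyNum
#> [1] "E1085934"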
...And here is the error message:
"Number of web pages to scrape:
Error in read_html.response(httr::GET(paste(baseurl, companyNum, sort, : Service Unavailable (HTTP 503).
15. stop(http_condition(x, "error", task = task, call = call))
14. httr::stop_for_status(x)
13. read_html.response(httr::GET(paste(baseurl, companyNum, sort, sep = "")))
12. xml2::read_html(httr::GET(paste(baseurl, companyNum, sort, sep = "")))
11. html_elements(...)
10. html_nodes(., ".tightVert.floatLt strong, .margRtSm.margBot.minor, .col-6.my-0 span")
9. xml_text(x, trim = trim)
8. html_text(.)
7. is.factor(x)
6. gsub("Found |,| reviews", "", .)
5. is.factor(x)
4. sub(",", "", .)
3. xml2::read_html(httr::GET(paste(baseurl, companyNum, sort, sep = ""))) %>%
html_nodes(".tightVert.floatLt strong, .margRtSm.margBot.minor, .col-6.my-0 span") %>%
html_text() %>% gsub("Found |,| reviews", "", .) %>% sub(",",
"", .) %>% as.integer()
2. get_maxResults(companyNum)
1. get_reviews(companyNum = "E1085934")
"
Any thoughts?
As an aside: I can copy and paste the URL into a new browser window and it opens just fine... but running this code (AND a manual scrape using rvest/purrr et al.) throws this or a similar "Service Unavailable" error...
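Since the browser works but R doesn't, I wondered whether Glassdoor is blocking based on request headers, so I was going to try sending a browser-like User-Agent, something along these lines (an untested sketch; the User-Agent string is just an example):

library(httr)
url <- "https://www.glassdoor.com/Reviews/SomeCompany-Reviews-E1085934.htm"
#pretend to be a browser; the UA string below is just an example
resp <- GET(url, user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"))
status_code(resp)

Not sure whether headers are even the issue here, though.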