Service Unavailable (HTTP 503) message with the "gdscrapeR" pkg...

BodhiSeeds · July 29, 2022, 5:51pm

Hi all.
I'm hoping someone here has experience with "Service Unavailable" (HTTP 503) errors.
I'm pretty sure this is caused by my attempts to scrape the text: I first tried to scrape some reviews for a company manually and got a similar error message, and now I get it with "gdscrapeR", a Glassdoor-specific scraping package installed from its GitHub repo.

Here is my code:

#install.packages("devtools")
#devtools::install_github("mguideng/gdscrapeR")
library(gdscrapeR)
library(rvest)
library(dplyr)

#gdscrapeR has a single function: "get_reviews"
?get_reviews
gd_appcast <- get_reviews(companyNum = "E1085934")
#Glassdoor's company number is the string between "Reviews-" and ".htm" in the review page URL (usually an 'E' followed by up to seven digits).
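To illustrate the comment above about where the company number sits, here is a minimal sketch that pulls it out of a review page URL with a regular expression. The helper name `extract_company_num` and the example URL are made up for illustration; only the "Reviews-" ... ".htm" pattern comes from the description above.

```r
# Hypothetical helper (not part of gdscrapeR): extract the company number,
# i.e. the "E" plus digits that sits between "Reviews-" and ".htm".
extract_company_num <- function(url) {
  pattern <- ".*Reviews-(E[0-9]+)\\.htm.*"
  # Return NA if the URL does not follow the expected pattern
  if (!grepl(pattern, url)) return(NA_character_)
  sub(pattern, "\\1", url)
}

# Made-up URL, used only to show the pattern
example_url <- "https://www.example.com/Reviews/Some-Company-Reviews-E1085934.htm"
company_num <- extract_company_num(example_url)
```

The result can then be passed as the `companyNum` argument of `get_reviews()`.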

...And here is the error msg:

"Number of web pages to scrape:
Show Traceback
Error in read_html.response(httr::GET(paste(baseurl, companyNum, sort, : Service Unavailable (HTTP 503).
15. stop(http_condition(x, "error", task = task, call = call))
14. httr::stop_for_status(x)
13. read_html.response(httr::GET(paste(baseurl, companyNum, sort, sep = "")))
12. xml2::read_html(httr::GET(paste(baseurl, companyNum, sort, sep = "")))
11. html_elements(...)
10. html_nodes(., ".tightVert.floatLt strong, .margRtSm.margBot.minor, .col-6.my-0 span")
9. xml_text(x, trim = trim)
8. html_text(.)
7. is.factor(x)
6. gsub("Found |,| reviews", "", .)
5. is.factor(x)
4. sub(",", "", .)
3. xml2::read_html(httr::GET(paste(baseurl, companyNum, sort, sep = ""))) %>%
html_nodes(".tightVert.floatLt strong, .margRtSm.margBot.minor, .col-6.my-0 span") %>%
html_text() %>% gsub("Found |,| reviews", "", .) %>% sub(",",
"", .) %>% as.integer()
2. get_maxResults(companyNum)

1. get_reviews(companyNum = "E1085934")
"

Any thoughts?
As an aside, I can copy and paste the URL into a new browser window and it opens just fine, but running this code (and a manual scrape using rvest/purrr etc.) throws this or similar "Service Unavailable" errors.
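Since the page loads in a browser but not through `httr::GET()`, one thing that may be worth checking is whether the server treats non-browser clients differently. Nothing in the traceback confirms that this is the cause, so the following is only a diagnostic sketch: it sends the request with a browser-like User-Agent header and returns the status code instead of stopping on an error. The helper name `fetch_with_ua`, the User-Agent string, and the timeout value are assumptions, and `url` must be the actual page address.

```r
# Hypothetical diagnostic helper (not part of gdscrapeR).
# Sends a GET request with a browser-like User-Agent and reports the
# HTTP status code rather than raising an error.
fetch_with_ua <- function(url,
                          ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)") {
  resp <- httr::GET(url, httr::user_agent(ua), httr::timeout(30))
  list(status = httr::status_code(resp), response = resp)
}

# Usage, with `url` set to the page that opens fine in the browser:
# res <- fetch_with_ua(url)
# res$status                      # 200 on success, 503 if still refused
# if (res$status == 200) page <- xml2::read_html(res$response)
```

If the status is still 503 with a browser-like header, the block is probably based on something else, and this sketch will not help.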
