Hi there,
I want to scrape a web link with rvest and show the results with shiny.
I have this in my ui part:
numericInput("end", "End Page", value = 100, min=100, max=1000, step = 100),
textInput("Initial_page", "Link")
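For context, here is roughly how I picture these inputs sitting in the full UI. The actionButton("go", ...) and tableOutput("results") IDs are just placeholder names I made up for this sketch:

library(shiny)

ui <- fluidPage(
  textInput("Initial_page", "Link"),                   # base link to scrape
  numericInput("end", "End Page", value = 100, min = 100, max = 1000, step = 100),
  actionButton("go", "Scrape"),                        # placeholder button to trigger the scrape
  tableOutput("results")                               # placeholder output to show the scraped table
)

And this is what I have so far for the scraping part: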
start <- 10 # where the page starts
end <-  # last page, should come from the numeric input "end"
links <- seq(start, end, by = 10) # it will return 10, 20, ..., end
# Make an empty data frame to store the data
data <- data.frame()
# Let's loop!
# we will process the links one by one, which is why I use the seq_along() function
for(i in seq_along(links)) {
Initial_page <- "https://linkdotblabla-" # should be the text input plus " symbols
url <- paste0(Initial_page, "&start=", links[i]) # construct the url by pasting
page <- xml2::read_html(url) # read the html
My problem is that I do not know:
- how to feed the link from the text input into the for loop so that my seq_along() loop works (I have sketched what I am aiming for below), and
- how to make the rvest part work inside Shiny.
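This is the kind of server wiring I am aiming for. It is only a sketch: the eventReactive() trigger and the "go" button / "results" output are placeholder names I made up, and I have not got this running:

server <- function(input, output, session) {
  scraped <- eventReactive(input$go, {          # run only when the placeholder button is clicked
    start <- 10
    end   <- input$end                          # last page, taken from the numeric input
    links <- seq(start, end, by = 10)
    Initial_page <- input$Initial_page          # base link, taken from the text input
    data <- data.frame()
    for (i in seq_along(links)) {
      url  <- paste0(Initial_page, "&start=", links[i])
      page <- xml2::read_html(url)
      Sys.sleep(2)
      # ... the rvest extraction from the full code below goes here ...
      # data <- rbind(data, df)
    }
    data
  })
  output$results <- renderTable(scraped())      # show the scraped data frame in the UI
}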
Here is the full code:
# Specifying the url
start <- 10 # where the page starts
end <- 1000 # last page, depends on how many data that you want
links <- seq(start, end, by = 10) # it will return 10, 20, ..., 1000
Alright, we loop over the links and store the results in a data frame.
# Make an empty dataframe to store the data
data <- data.frame()
# Let's loop!
# we will process the links, one by one, that's why I used seq_along function
for(i in seq_along(links)) {
Initial_page <- "https://ie.indeed.com/jobs?q=analyst&l=Ireland" # the very first page
url <- paste0(Initial_page, "&start=", links[i]) # construct the url by pasting
page <- xml2::read_html(url) # read the html
# Sys.sleep pauses R for two seconds between requests, to avoid hitting the server too fast and getting an error
Sys.sleep(2)
# right-click on the page and choose Inspect, or use a CSS selector add-in in Chrome
# get the job title
job_title <- page %>%
rvest::html_nodes("div") %>%
rvest::html_nodes(xpath = '//a[@data-tn-element = "jobTitle"]') %>%
rvest::html_attr("title")
# get the job location (CSS selector)
job_location <- page %>%
rvest::html_nodes('.accessible-contrast-color-location') %>%
rvest::html_text() %>%
stringi::stri_trim_both()
# get the company name
company_name <- page %>%
rvest::html_nodes("span") %>%
rvest::html_nodes(xpath = '//*[@class="company"]') %>%
rvest::html_text() %>%
stringi::stri_trim_both()
# get the job description (CSS selector)
job_description <- page %>%
rvest::html_nodes('.summary') %>%
rvest::html_text() %>%
stringi::stri_trim_both()
df <- data.frame(job_title, job_location, company_name, job_description)
data <- rbind(data, df)
}
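A side note on the loop itself: instead of growing data with rbind() on every pass, the same result can be built by scraping each page in a helper function and binding everything once at the end. The scrape_page() helper below is just a name I made up for this sketch:

# same work as the loop above, but each page returns its own data frame
scrape_page <- function(link, Initial_page) {
  url  <- paste0(Initial_page, "&start=", link)
  page <- xml2::read_html(url)
  Sys.sleep(2)
  # ... same rvest extraction as above, ending in data.frame(job_title, job_location, company_name, job_description) ...
  data.frame()  # placeholder return so the sketch runs on its own
}
data <- do.call(rbind, lapply(links, scrape_page, Initial_page = Initial_page))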
Next, I am only interested in finding a job in Dublin, so I keep the unique rows and add a city column set to Dublin.
# New Dublin Data set
df_IE <- data %>%
dplyr::distinct() %>%
dplyr::mutate(city = "Dublin") # add column city = Dublin
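If I actually wanted to keep only the Dublin rows rather than just label them, I assume something like dplyr::filter() on the location column would be needed, e.g.:

# keep only rows whose location mentions Dublin (assuming job_location contains the text "Dublin")
df_IE <- df_IE %>%
  dplyr::filter(grepl("Dublin", job_location))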
# Cleaning
df_IE$job_description <- gsub("[\r\n]", "", df_IE$job_description)
# in case you want to save the dataset into a csv
write.csv(df_IE,"df_IE.csv")
Please let me know if you have any ideas. Thanks!