I want to obtain the links for each of these 3420 items. I don't have much experience with loops or functions for this. I can write the script that downloads the 10 items on the first page, but doing it one by one for each page is very time-consuming.
The idea is to obtain each link. All links have this form: handle/10568/43833 ; only the final number changes for each item. In link2, with paste0, I build the final link for each item.
It looks like you are trying to extract a list of links like those in pag$link2, but all of them rather than just the first 10. Is the problem that the website only displays 10 results at a time and expects the user to click Next (or similar) to go on to the next 10?
It may be possible to do this, depending on how the source website paginates. Do the pages have separate URLs that are numbered sequentially? If so, it's just an outer loop over the pages and an inner loop over the links on each page.
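As a rough sketch of that outer/inner loop idea, assuming the listing paginates with an `offset` query parameter (common for DSpace repositories such as CGSpace, but check the URL of page 2 on the actual site to confirm the parameter name):

```r
# Build one URL per results page. The "offset" parameter name is an
# assumption; replace it with whatever the real page-2 URL uses.
build_page_urls <- function(base_url, n_items, per_page = 10) {
  offsets <- seq(0, n_items - 1, by = per_page)
  paste0(base_url, "?offset=", offsets)
}

# Outer loop over pages, inner extraction of every link on each page.
# href values on the page are relative, so the site root is prefixed.
scrape_all_links <- function(page_urls) {
  unlist(lapply(page_urls, function(u) {
    doc <- xml2::read_html(u)
    nodes <- rvest::html_nodes(doc, "#resultsTable a")
    paste0("https://cgspace.cgiar.org", rvest::html_attr(nodes, "href"))
  }))
}

# Usage (not run here):
# urls  <- build_page_urls(listing_url, 3420)
# links <- scrape_all_links(urls)
```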
You could also do this (more efficient, I think, because the website is accessed less often) in the following way.
The function handles one page, and the loop over the 10 nodes is avoided by using xml_find_all .
I use the package xml2 because I am not familiar with rvest, but I think they are more or less the same (??)
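A minimal sketch of that one-page helper, using the XPath from the question (the site root prefix is an assumption):

```r
# One-page helper: xml_find_all grabs all anchors in one call, so no
# explicit loop over the 10 rows is needed. Adjust the XPath if the
# page structure differs from the one in the question.
get_links_on_page <- function(page) {
  doc <- xml2::read_html(page)  # accepts a URL or an HTML string
  nodes <- xml2::xml_find_all(
    doc, '//*[@id="resultsTable"]/tbody/tr/td/div/div[1]/a')
  paste0("https://cgspace.cgiar.org", xml2::xml_attr(nodes, "href"))
}
```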
This script is more efficient, like you said. Now I am trying to obtain the name (Title) of each link, to check that each link corresponds.
So I added this to my code, but I get an error:
Title[i] <- website %>%
  read_html() %>%
  html_nodes(xpath = paste0('//*[@id="resultsTable"]/tbody/tr[', i, ']/td/div/div[1]/a/span')) %>%
  html_text(trim = TRUE)
# Error in Title[i] <- website %>% read_html() %>% html_nodes(xpath = paste0("//*[@id=\"resultsTable\"]/tbody/tr[", :
#   replacement has length zero
# For example, I want to get something like this:
# Title link2
#1 Industrializacion de la yuca https://cgspace.cgiar.org/handle/10568/71370
#2 Development and use of biotechnology ....... https://cgspace.cgiar.org/handle/10568/55409
Hi @andresrcs , to find help more easily, I use these two forums, because I see, for example, that many people don't know about https://forum.posit.co. When the answer appears on the other site, I copy and share it with both communities. The idea is to share the knowledge and mark the correct answer, so that when someone has a similar problem they can find a quick solution instead of racking their brains trying to solve it.
I am a user who has learned almost everything in a self-taught way, and I have found a great deal of help in the forums. I am very impressed by the knowledge that many people have about R.