I have a list of pages that link to datasets, and I would like to collect those links using their XPaths. The problem is that the function doesn't capture the whole link. For example, if the link is https://nabu.gov.ua/sites/default/files/page_uploads/21.06/mizhnarodni_dogovory_2022_dlya_rozmishchennya.xls, it only captures "URL: https://nabu.gov.ua/sites/default/files/page_uploads/21.06/mizhnarodni_dogovory_20 ...", not the whole link. What should I do to fix this?
The code:
```r
library(XML)

dataframes_links <- list()
for (el in datasets) {
  # Download the page and parse it as HTML
  source <- readLines(el, encoding = "UTF-8")
  parsed_doc <- htmlParse(source, encoding = "UTF-8")
  # xmlValue returns the text content of the matched <a> node
  dataframes_links <- append(dataframes_links, list(xpathSApply(parsed_doc, path = '/html/body/div[2]/div[2]/div/div[2]/div[5]/div/p[2]/a', xmlValue)))
}
```
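A likely cause, assuming the page keeps the full address in the link's `href` attribute and only displays a shortened version as the link text: `xmlValue` returns that visible text, so the truncation comes from the page, not from R. The sketch below reads the `href` attribute with `xmlGetAttr` instead. The helper names `extract_hrefs` and `collect_links` are hypothetical, the XPath is copied from the question, and the inline HTML is a made-up stand-in for a real dataset page.

```r
library(XML)

# XPath copied from the question; it targets the <a> element itself
link_xpath <- "/html/body/div[2]/div[2]/div/div[2]/div[5]/div/p[2]/a"

# Hypothetical helper: return the href attribute of each node matched by
# `xpath`, rather than its (possibly shortened) visible text
extract_hrefs <- function(parsed_doc, xpath) {
  xpathSApply(parsed_doc, xpath, xmlGetAttr, "href")
}

# Hypothetical helper: `urls` is assumed to be a character vector of
# dataset pages, like `datasets` in the question (requires network access)
collect_links <- function(urls, xpath = link_xpath) {
  lapply(urls, function(u) {
    page_source <- readLines(u, encoding = "UTF-8")
    parsed_doc <- htmlParse(page_source, encoding = "UTF-8")
    extract_hrefs(parsed_doc, xpath)
  })
}

# Offline demo on made-up HTML whose link text is truncated but href is not
html <- paste0(
  '<html><body><p><a href="https://example.org/files/a_very_long_name.xls">',
  'URL: https://example.org/files/a_ver ...</a></p></body></html>'
)
doc <- htmlParse(html, asText = TRUE)
demo_links <- extract_hrefs(doc, "//p/a")
print(demo_links)
```

With real pages, `dataframes_links <- collect_links(datasets)` would replace the loop. If the site ever used relative links, the `href` values would need to be resolved against the page URL.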
The warnings:
```
Warning message in readLines(el, encoding = "UTF-8"):
“incomplete final line found on 'https://data.gov.ua/dataset/5885447/resource/05e35ad5-a164-44c5-8295-e66350aa6e23'”
Warning message in readLines(el, encoding = "UTF-8"):
“incomplete final line found on 'https://data.gov.ua/dataset/5885447/resource/71378f9e-a75f-4cab-bc46-dcdf1f11c495'”
Warning message in readLines(el, encoding = "UTF-8"):
“incomplete final line found on 'https://data.gov.ua/dataset/5885447/resource/a5409863-b163-4a3f-b561-e8c8c54e9095'”
Warning message in readLines(el, encoding = "UTF-8"):
“incomplete final line found on 'https://data.gov.ua/dataset/5885447/resource/4c4b690d-ca17-4cb5-b16e-1998f4fed9a9'”
Warning message in readLines(el, encoding = "UTF-8"):
“incomplete final line found on 'https://data.gov.ua/dataset/5885447/resource/b41369a5-74dc-4395-aa70-82a416def821'”
```
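The "incomplete final line" warnings are most likely a separate and harmless issue: `readLines` emits them when the downloaded page does not end with a newline, and the last line is still read. If they are just noise, `readLines` has a `warn` argument that turns this check off. A small self-contained sketch, using a temporary file as an illustrative stand-in for one of the dataset pages:

```r
# Write a file whose last line has no trailing newline (cat adds none)
tmp <- tempfile(fileext = ".html")
cat("<html><body>last line without newline</body></html>", file = tmp)

# readLines(tmp) alone would warn "incomplete final line found on ...";
# warn = FALSE suppresses that warning but still returns the line
page_lines <- readLines(tmp, encoding = "UTF-8", warn = FALSE)
print(page_lines)
unlink(tmp)
```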
The element: