XML: xpathSApply function doesn't capture the whole string (URL)

I have a list of pages that link to datasets, and I would like to collect those links using their XPaths. The problem is that the function doesn't capture the whole link. For example, if the link is https://nabu.gov.ua/sites/default/files/page_uploads/21.06/mizhnarodni_dogovory_2022_dlya_rozmishchennya.xls, it only captures "URL: https://nabu.gov.ua/sites/default/files/page_uploads/21.06/mizhnarodni_dogovory_20 ...", not the whole link. What should I do to fix this?

The code:

library(XML)

dataframes_links <- list()
for (el in datasets) {
    # Download the page source and parse it as HTML
    source <- readLines(el, encoding = "UTF-8")
    parsed_doc <- htmlParse(source, encoding = "UTF-8")
    # xmlValue returns the text content of the matched <a> node
    dataframes_links <- append(dataframes_links, list(xpathSApply(parsed_doc, path = '/html/body/div[2]/div[2]/div/div[2]/div[5]/div/p[2]/a', xmlValue)))
}

The warnings:

Warning message in readLines(el, encoding = "UTF-8"):
“incomplete final line found on 'https://data.gov.ua/dataset/5885447/resource/05e35ad5-a164-44c5-8295-e66350aa6e23'”
Warning message in readLines(el, encoding = "UTF-8"):
“incomplete final line found on 'https://data.gov.ua/dataset/5885447/resource/71378f9e-a75f-4cab-bc46-dcdf1f11c495'”
Warning message in readLines(el, encoding = "UTF-8"):
“incomplete final line found on 'https://data.gov.ua/dataset/5885447/resource/a5409863-b163-4a3f-b561-e8c8c54e9095'”
Warning message in readLines(el, encoding = "UTF-8"):
“incomplete final line found on 'https://data.gov.ua/dataset/5885447/resource/4c4b690d-ca17-4cb5-b16e-1998f4fed9a9'”
Warning message in readLines(el, encoding = "UTF-8"):
“incomplete final line found on 'https://data.gov.ua/dataset/5885447/resource/b41369a5-74dc-4395-aa70-82a416def821'”

The element:

I would try retrieving the href attribute instead of the link text. The attribute should hold the complete URL, while the description ...
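To illustrate the suggestion, here is a minimal sketch using the XML package. It assumes the page shortens the visible link text while keeping the full URL in the href attribute; the inline HTML string and the shortened XPath `//p/a` are hypothetical stand-ins for the real resource page and its full path.

```r
library(XML)

# Hypothetical stand-in for one resource page: the visible link text is
# shortened, but the href attribute carries the full URL (an assumption).
html <- paste0(
  '<html><body><p>URL: <a href="https://nabu.gov.ua/sites/default/files/',
  'page_uploads/21.06/mizhnarodni_dogovory_2022_dlya_rozmishchennya.xls">',
  'https://nabu.gov.ua/sites/default/files/page_uploads/21.06/',
  'mizhnarodni_dogovory_20 ...</a></p></body></html>'
)

parsed_doc <- htmlParse(html, asText = TRUE, encoding = "UTF-8")

# xmlValue returns the text content of the node, which is shortened here
link_text <- xpathSApply(parsed_doc, "//p/a", xmlValue)

# xmlGetAttr reads an attribute of the node; "href" is passed on to it
link_href <- xpathSApply(parsed_doc, "//p/a", xmlGetAttr, "href")

print(link_text)
print(link_href)
```

In the loop from the question, the same change would probably amount to replacing `xmlValue` with `xmlGetAttr, "href"` in the `xpathSApply` call, leaving the XPath as it is.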
