Hello R lovers,
I need your help: I'm trying to create XML sitemaps from crawled URLs. However, I'm stuck when I need to convert a data frame into a list. Let me explain:
1st step: I crawl my website.
2nd step: I create data frames with the scraped URLs ("sitemap_#").
3rd step: I create the XML sitemaps (I found the following code on the web). In this extract it works well, but the URLs are in a list.
require(whisker)
require(httr)

# Mustache template: one <url> block is rendered per element of "links"
tpl <- '
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
{{#links}}
<url>
<loc>{{{loc}}}</loc>
<lastmod>{{{lastmod}}}</lastmod>
<changefreq>{{{changefreq}}}</changefreq>
<priority>{{{priority}}}</priority>
</url>
{{/links}}
</urlset>
'

links <- c("http://r-statistics.com",
           "http://www.r-statistics.com/on/r/",
           "http://www.r-statistics.com/on/ubuntu/")

# Fetch each URL and build the list of fields the template expects
map_links <- function(l) {
  tmp <- GET(l)
  d <- tmp$headers[['last-modified']]
  list(loc        = l,
       lastmod    = format(as.Date(d, format = "%a, %d %b %Y %H:%M:%S")),
       changefreq = "monthly",
       priority   = "0.8")
}

links <- lapply(links, map_links)
cat(whisker.render(tpl))
As you can see in the screenshot, it works well. When I try to replace
links <- c("http://r-statistics.com", "http://www.r-statistics.com/on/r/", "http://www.r-statistics.com/on/ubuntu/")
with
links = as.character(sitemap_3)
I get a format that is not compliant (see the 2nd and 3rd screenshots), so it no longer works.
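Here is a small reproducible example of what I think is happening. The column name `url` below is just a stand-in for whatever my crawl step actually produces:

```r
# Toy data frame standing in for sitemap_3 (the column name "url" is made up)
sitemap_3 <- data.frame(
  url = c("http://r-statistics.com",
          "http://www.r-statistics.com/on/r/"),
  stringsAsFactors = FALSE
)

# What I tried: as.character() on the whole data frame deparses each
# column into one long string instead of giving one element per URL
bad <- as.character(sitemap_3)
length(bad)  # 1, not 2 -- this is the non-compliant format I see

# Extracting the column first gives a plain character vector,
# which lapply(links, map_links) should be able to consume
links <- as.character(sitemap_3$url)
length(links)  # 2
```

Is extracting the column like this the right approach, or is there a cleaner way?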
Can somebody help me here?
Thanks