Convert a data frame into a clean list

Hello R lovers,

I need your help: I'm trying to create XML sitemaps from crawled URLs.
However, I'm stuck when I need to convert a data frame into a list. Let me explain:

1st step: I crawl my website
2nd step: I create data frames with the scraped URLs ("sitemap_#")

[Screenshot: sitemap_3]

3rd step: I create XML sitemaps (I found the following code on the web). This extract works well, but the URLs are supplied as a hard-coded list.

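library(whisker)
library(httr)
# The XML declaration has to be the very first thing in the output,
# so the template starts on the same line as the opening quote
tpl <- '<?xml version="1.0" encoding="UTF-8"?>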
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 {{#links}}
   <url>
      <loc>{{{loc}}}</loc>
      <lastmod>{{{lastmod}}}</lastmod>
      <changefreq>{{{changefreq}}}</changefreq>
      <priority>{{{priority}}}</priority>
   </url>
 {{/links}}
</urlset>
'

links <- c("http://r-statistics.com", "http://www.r-statistics.com/on/r/", "http://www.r-statistics.com/on/ubuntu/")

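map_links <- function(l) {
  # Request the page and read its Last-Modified response header
  tmp <- GET(l)
  d <- tmp$headers[['last-modified']]

  # Build one sitemap entry; lastmod is reduced to a plain date (YYYY-MM-DD)
  list(loc=l,
       lastmod=format(as.Date(d,format="%a, %d %b %Y %H:%M:%S")),
       changefreq="monthly",
       priority="0.8")
}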

links <- lapply(links, map_links)

[Screenshot: sitemap_R]

cat(whisker.render(tpl))

As you can see in the screenshot, it works well. But when I try to replace

links <- c("http://r-statistics.com", "http://www.r-statistics.com/on/r/", "http://www.r-statistics.com/on/ubuntu/")

by

links = as.character(sitemap_3)

I get output in a format that is not compliant (see the 2nd and 3rd screenshots), so it does not work :confused:

[Screenshot: sitemap_error]

Can somebody help me here?

Thanks :pray:

We can't see sitemap_3, so we can't tell how it might be malformed or causing you problems.

Hi @nirgrahamuk
I added a screenshot to my post :wink:

Thanks

links = as.character(sitemap_3$Url)
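
To spell out why this helps, here is a minimal sketch. The data frame below is a made-up stand-in for your `sitemap_3` (only the `Url` column name is taken from your screenshot), and I have left out the `GET()` call so it runs offline. Calling `as.character()` on a whole data frame works column by column and squeezes each column into a single string (something like `c("http://...", "http://...")`), which would explain the non-compliant output. Pulling out the column first gives one string per URL.

```r
# Hypothetical stand-in for the crawled data frame (assumed to have a Url column)
sitemap_3 <- data.frame(
  Url = c("http://r-statistics.com", "http://www.r-statistics.com/on/r/"),
  stringsAsFactors = FALSE
)

# as.character() on the whole data frame: one string per COLUMN
bad <- as.character(sitemap_3)
length(bad)

# as.character() on the column: one string per URL
good <- as.character(sitemap_3$Url)
length(good)

# Turn the vector into the list-of-lists shape the whisker template iterates over
# (lastmod omitted here because it needs a network request)
links <- lapply(good, function(l) {
  list(loc = l, changefreq = "monthly", priority = "0.8")
})
str(links[[1]])
```

If the column ever comes through as a factor, `as.character(sitemap_3$Url)` still returns the URL text, so the same line should cover both cases.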

Thanks for your reply! It is working now :wink:
However, I got a new error related to the date format:

```
Error in as.Date.default(d, format = "%a, %d %b %Y %H:%M:%S") : 
  incapable de convertir 'd' dans la classe “Date”
```

(In English: do not know how to convert 'd' to class “Date”.)

Any idea how to fix it?

Maybe add a cat() call to print the contents of d just before the as.Date() call, so you can inspect it, or use browser() to step into the function.
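
One possible cause, and this is only a guess without seeing `d`: some of your pages may not send a `Last-Modified` header at all, in which case `d` is `NULL` and `as.Date()` has nothing to parse. Separately, `%a` and `%b` are matched against day and month names in your session's locale, so under a French locale an English header such as `Wed, 21 Oct 2015 ...` may parse to `NA`. Below is a rough sketch of a guard for both cases. `parse_last_modified()` is a name I made up for illustration, not something from whisker or httr.

```r
# Hypothetical helper: turn a Last-Modified header value into "YYYY-MM-DD",
# or NA when the header is missing or cannot be parsed
parse_last_modified <- function(d) {
  # Missing header (NULL), unexpected length, or NA: nothing to parse
  if (length(d) != 1 || is.na(d)) {
    return(NA_character_)
  }
  # HTTP dates use English day/month names, so parse them in the "C" locale
  # and restore the previous locale when the function exits
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  format(as.Date(d, format = "%a, %d %b %Y %H:%M:%S"))
}

parse_last_modified("Wed, 21 Oct 2015 07:28:00 GMT")
parse_last_modified(NULL)
```

Inside `map_links()` you could then write `lastmod = parse_last_modified(d)` and decide what to do with entries where it comes back as `NA`, for example dropping them or falling back to today's date.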
