Why is purrr/possibly's otherwise not triggered?

I am scraping content from websites. For this I iterate over links. If an error occurs, purrr's possibly adverb should keep the process going, and place a "missing" (or "NA_character") as a result.

The code below works as intended when the site linked to is not existing, i.e. the output is "missing";
However, if the site linked to exists, but the element which I am trying to extract from the site does not exist, the function throws an error despite having defined a value for 'otherwise'.

To me this is surprising, since the documentation states that

' possibly : wrapped function uses a default value ( otherwise ) whenever an error occurs.'

Any idea why this is happening? I understand that i could modify the function accordingly (e.g. check for the length of the returned object). But I do not understand why the 'otherwise' value was not used.

library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.0.4
#> Warning: package 'tidyr' was built under R version 4.0.4
#> Warning: package 'dplyr' was built under R version 4.0.4
library(rvest)
#> Warning: package 'rvest' was built under R version 4.0.4
#> 
#> Attaching package: 'rvest'
#> The following object is masked from 'package:readr':
#> 
#>     guess_encoding

# possibly with wrong links when scraping site ----------------------------
#see https://github.com/tidyverse/purrr/issues/409

sample_data <- tibble::tibble(
  link = c(
    #link ok, selected item exists
    "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll",
    #link not ok
    "https://www.wrong-url.foobar",
    #link ok, selected item does not exist on site
    "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
    
           )
)


fn_get_link_to_records <- function(link_to_overview_sessions) {
  
print(link_to_overview_sessions)
    
link_to_overview_sessions %>% 
    rvest::read_html() %>% 
    rvest::html_elements("a") %>% 
    rvest::html_attr("href") %>% 
    enframe(name = NULL,
            value = "link_to_text") %>% 
    filter(str_detect(link_to_text, regex("\\/NRSITZ_\\d+\\/fnameorig_\\d+\\.html$"))) %>% 
    mutate(link_to_text=glue::glue("https://www.parlament.gv.at/{link_to_text}")) %>% 
    pull()
}


sample_data %>% 
  mutate(link_to_text=map_chr(link, 
                              possibly(fn_get_link_to_records,
                                       otherwise=NA_character_)))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll"
#> [1] "https://www.wrong-url.foobar"
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
#> Error: Problem with `mutate()` input `link_to_text`.
#> x Result 3 must be a single string, not a vector of class `glue/character` and of length 0
#> i Input `link_to_text` is `map_chr(link, possibly(fn_get_link_to_records, otherwise = NA_character_))`.

sample_data %>% 
  mutate(link_to_text=map_chr(link, 
                              possibly(fn_get_link_to_records,
                                       otherwise="missing")))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll"
#> [1] "https://www.wrong-url.foobar"
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
#> Error: Problem with `mutate()` input `link_to_text`.
#> x Result 3 must be a single string, not a vector of class `glue/character` and of length 0
#> i Input `link_to_text` is `map_chr(link, possibly(fn_get_link_to_records, otherwise = "missing"))`.

Created on 2021-03-28 by the reprex package (v1.0.0)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.