How to keep running map after an error?

Hi there!
I am trying to run an iterative function with purrr.
I have a tibble with 2 columns. The first column is a character vector; the second is a list-column in which each element is a tibble holding a character vector of DOIs. I am adding a third list-column that holds a new tibble.

To add the third column I use mutate() and map() on the list-column. It stops every time it hits an error, and the error is not very clear to me. I would be happy if it returned an empty row whenever the function doesn't get any results.

The Error is:

Error: Problem with mutate() column doi_data1.
ℹ doi_data1 = purrr::map(.x = data, .f = get_data_from_doi).
x nms %in% c("i", "x", "") are not all TRUE

The input is nested_doi:

library(dplyr)  # provides tibble(), mutate() and %>%

nested_doi <- tibble(
orcid = c("0000-0003-3685-174X"), 
data =  list( tibble(doi = c("10.1130/0016-7606(1997)109<1515:fositc>2.3.co", 
"10.1130/0016-7606(1997)109<1515:fositc>2.3.co;2", 
"10.1175/1087-3562(1999)003<0002:tdmcwc>2.0.co;2")) ))

The function I am using is roadoi::oadoi_fetch() from the roadoi package. I am wrapping it so that it works with the tibble:

get_data_from_doi <- function(tibble_dois){
  test <- tibble_dois %>%
    mutate(doi_data = purrr::pmap(.l = list(
      dois = doi,
      email = "",  ### PLEASE add your email here
      .flatten = TRUE),
      .f = roadoi::oadoi_fetch))
  
  return(test)
}

Then I want to run this function for every row of the first tibble. Each row holds a number of DOIs: just 3 in the example, but some of my rows have up to 400.

test <- nested_doi %>%
  mutate(doi_data1 = purrr::map(.x = data,
                                .f = get_data_from_doi))

After that I would need to open the new column doi_data1 and filter from there, but that's another story, since not all of those tibbles have the same columns.

Could someone point me to a better way of doing this, and one that won't stop when errors occur? Please.

You can wrap the function in purrr::safely() or purrr::possibly() (depending on what behavior you want). See the examples in the purrr documentation for safely() ("Capture side effects").

Thank you,
I've added safely and possibly around the map, but I get the same error.
I've tried a few times, so I know which of the first rows are failing, which is why I select row 10 here:

nested_dois[10, ] %>%
mutate(doi_data1 = purrr::possibly(
                                  purrr::map(.x = data,
                                .f = get_data_from_doi),
                            otherwise = NA)
              )

Error: Problem with mutate() column doi_data1.
ℹ doi_data1 = purrr::safely(...).
x nms %in% c("i", "x", "") are not all TRUE
Run rlang::last_error() to see where the error occurred.

purrr::possibly() should wrap the function you pass to map(), not the map() call itself.

For example:

nested_doi[10,] %>%
  mutate(doi_data1 = 
    purrr::map(.x = data,
               .f = purrr::possibly(get_data_from_doi, otherwise = NA)
               )
    )
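To make the difference concrete, here is a minimal sketch that runs without roadoi or network access. The function risky() and its inputs are hypothetical stand-ins, not part of the original question. Wrapping the whole map() call does not help because that call is evaluated, and fails, before possibly() has a function to protect; wrapping the function itself lets map() continue past the failing element.

```r
library(purrr)

# Hypothetical stand-in for a function that fails on some inputs.
risky <- function(x) {
  if (x < 0) stop("negative input")
  sqrt(x)
}

inputs <- list(4, -1, 9)

# possibly() returns a modified *function*; map_dbl() calls it once per
# element, so a failure becomes NA_real_ and the other elements still run.
safe_risky <- possibly(risky, otherwise = NA_real_)
results <- map_dbl(inputs, safe_risky)
results
```

Here only the second element comes back as NA; the first and third are computed normally.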

The data structure seems perhaps overcomplicated. You could unnest and process a flat data frame, then nest again afterwards if that is a big advantage.
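A rough sketch of what the flat approach could look like. Everything here is illustrative: the ORCID and DOI strings are made up, and fake_fetch() is a hypothetical stand-in for roadoi::oadoi_fetch() so the example runs offline. It uses otherwise = NULL rather than NA so that failures can be detected with is.null().

```r
library(dplyr)
library(tidyr)
library(purrr)

# Small nested input in the same shape as the question's example
# (identifiers are made up).
nested <- tibble(
  orcid = c("A", "B"),
  data = list(tibble(doi = c("d1", "bad")), tibble(doi = "d2"))
)

# Hypothetical stand-in for roadoi::oadoi_fetch(): errors on one DOI.
fake_fetch <- function(doi) {
  if (doi == "bad") stop("lookup failed")
  tibble(doi = doi, is_oa = TRUE)
}

flat <- nested %>%
  unnest(data) %>%  # one row per orcid/doi pair
  mutate(doi_data = map(doi, possibly(fake_fetch, otherwise = NULL)))

# orcid and doi sit next to each result, so failures are easy to locate.
failed <- filter(flat, map_lgl(doi_data, is.null))

# Optionally restore the one-row-per-orcid shape.
renested <- nest(flat, data = c(doi, doi_data))
```

Because each row keeps its orcid and doi, the flat table shows exactly which lookup failed without having to dig into nested tibbles.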

@brshallo thanks Brian!
You pointed me to the right function.
I've added it inside the function, so that it catches the individual DOI that made it fail.
In your example, the whole list of DOIs gets NA otherwise.

get_data_from_doi <- function(tibble_dois){
  test <- tibble_dois %>%
    mutate(doi_data = purrr::pmap(.l = list(
      dois = doi,
      email = "", ## your email here
      .flatten = TRUE),
      .f = purrr::possibly(roadoi::oadoi_fetch, ## the saving function! 
                           otherwise = NA))) # return NA if no data is found
  
  return(test)
}

@nirgrahamuk
I've tried running it flat too, but when it failed I didn't know where it had failed. With the nested version I could get to the orcid row that made it fail and then go and look at its DOIs. I've identified a few rows that failed.

Now that it works with possibly() added, I need to find the best way to filter all those list-columns. You are right, maybe I am overcomplicating the structure; it is a work in progress and I'll keep testing.

Thank you both.

There are a few ways you might find where it errored. I wrote a tweet on this subject for when using the safely() approach: https://twitter.com/brshallo/status/1370607500664389633?s=20
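For reference, one common pattern with safely() looks roughly like the sketch below. It is not necessarily the approach from the linked tweet, and fetch() and the DOI strings are hypothetical stand-ins used so the example runs offline.

```r
library(purrr)

# Hypothetical stand-in for a lookup that fails on some inputs.
fetch <- function(doi) {
  if (grepl("<", doi, fixed = TRUE)) stop("malformed DOI: ", doi)
  toupper(doi)
}

dois <- c("10.1/abc", "10.1/<bad>", "10.1/xyz")  # made-up identifiers

# safely() never throws; each call returns list(result = ..., error = ...).
out <- map(dois, safely(fetch))

# An element failed exactly when its error slot is not NULL.
failed <- !map_lgl(out, ~ is.null(.x$error))

dois[failed]                                        # inputs that errored
map_chr(out[failed], ~ conditionMessage(.x$error))  # why they errored
```

Keeping the error objects around means you get the original message for each failing input, not just a marker that something went wrong.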

You could also add a print statement to your function to preview the rows where the NAs are. E.g. something like:

get_data_from_doi <- function(tibble_dois){
  test <- tibble_dois %>%
    mutate(doi_data = purrr::pmap(.l = list(
      dois = doi,
      email = "", ## your email here
      .flatten = TRUE),
      .f = purrr::possibly(roadoi::oadoi_fetch, ## the saving function! 
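                           otherwise = NA))) # return NA if no data is found
  
  message("Rows where error:")
  print(
    # identical(.x, NA) is TRUE only for the fallback value; is.na() on a
    # successfully fetched tibble would return a matrix, not a single logical
    filter(test, purrr::map_lgl(doi_data, ~ identical(.x, NA)))
  )
  message("-----")
  
  return(test)
}

Explaining the filter step:

I'm using map_lgl() because it outputs a logical atomic vector, which filter() can handle, rather than a list. The check is identical(.x, NA) rather than is.na() because is.na() on a tibble returns a matrix instead of a single TRUE/FALSE, which map_lgl() rejects. Also note that mutate(test, fail = map_lgl(doi_data, ~ identical(.x, NA))) %>% filter(fail) accomplishes the same thing; I'm just avoiding creating a new column by going straight to filter().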

Side note: If the problem requires me to be very careful, I'll often use safely() over possibly(), as there could be other things about your function that cause it to output an NA (or whatever value you set for otherwise) other than an error.
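A small illustration of that ambiguity, using a hypothetical lookup() function that returns NA for a legitimate "no value" case and throws for invalid input:

```r
library(purrr)

# Hypothetical function: NA is a valid answer for 0, an error for non-numbers.
lookup <- function(x) {
  if (!is.numeric(x)) stop("not numeric")
  if (x == 0) return(NA_real_)
  1 / x
}

inputs <- list(2, 0, "a")

# possibly(): the legitimate NA and the error are indistinguishable.
poss <- map_dbl(inputs, possibly(lookup, otherwise = NA_real_))

# safely(): only the element that actually errored carries an error object.
res <- map(inputs, safely(lookup))
has_error <- map_lgl(res, ~ !is.null(.x$error))
```

With possibly() the second and third elements both come back as NA, while has_error flags only the third.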
