Using `purrr` map functions on unnamed vectors/lists?

Looking at the harrypotter package, which contains a character vector for each of the 7 HP books (one element per chapter), I want to create a data frame with columns for Book, Chapter and Text.

With a for loop, you can do it like this:

library(dplyr)
library(harrypotter)

titles <- c("Philosopher's Stone", "Chamber of Secrets", "Prisoner of Azkaban",
            "Goblet of Fire", "Order of the Phoenix", "Half-Blood Prince",
            "Deathly Hallows")

books <- list(philosophers_stone, chamber_of_secrets, prisoner_of_azkaban,
              goblet_of_fire, order_of_the_phoenix, half_blood_prince,
              deathly_hallows)

series <- tibble()

for(i in seq_along(titles)) {
  
  clean <- tibble(chapter = seq_along(books[[i]]),
                  text = books[[i]]) %>%
    mutate(book = titles[i]) %>%
    select(book, everything())
  
  series <- rbind(series, clean)
}

Is there a way to get the above with tibble or data.frame + map_chr()?

The problem I've run into is that the character vectors and their elements are unnamed, so I don't have anything to pass as an argument to the purrr functions.
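One way around having nothing to pass is to iterate over the positions instead of the elements: `seq_along()` supplies an index that picks out both the title and the chapters. A minimal sketch, using two short made-up books (`toy_titles` and `toy_books` are hypothetical stand-ins for the harrypotter vectors, so the snippet runs without that package):

```r
library(purrr)
library(dplyr)

# Hypothetical stand-ins for the harrypotter data: one character vector per
# book, one unnamed element per chapter
toy_titles <- c("Book A", "Book B")
toy_books <- list(c("Chapter one text", "Chapter two text"),
                  c("Only chapter text"))

# The index i selects both the title and the chapter texts,
# so neither vector needs names
series <- map(seq_along(toy_books), function(i) {
  tibble(book = toy_titles[i],
         chapter = seq_along(toy_books[[i]]),
         text = toy_books[[i]])
}) %>%
  bind_rows()

series
```

This is a direct translation of the for loop: the loop body becomes the function passed to `map()`, and `bind_rows()` replaces the repeated `rbind()`.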

I think you are most of the way there.
This is what I did:

# Name the list so bind_rows() can turn the names into a column
books <- setNames(books, titles)
res <- purrr::map(books, function(book) {
  tibble::tibble(text = book, chapter = seq_along(book))
}) %>%
  dplyr::bind_rows(.id = "book")
str(res)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	200 obs. of  3 variables:
 $ book   : chr  "Philosopher's Stone" "Philosopher's Stone" "Philosopher's Stone" "Philosopher's Stone" ...
 $ text   : chr  "THE BOY WHO LIVED  Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfe"| __truncated__ "THE VANISHING GLASS  Nearly ten years had passed since the Dursleys had woken up to find their nephew on the "| __truncated__ "THE LETTERS FROM NO ONE  The escape of the Brazilian boa constrictor earned Harry his longest-ever punishment"| __truncated__ "THE KEEPER OF THE KEYS  BOOM. They knocked again. Dudley jerked awake. \"Where's the cannon?\" he said stupid"| __truncated__ ...
 $ chapter: int  1 2 3 4 5 6 7 8 9 10 ...

Is that what you wanted to achieve?

Another alternative is map2_df(), which iterates over the titles and the books in parallel:

library(tidyverse)
map2_df(titles, books, ~ tibble(book = .x, chapter = seq_along(.y), text = .y))
#> # A tibble: 200 x 3
#>                   book chapter
#>                  <chr>   <int>
#>  1 Philosopher's Stone       1
#>  2 Philosopher's Stone       2
#>  3 Philosopher's Stone       3
#>  4 Philosopher's Stone       4
#>  5 Philosopher's Stone       5
#>  6 Philosopher's Stone       6
#>  7 Philosopher's Stone       7
#>  8 Philosopher's Stone       8
#>  9 Philosopher's Stone       9
#> 10 Philosopher's Stone      10
#> # ... with 190 more rows, and 1 more variables: text <chr>

Probably the simplest option is to add the books as a list column. Once you've done that, you can iterate over it with map() and seq_along() to make another list column of chapter numbers. Since each text vector and its chapter vector have the same length, you can call tidyr::unnest() afterwards to expand everything to one row per chapter.

library(tidyverse)
library(harrypotter)

books <- tibble(title = c("Philosopher's Stone", "Chamber of Secrets", "Prisoner of Azkaban",
                          "Goblet of Fire", "Order of the Phoenix", "Half-Blood Prince",
                          "Deathly Hallows"), 
                text = list(philosophers_stone, chamber_of_secrets, prisoner_of_azkaban,
                            goblet_of_fire, order_of_the_phoenix, half_blood_prince,
                            deathly_hallows), 
                chapter = map(text, seq_along))

books
#> # A tibble: 7 x 3
#>                  title       text    chapter
#>                  <chr>     <list>     <list>
#> 1  Philosopher's Stone <chr [17]> <int [17]>
#> 2   Chamber of Secrets <chr [19]> <int [19]>
#> 3  Prisoner of Azkaban <chr [22]> <int [22]>
#> 4       Goblet of Fire <chr [37]> <int [37]>
#> 5 Order of the Phoenix <chr [38]> <int [38]>
#> 6    Half-Blood Prince <chr [30]> <int [30]>
#> 7      Deathly Hallows <chr [37]> <int [37]>
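The tibble above is still nested (one row per book); the `unnest()` step is what expands it to one row per chapter. A sketch on made-up data (the two toy books are hypothetical), assuming tidyr 1.0 or later, where `unnest()` takes a `cols` argument:

```r
library(tibble)
library(purrr)
library(tidyr)

# Hypothetical toy data standing in for the harrypotter books
books <- tibble(title = c("Book A", "Book B"),
                text = list(c("Chapter one text", "Chapter two text"),
                            c("Only chapter text")),
                # tibble() can refer to the text column defined just above
                chapter = map(text, seq_along))

# Unnest both list columns in parallel; this works because text and
# chapter have matching lengths within each row
series <- unnest(books, cols = c(text, chapter))
series
```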

Also, I have to wonder about the legality of the package. If you need texts, gutenbergr is a good source of public-domain books.

Thanks, I've been trying to practice using purrr functions and this is a good example!

Another good solution, thanks!

And yes, you're quite right... I saw a certain someone use it in a text mining tutorial, so I thought I'd play around with it too. For anything you want to share online, it's probably best to use public-domain texts from gutenbergr, as you suggested!

This works too. I didn't know you could use the .id argument in bind_rows(), so thanks!

purrr also has a function, map_dfr(), for the common map() %>% bind_rows() pattern, and it takes the same .id argument.
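For example, on a named list the names end up in the `.id` column. A sketch with made-up data (`toy_books` is hypothetical); `map_dfr()` relies on dplyr being installed:

```r
library(purrr)
library(tibble)

# Hypothetical toy data; the list names become the "book" column
toy_books <- list("Book A" = c("Chapter one text", "Chapter two text"),
                  "Book B" = c("Only chapter text"))

# Equivalent to map(...) %>% dplyr::bind_rows(.id = "book")
series <- map_dfr(toy_books,
                  ~ tibble(chapter = seq_along(.x), text = .x),
                  .id = "book")
series
```

Note that purrr 1.0.0 marks map_dfr() as superseded in favour of map() followed by list_rbind(), whose names_to argument plays the same role as .id.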

Glad it helped! I think the example is a good illustration of how one of the variants of the purrr::map family of functions can handle this particular question.

I feel the approach in @alistaire's answer is more useful in general, though: getting used to working with nested tibbles (list columns) that you tidyr::unnest() at the end is something you can apply to a wide variety of situations.
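As a small illustration of why that pattern generalizes: while the data is still one row per book, you can add per-book and per-chapter columns with map functions and unnest only at the end. A sketch on made-up data; the `n_chapters` and `n_chars` columns are hypothetical examples, not part of the answers above:

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical toy data: one row per book, chapters in a list column
books <- tibble(title = c("Book A", "Book B"),
                text = list(c("Chapter one text", "Chapter two text"),
                            c("Only chapter text")))

series <- books %>%
  mutate(n_chapters = map_int(text, length),  # one value per book
         chapter = map(text, seq_along),      # one vector per book
         n_chars = map(text, nchar)) %>%      # characters in each chapter
  # Expand the per-chapter list columns; n_chapters is repeated per row
  unnest(cols = c(text, chapter, n_chars))
series
```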