Extract single list element as part of a "pipeline"

dplyr
purrr

#1

Hi, I’m interested in extracting a list element as part of a pipeline.

Using a simple (maybe too simple) list, such as list("a", "b", "c"), how could I extract only the first list element?

I tried this, but it does not work (perhaps obviously so):

library(tidyverse, warn.conflicts = FALSE)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Conflicts with tidy packages ----------------------------------------------
#> filter(): dplyr, stats
#> lag():    dplyr, stats
list("a", "b", "c") %>% 
    map(~ function(x) x[[1]])
#> [[1]]
#> function (x) 
#> x[[1]]
#> <environment: 0x7ff075977d08>
#> 
#> [[2]]
#> function (x) 
#> x[[1]]
#> <environment: 0x7ff075977188>
#> 
#> [[3]]
#> function (x) 
#> x[[1]]
#> <environment: 0x7ff0759767c8>

#2

Do you mean this?

library(magrittr)
list("a", "b", "c") %>% extract2(1)
# [1] "a"

#3

Yes, thank you! I had tried the solution presented in the selected answer to this question (involving %$%, which I think no longer works). It looks like extract2() was in another answer.


#4

As an aside - any idea why library() still warned me about conflicts when I asked for the warnings not to be printed?


#5

The README mentions this as a feature of the package startup messages, apparently separate from what library regards as a conflict warning.

Unfortunately, I don’t see any option to suppress it there. You can always suppressMessages(library(tidyverse)) if you’re willing to forgo all messages.


#6

extract2 is a good way to do it in a pipe generally. The problem is that the regular extract function in magrittr conflicts with the one in tidyr, so I generally don’t bother with the magrittr longhand. Instead, I often just use the [[ function directly:

list("a", "b", "c") %>%
  `[[`(1)

It’s messier, but for my own purposes I can read it.

On a separate note, while map isn’t really the right tool in this case, if you had a list of sublists and wanted the first item of each sublist, the way I think you were going for would be map(~ .x[[1]]). The ~ eliminates the need for the function (x) longhand.


#7

Hi @jmichaelrosenberg, with the tidyverse, you can also use purrr::simplify() or purrr::as_vector(). Here is an example:

> library(purrr)
> library(dplyr)
> 
> 
> list("a", "b", "c") %>%
+   simplify() %>%
+   first()
[1] "a"

#8

In short: How to extract one element from a list

@jmichaelrosenberg, for your purpose, you could use the purrr package from the tidyverse and its new function pluck() to get one element from a list. Here is a reprex:

library(purrr)
# Get the first element, as in your example
list("a", "b", "c") %>%
  pluck(1)
#> [1] "a"

A deeper extraction: get the second element of the list that is itself the second element of the main list.

# With index
list(a = "a", b = list(b1 = "b1", b2 = "b2")) %>%
  pluck(2, 2)
#> [1] "b2"
# or with name
list(a = "a", b = list(b1 = "b1", b2 = "b2")) %>%
  pluck("b", "b2")
#> [1] "b2"

It is equivalent to list(a = "a", b = list(b1 = "b1", b2 = "b2"))[[2]][[2]].
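If you want to convince yourself of that equivalence, you can compare the results directly. This sketch also includes base R's recursive indexing, where a single `[[` takes a whole path vector; the list `x` is just the example list from above.

```r
library(purrr)

x <- list(a = "a", b = list(b1 = "b1", b2 = "b2"))

# Three ways to reach the same nested element
via_pluck     <- pluck(x, "b", "b2")   # purrr, by name
via_brackets  <- x[["b"]][["b2"]]      # chained base subsetting
via_recursive <- x[[c("b", "b2")]]     # base recursive indexing with a path vector

identical(via_pluck, via_brackets)     # TRUE
identical(via_pluck, via_recursive)    # TRUE
```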

Some explanation of the initial issue with your example code: the map function

In your code there is a syntax problem. map applies a function to each element of a list. The function can be defined as an anonymous function (function(x) x[[1]]) or with purrr's formula syntax (~ .x[[1]]); in your example you mixed both. Because you used the formula syntax, map understood that it had to apply this function:

function(.x) {
    # .x (the current list element) is ignored; the body just builds a new function
    function(x) x[[1]]
}

That is why your result is a list of functions. Correct usage would be one of these:

library(purrr)
# syntax 1: anonymous function
list(A = list("a1","a2"), B = list("b1", "b2")) %>% 
  map(function(x) x[[1]])
#> $A
#> [1] "a1"
#> 
#> $B
#> [1] "b1"
# syntax 2: formula (anonymous function shorthand)
list(A = list("a1","a2"), B = list("b1", "b2")) %>% 
  map(~ .x[[1]])
#> $A
#> [1] "a1"
#> 
#> $B
#> [1] "b1"
# syntax 3: pass an existing function plus its extra argument
list(A = list("a1","a2"), B = list("b1", "b2")) %>% 
  map(`[[`, 1)
#> $A
#> [1] "a1"
#> 
#> $B
#> [1] "b1"
# syntax 4: purrr's map extraction shortcut
list(A = list("a1","a2"), B = list("b1", "b2")) %>% 
  map(1)
#> $A
#> [1] "a1"
#> 
#> $B
#> [1] "b1"

As you can see, each form extracts the first element of each element of the input list. As a bonus, to get a character vector instead of a list from this extraction, use map_chr:

library(purrr)
list(A = list("a1","a2"), B = list("b1", "b2")) %>% 
  map_chr(1)
#>    A    B 
#> "a1" "b1"
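To see the formula behaviour described above directly, you can call purrr's as_mapper(), which is what map() uses internally to turn a formula into a function. A minimal sketch comparing the original formula with the intended one:

```r
library(purrr)

# The original formula: a function whose *result* is another function
wrong_mapper <- as_mapper(~ function(x) x[[1]])
wrong_result <- wrong_mapper(list("a1", "a2"))
is.function(wrong_result)        # TRUE: a function came back, not "a1"

# The intended formula: .x stands for the current element
right_mapper <- as_mapper(~ .x[[1]])
right_mapper(list("a1", "a2"))   # "a1"
```

This is why map() with the original formula returned a list of functions: each element was passed to wrong_mapper, ignored, and a fresh function(x) x[[1]] was returned.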

#9

Hi @Frank and @jmichaelrosenberg, re the library warnings: I tend to use suppressPackageStartupMessages(library(tidyverse)). It’s more typing, but to me this feels “safer” than using suppressMessages, since it only silences package startup messages rather than every message.


#10

I like relying on magrittr’s . syntax:

library(magrittr)
list("a", "b", "c") %>% .[[1]]
#> [1] "a"

especially with named elements

list(A = "a", B = "b", C = "c") %>% .$A
#> [1] "a"

It is the same syntax you would use to pipe the lhs into the second, third, etc. argument of a function. The dot acts as a placeholder that stands for the result of the lhs.

iris %>% lm(Sepal.Width ~ Sepal.Length, data = .)

#11

I use this approach a lot, but it gets frustrating when trying to access something deeply nested:

library(magrittr)

nested <- list(1:2, 3)

nested %>% str() 
#> List of 2
#>  $ : int [1:2] 1 2
#>  $ : num 3

You can’t just chain indices:

nested %>% .[[1]][[2]]
#> Error in .[[.[[1]], 2]]: incorrect number of subscripts

…but have to either pipe repeatedly or use braces:

nested %>% 
    .[[1]] %>% 
    .[[2]]
#> [1] 2

nested %>% 
    { .[[1]][[2]] }
#> [1] 2
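The error message above hints at why: magrittr rewrote `nested %>% .[[1]][[2]]` as `.[[.[[1]], 2]]`. magrittr documents that when the dot appears only inside nested calls, the lhs is also inserted as the first argument, and that wrapping the rhs in braces switches this off. A small sketch with a plain vector makes the rule visible:

```r
library(magrittr)

# The dot appears only inside nested calls (min(.) and max(.)),
# so magrittr ALSO inserts the lhs as the first argument of c():
# this evaluates c(1:10, min(1:10), max(1:10))
unbraced <- 1:10 %>% c(min(.), max(.))
length(unbraced)   # 12

# Braces turn that insertion off; the dot is used only where written
braced <- 1:10 %>% { c(min(.), max(.)) }
braced             # 1 10
```

The same rule explains the nested-list case: `.[[1]][[2]]` is a `[[` call whose first argument is the nested `.[[1]]`, so the lhs gets prepended and `[[` ends up with too many subscripts.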

purrr::map's indexing ability helps a lot, but you still have to use pluck or .[[...]] to extract at the end, which leads to somewhat backwards indexing like

library(purrr)

list(1, list(list(list(2)))) %>% 
    map(c(1, 1, 1)) %>% 
    .[[2]]
#> [1] 2

It seems like a relatively common case when working with web APIs, which often return deeply nested JSON. It’s an odd case where the base syntax currently makes more sense, even if it requires a lot of brackets:

list(1, list(list(list(2))))[[2]][[1]][[1]][[1]]
#> [1] 2

#12

In those latter cases, just pluck by itself seems to have good readability while still working without extra syntax:

library(purrr)

list(1, list(list(list(2)))) %>%
  pluck(2, 1, 1, 1)
#> [1] 2

#13

@alistaire and @nick you guys make very good points; I’ll have to take a closer look at pluck(). FWIW, indexing deeply nested lists makes me grimace. The code is very readable, but is it comprehensible?


#14

OMG amazing; I clearly didn’t read pluck's docs well enough.


#15

Agreed – it’s just one step away from using “magic numbers” in your script (possibly a half- or quarter-step). Ideally, if it really doesn’t make sense to parse/process the entire data structure or deal with named elements, you would at least define the locations of interest at the top of the script, as a function default, or something along those lines.
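Defining the locations of interest at the top of a script could look something like the sketch below. The names ITEM_ID_PATH, ITEM_TAG_PATH and response are hypothetical, and the nested list only stands in for a parsed JSON API response. It assumes a purrr version whose pluck() supports rlang's dynamic dots, so a stored path can be spliced in with !!!.

```r
library(purrr)

# Hypothetical locations of interest, defined once near the top of a script.
# Names and positions can be mixed because each entry is its own accessor.
ITEM_ID_PATH  <- list("data", "items", 1, "id")
ITEM_TAG_PATH <- list("data", "items", 1, "tags", 2)

# A made-up, JSON-like structure standing in for a parsed API response
response <- list(
  data = list(
    items = list(
      list(id = "abc123", tags = list("red", "blue"))
    )
  )
)

# !!! splices each stored path into pluck()'s ... arguments
pluck(response, !!!ITEM_ID_PATH)    # "abc123"
pluck(response, !!!ITEM_TAG_PATH)   # "blue"
```

If the structure of the response changes, only the path definitions need updating, and the names in each path document what is being extracted.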


#16

I mean, I usually try to turn the whole thing into a data.frame, but sometimes you just want one part of a response. If indexing uses names (provided they exist) instead of integer indices, it seems reasonably intelligible? …or maybe I just don’t see an alternative.


#17

I’m not suggesting that there is an alternative, other than using names or a different data structure. The grimace was one of despair :wink:


#18

Wow, I wasn’t aware of pluck. Fantastic, I really hate nested lists, so common now with JSON.


#19

I know this is an old thread, but doesn’t this plain ol’ selector syntax produce the same result:


library(magrittr)

list(1, list(list(list(2)))) %>%
    .[[c(2, 1, 1, 1)]]
#> [1] 2

without needing purrr?

This works too:

library(magrittr)

list(a = 1, b = list(c = list(d = list(e = 2)))) %>%
    .[[c("b", "c", "d", "e")]]
#> [1] 2

And the selector syntax can be used on the left-hand side of an assignment, which I don’t think can be done with purrr::pluck:

library(magrittr)

tree <- list(1, list(list(list(2))))
tree %>%
    .[[c(2, 1, 1, 1)]]
#> [1] 2


tree <- list(1, list(list(list(2))))


tree[[c(2, 1, 1, 1)]] <- 99

tree
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [[2]][[1]]
#> [[2]][[1]][[1]]
#> [[2]][[1]][[1]][[1]]
#> [1] 99

# this fails
pluck(tree, 2, 1, 1, 1) <- 99
#> Error in pluck(tree, 2, 1, 1, 1) <- 99: could not find function "pluck<-"


# and this doesn't work either
p <- pluck(tree, 2, 1, 1, 1)
#> Error in pluck(tree, 2, 1, 1, 1): could not find function "pluck"
p <- 1111
tree
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [[2]][[1]]
#> [[2]][[1]][[1]]
#> [[2]][[1]][[1]][[1]]
#> [1] 99

BTW I think pluck doesn’t work for lhs-type operations because it recursively “plucks” elements out of the list. The selector syntax gives you, in effect, a reference to an element in the list… at least until you assign it to a variable.

I’m in the process of trying to understand pluck… I’m sure pluck has some capabilities that plain ol’ selectors don’t, but I don’t see what they are. Hints appreciated :slight_smile:
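As for what pluck() adds over plain selectors, here is a sketch of a few differences as I understand them. Note that the second error in the reprex above (could not find function "pluck") shows purrr was never attached there, so the `pluck<-` error doesn't by itself settle the lhs question; more recent purrr releases document a replacement form, `pluck<-`, plus assign_in(). The list x is made up for illustration.

```r
library(purrr)   # the reprex above only attached magrittr

x <- list(a = 1, b = list(c = 2))

# 1. Missing locations: base [[ errors on an out-of-range position,
#    while pluck() returns NULL (or a .default you choose)
base_result <- tryCatch(x[[5]], error = function(e) "error")
base_result                            # "error"
pluck(x, 5)                            # NULL
pluck(x, "b", "zzz", .default = NA)    # NA

# 2. Functions can be used as accessors alongside names and positions
pluck(x, "b", function(el) el[["c"]] * 10)   # 20

# 3. Assignment: assumes a purrr release that provides `pluck<-` and assign_in()
tree <- list(1, list(list(list(2))))
pluck(tree, 2, 1, 1, 1) <- 99
tree[[c(2, 1, 1, 1)]]                  # 99

# assign_in() returns a modified copy and leaves its input untouched
tree2 <- assign_in(tree, list(2, 1, 1, 1), 1111)
tree2[[c(2, 1, 1, 1)]]                 # 1111
tree[[c(2, 1, 1, 1)]]                  # still 99
```

If you would rather get an error than NULL for a missing location, purrr's chuck() is the strict counterpart of pluck().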


#20

Just want to chime in with something I still use from time to time that I didn’t see mentioned in this thread.

getElement() will extract a list element by name or position. It’s what I used before dplyr had pull() or purrr had pluck().

library(magrittr)

iris %>% 
  getElement("Species") %>% 
  head()
#> [1] setosa setosa setosa setosa setosa setosa
#> Levels: setosa versicolor virginica

iris[1:4] %>% 
  lapply(mean) %>% 
  getElement("Petal.Length")
#> [1] 3.758
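Both examples above extract by name; getElement() also accepts a position. A minimal sketch with a made-up list (getElement() is in base R, so no packages are needed):

```r
fruits <- list(apple = "red", banana = "yellow")

getElement(fruits, "banana")   # "yellow" (by name)
getElement(fruits, 1)          # "red" (by position)

# For non-S4 objects it behaves like [[ with exact name matching
identical(getElement(fruits, "banana"), fruits[["banana", exact = TRUE]])  # TRUE
```

For S4 objects getElement() extracts a slot instead, which is one reason it can be handy as a generic "get this part" step in a pipe.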