I've been refactoring some old non-tidyverse code and have been introducing list columns, as they make the data a lot easier to work with. Thanks to the posts on here for inspiring me and these slides for teaching me how to work with them!
One annoyance I have hit is that when I want to extract a single row out of a tibble, I have to unlist any list columns in order to fetch their value. This makes sense when I am using filter, since it can return anywhere from zero to many rows, so I was thinking that there is space for a dedicated function that would give me a simple named list of the column items with any list columns de-listed.
Let's say I have a simple tibble that's really like a hashmap or dictionary:
library(tidyverse)
listOfTibbles <- list(tibble(a = c(1,2,3), b = c(2, 3, 4)), tibble(), tibble())
tibbleWithListColumns <- tibble(key = c("a", "b", "c"), value = listOfTibbles)
tibbleWithListColumns
# A tibble: 3 x 2
  key   value
  <chr> <list>
1 a     <tibble [3 x 2]>
2 b     <tibble [0 x 0]>
3 c     <tibble [0 x 0]>
I'd want to be able to use 'single' rather than filter and have it return:
b <- tibbleWithListColumns %>% single(key == "a")
b
$key
[1] "a"

$value
# A tibble: 3 x 2
      a     b
  <dbl> <dbl>
1     1     2
2     2     3
3     3     4

whereas filter understandably returns $value as a one-element list:
tibbleWithListColumns %>% filter(key == "a") %>% .$value
[[1]] # <------ A one element list
# A tibble: 3 x 2
      a     b
  <dbl> <dbl>
1     1     2
2     2     3
3     3     4
I know a lot of the time this shouldn't be needed with the use of pmap to map things by row, but in my case it would have been useful and would have prevented [[1]] everywhere. The single function would throw an error if zero or > 1 items came back, much like Single in C#.
Unfortunately it would still need to be pull('value')[[1]] rather than just pull('value') to get an unlisted value so I would still have the annoying extra [[1]].
In the case I was using it there were 5-6 columns so $ was a bit less verbose than pull.
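To make the extra [[1]] concrete, here is a minimal sketch using the example tibble from the first post. It compares pull followed by [[1]] with purrr::pluck, which takes the column name and the position in one call. The variable names row, via_pull and via_pluck are just for illustration.

```r
library(dplyr)
library(purrr)
library(tibble)

listOfTibbles <- list(tibble(a = c(1, 2, 3), b = c(2, 3, 4)), tibble(), tibble())
tibbleWithListColumns <- tibble(key = c("a", "b", "c"), value = listOfTibbles)

# filter() keeps the list column, so the cell is still wrapped in a one-element list
row <- tibbleWithListColumns %>% filter(key == "a")

# pull() returns the whole list column; [[1]] unwraps the single element
via_pull <- row %>% pull(value) %>% .[[1]]

# pluck() walks into the column by name and then takes the first element
via_pluck <- row %>% pluck("value", 1)
```

Both give the inner tibble, but either way the unwrapping has to be repeated for every list column you touch.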
Yeah, I think we are crossing wires a bit. So I have a tibble with, say, 6 columns, some of which are list columns of some kind and one of which is a key column. I then want to pull out a single row and be able to refer to each column in it succinctly and clearly, without having to write [[1]] all the time.
You have put me on to a decent solution though, using flatten:
listOfTibbles <- list(tibble(a = c(1,2,3), b = c(2, 3, 4)), tibble(), tibble())
tibbleWithListColumns <-
tibble(
key = c("a", "b", "c"),
value1 = listOfTibbles,
value2 = listOfTibbles,
value3 = listOfTibbles
)
result <- tibbleWithListColumns %>%
filter(key == "a") %>%
flatten
# $key
# [1] "a"
#
# $value1
# # A tibble: 3 x 2
#       a     b
#   <dbl> <dbl>
# 1     1     2
# 2     2     3
# 3     3     4
#
# $value2
# # A tibble: 3 x 2
#       a     b
#   <dbl> <dbl>
# 1     1     2
# 2     2     3
# 3     3     4
#
# $value3
# # A tibble: 3 x 2
#       a     b
#   <dbl> <dbl>
# 1     1     2
# 2     2     3
# 3     3     4
I can then happily do result$value1 and get a dataframe or whatever object was in my list column without needing [[1]].
Thanks. For me, I think there's a space for a method that does this in one, but two methods is fine. Maybe a hashmap-type object would make more sense too; I imagine there are packages that provide them, but they probably aren't tidyverse friendly.
The only additional feature that a single method would provide is that it would error if more than one row were returned whereas flatten returns a list with all the values mingled together.
Yep - flatten is a rather blunt instrument, so I'm not surprised it doesn't work perfectly in this use case.
Glad to hear my suggestion gets you closer to a workable solution. You might also experiment with the transpose %>% map pattern – I've found it to be useful when I want to access the rows of tibble. Good luck!
if you like. The only downside is that pluck requires quoting for variable names. I suppose a version of pull that accepts further indices or a version of pluck that accepts raw variable names could be useful, though the semantics may get confusing.
I've almost never used either in this idiom, though; I extract nested data frames with tidyr::unnest, subsetting before or after.
Thanks for the ideas, a clever use of pluck with which.
I've found pluck useful elsewhere, but in my case I want to be able to pass the whole row on to another function, which can then extract whichever values it needs, so having it in a regular list, as flatten gives, is better than having ways to pull out the individual cells.
I've found nest and unnest very useful too, and I know they could be helpful for my example since I used data frames, but in reality some of my list columns contained other S3 classes, like a forecast model.
I should probably convert the whole thing to use pmap in the end.
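For reference, a rough sketch of what the pmap version might look like. pmap calls the function once per row and matches columns to arguments by name, so each list-column cell arrives already unwrapped. The tibble tb and the helper describe_row are made up for illustration, not taken from my real code.

```r
library(purrr)
library(tibble)

tb <- tibble(
  key   = c("a", "b"),
  value = list(tibble(a = c(1, 2, 3), b = c(2, 3, 4)), tibble())
)

# Hypothetical per-row function: argument names must match the column names
describe_row <- function(key, value) {
  # `value` is the inner tibble itself here, not a one-element list
  paste0(key, ": ", nrow(value), " rows")
}

# pmap_chr() iterates over the rows and collects one string per row
descriptions <- pmap_chr(tb, describe_row)
```

This avoids [[1]] entirely, at the cost of restructuring the code around a function applied to every row rather than a lookup of one row.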
Thanks. I guess transpose would only work if all the column data-types are the same?
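A quick way to check this is to try it on a small tibble with mixed column types. As far as I can tell, purrr::transpose only needs the columns to have the same length, not the same type, and naming the result by the key column gives something close to the hashmap I was after. This is only a sketch: tb and rows are illustrative names, and I believe transpose is marked as superseded in newer purrr releases, so treat that as an assumption to verify.

```r
library(purrr)
library(tibble)

tb <- tibble(
  key   = c("a", "b"),
  n     = c(1, 2),
  value = list(tibble(a = c(1, 2, 3)), tibble())
)

# transpose() turns the list of columns into a list of rows;
# set_names() then lets each row be looked up by its key
rows <- tb %>% transpose() %>% set_names(tb$key)

# Each row is a plain named list, so list-column cells need no [[1]]
rows$a$value
rows$b$n
```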
Just discovered an irritating thing with flatten, which is that it strips classes like Date and converts the values to plain numerics. Maybe I just need to bite the bullet and create my single function.
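Here is a minimal sketch of what such a function might look like. It is not an existing dplyr or purrr function, just my own hypothetical helper: it filters, errors unless exactly one row matches (like Single in C#), and takes the first element of each column with [[1]], which unwraps list columns and, as far as I can tell, keeps classes such as Date on ordinary columns.

```r
library(dplyr)
library(tibble)

# Hypothetical helper, not part of any package
single <- function(.data, ...) {
  rows <- dplyr::filter(.data, ...)
  if (nrow(rows) != 1L) {
    stop("single() expected exactly 1 row but got ", nrow(rows), call. = FALSE)
  }
  # [[1]] on each column: list columns are unwrapped, atomic columns keep their class
  lapply(rows, function(col) col[[1]])
}

tb <- tibble(
  key   = c("a", "b"),
  day   = as.Date(c("2020-01-01", "2020-01-02")),
  value = list(tibble(a = c(1, 2, 3), b = c(2, 3, 4)), tibble())
)

result <- tb %>% single(key == "a")
result$value   # the inner tibble, no [[1]] needed
result$day     # still a Date
```

Using lapply over the filtered tibble keeps the column names, so the result can be passed whole to another function that picks out whichever values it needs.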