Understanding [<-, [[<-, and vec_c for vctrs list_of

Hello everyone, I am trying to better understand the behavior of [<-, [[<-, vec_c, and c for list_of classes. I am defining a list_of subclass that is essentially a list of character vectors, each produced by splitting a string on spaces. I want to use the assignment operators to adjust values as necessary, but I am struggling to fully understand the vctrs implementation. The reprex below highlights what I'm trying to figure out.

Example class

library(vctrs)
library(stringr)
library(purrr)  # provides map_chr(), used in vec_cast.character.example below

new_example <- function(x = list()) {
  if (vec_is(x, character())) {
    x <- str_split(x, " ")
  }
  new_list_of(x,
              ptype = character(),
              class = "example")
}

vec_ptype_full.example <- function(x, ...) {
  "example"
}

vec_ptype_abbr.example <- function(x, ...) {
  "xmpl"
}

Boilerplate for coercion and casting

vec_ptype2.example <- function(x, y, ...) UseMethod("vec_ptype2.example", y)

vec_ptype2.example.default <- function(x, y, ..., x_arg = "x", y_arg = "y") {
  vec_default_ptype2(x, y, x_arg = x_arg, y_arg = y_arg)
}

vec_cast.example <- function(x, to, ...) UseMethod("vec_cast.example")

vec_cast.example.default <- function(x, to, ...) vec_default_cast(x, to)

Coercion between example and character

vec_ptype2.example.character <- function(x, y, ...) x

vec_ptype2.character.example <- function(x, y, ...) y

Casting between example and character

vec_cast.character.example <- function(x, to, ...) map_chr(x, str_c, collapse = " ")

vec_cast.example.character <- function(x, to, ...) new_example(x)

vec_cast.example.example <- function(x, to, ...) new_example(x)

Testing

a <- new_example("a b")
a
#> <example[1]>
#> [[1]]
#> [1] "a" "b"

# Error
a[2] <- "c d"
#> Error: Can't cast `x` <list_of<character>> to `to` <example>.

# String not split
a[[2]] <- "e f"
a
#> <example[2]>
#> [[1]]
#> [1] "a" "b"
#> 
#> [[2]]
#> [1] "e f"

# String is split
a[[2]] <- c("e", "f")
a
#> <example[2]>
#> [[1]]
#> [1] "a" "b"
#> 
#> [[2]]
#> [1] "e" "f"

# String is split
vec_c(a, "g h")
#> <example[3]>
#> [[1]]
#> [1] "a" "b"
#> 
#> [[2]]
#> [1] "e" "f"
#> 
#> [[3]]
#> [1] "g" "h"
c(a, "g h")
#> <example[3]>
#> [[1]]
#> [1] "a" "b"
#> 
#> [[2]]
#> [1] "e" "f"
#> 
#> [[3]]
#> [1] "g" "h"

To assign values to existing elements, is there a way to use either [<- or [[<- to assign a full string such as "d e" and have it split, or is the best option to use [[<- with a character vector that is already split, such as c("d", "e")?

The crux of the question is, using vec_c and c, the new value seems to be passed through new_example, and thus the string is split. [[<- seems to avoid this, and [<- converts the string to a list_of class before sending to the constructor function, but I'm still hazy on how I could adjust these details to work in this use case (and just better understand how vctrs is supposed to work).
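As a minimal sketch of the second option from the question, the string can be split at the call site so that [[<- receives an already-split vector. split_choices() is a hypothetical helper name, not part of vctrs or stringr, and the constructor is repeated from the reprex only so the block runs on its own.

```r
library(vctrs)
library(stringr)

# Constructor repeated from the reprex so this block is self-contained.
new_example <- function(x = list()) {
  if (vec_is(x, character())) {
    x <- str_split(x, " ")
  }
  new_list_of(x, ptype = character(), class = "example")
}

# Hypothetical helper: turn one space-separated string into a character vector.
split_choices <- function(string) {
  str_split(string, " ")[[1]]
}

a <- new_example(c("a b", "c e"))

# [[<- stores the value as a single element, so it must already be split.
a[[2]] <- split_choices("c d")
```

This leaves the class untouched and puts the burden on the caller, which may or may not be acceptable depending on how often values are overwritten.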

Hi, and welcome!

Fabulous reprex! Just so I'm clear on the issue, do you want to start with

a[[1]]
[1] "a" "b"

and get to

> a[[1]]
[1] "a" "b" "c" "d" ...

Thanks!

Ah yes, see I wasn't so clear on the actual issue, sorry! A typical case is that I will have a vector, and I want to change the value at a specific index.

b <- new_example(c("a b", "c e"))
b
#> <example[2]>
#> [[1]]
#> [1] "a" "b"
#> 
#> [[2]]
#> [1] "c" "e"

If I wanted to change the values for the 2nd item, I would want to do something like below (which currently produces an error).

b[2] <- "c d"
b
<example[2]>
[[1]]
[1] "a" "b"

[[2]]
[1] "c" "d"

I can do something like this with just a base list (although of course without the string being split).

c <- list("a b", "c e")
c[2] <- "c d"
c
#> [[1]]
#> [1] "a b"
#> 
#> [[2]]
#> [1] "c d"

Where I really am struggling is conceptualizing what is happening with this assignment in terms of the new example subclass I've created, since this produces an error. I expected that assigning a new value of "c d" to the 2nd index would be cast to an object of the example subclass, and succeed. However, it seems that it's already been cast to a list_of<character> which then fails because I haven't specified a cast between list_of<character> and example.

So, to get a[2] <- "c d" to work, should I change the assignment methods somehow, or should I instead explicitly define casts between the subclass and list_of<character>?

Thanks for the offer of help and let me know if anything else is unclear.

library(vctrs)
library(stringr)

new_example <- function(x = list()) {
  if (vec_is(x, character())) {
    x <- str_split(x, " ")
  }
  new_list_of(x,
              ptype = character(),
              class = "example")
}

b <- new_example(c("a b", "c e"))
class(b)
#> [1] "example"       "vctrs_list_of" "vctrs_vctr"
b[2] <- new_example("c d")
b
#> <list_of<character>[2]>
#> [[1]]
#> [1] "a" "b"
#> 
#> [[2]]
#> [1] "c" "d"
class(b[2])
#> [1] "example"       "vctrs_list_of" "vctrs_vctr"
str(b[1])
#> list<chr> [1:1] 
#> $ : chr [1:2] "a" "b"
#> @ ptype: chr(0)
str(b[2])
#> list<chr> [1:1] 
#> $ : chr [1:2] "c" "d"
#> @ ptype: chr(0)
b[2] <- "c d"
b
#> <list_of<character>[2]>
#> [[1]]
#> [1] "a" "b"
#> 
#> [[2]]
#> [1] "c d"
str(b[2])
#> list<chr> [1:1] 
#> $ : chr "c d"
#> @ ptype: chr(0)

Created on 2019-12-18 by the reprex package (v0.3.0)

The difference is that "c d" is a plain atomic character vector, not a list of character vectors. It might be possible, I suppose, to write a function that coerces "c d" when it is assigned to a slot already occupied by an example element, but just off the top of my head I suspect that would involve S4 and the brain damage that goes with it.
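For what it's worth, a plain S3 replacement method may be enough for that idea. The following is a rough sketch under stated assumptions, not something taken from vctrs: `[<-.example` is a hypothetical method that splits bare strings, edits the underlying list directly, and rebuilds the object with the constructor, so it sidesteps the list_of and cast machinery entirely rather than fixing it.

```r
library(vctrs)
library(stringr)

# Constructor repeated from the reprex so this block is self-contained.
new_example <- function(x = list()) {
  if (vec_is(x, character())) {
    x <- str_split(x, " ")
  }
  new_list_of(x, ptype = character(), class = "example")
}

# Hypothetical S3 replacement method (an assumption, not vctrs' own design).
`[<-.example` <- function(x, i, value) {
  if (is.character(value)) {
    value <- str_split(value, " ")  # one character vector per assigned slot
  }
  data <- vec_data(x)               # underlying plain list
  data[i] <- vec_data(value)        # ordinary base list assignment
  new_example(data)                 # rebuild the subclass
}

b <- new_example(c("a b", "c e"))
b[2] <- "c d"
```

Because this skips vctrs' type checks, anything that is not a character vector or a list of character vectors would need its own validation. In a package, the method would also have to be registered as an S3 method.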

IIUC, you would like each character vector of the list-of to represent a single string? And it's theoretically a subclass of character even though it's implemented (out of necessity) as a list?

Hey Lionel,

You have understood perfectly well. I'm designing this subclass to represent multiple select questions from survey responses. In the example above, the first element corresponds to a respondent selecting a and b, and the second to one selecting c and e; the underlying list structure makes it very easy to drop or add selected options. There is a sister subclass, not in the reprex, that is essentially the same but with the underlying structure being a pure character vector of unsplit strings, i.e. c("a b", "c e").

This example subclass makes it much simpler and more efficient to add or remove selected options at a specific index, validate that all elements of a vector belong to a separate attribute set, and otherwise work with the underlying data. This would allow me to cast to an example, do any necessary work, then cast back to character and export as needed.
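The split, edit, validate, and collapse workflow described above can be sketched on a plain list, without the subclass. The names raw, selected, and choices are illustrative only, and the set of allowed choices is an assumption.

```r
library(stringr)

raw <- c("a b", "c e")                 # exported form: one string per respondent
choices <- c("a", "b", "c", "e", "f")  # assumed set of valid options

# Split into one character vector per respondent.
selected <- str_split(raw, " ")

# Edit individual responses.
selected[[1]] <- setdiff(selected[[1]], "b")  # drop an option
selected[[2]] <- union(selected[[2]], "f")    # add an option

# Validate that every selected option is an allowed choice.
all_valid <- all(unlist(selected) %in% choices)

# Collapse back to the exported form.
exported <- vapply(selected, str_c, character(1), collapse = " ")
```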

Thanks for any help you can provide. I've been looking into it using sloop::s3_dispatch and sloop::s3_get_method, and reading up on the topic in Advanced R and other posts I can find, but I'm unsure how I could solve this in a manner that would be considered responsible/tidy (let alone successful!).

I'm trying to understand why a tabular structure like a data frame or matrix wouldn't be appropriate or convenient for your work.

Wouldn't a structure where rows are survey takers and columns are responses be the tidy-est? (The columns names could be 'a', 'b', 'c' etc with logical values TRUE or FALSE.)
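To make the suggested layout concrete, here is a small base R sketch that turns split responses into one logical column per choice. The responses and the full set of choices are made-up illustration data.

```r
responses <- list(c("a", "b"), c("c", "e"))  # one element per survey taker
choices <- c("a", "b", "c", "d", "e")        # assumed full set of options

# vapply() returns one column per respondent, so transpose to get one row each.
wide <- t(vapply(responses, function(r) choices %in% r, logical(length(choices))))
colnames(wide) <- choices
wide <- as.data.frame(wide)
```

Each row of wide is a survey taker and each column is TRUE where that option was selected.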

I agree with @nirgrahamuk about the advantages of transforming the data. If, for example, column 1 is id and column 2 is response, stringr::str_split() would allow easy mutation of columns. If you represented the responses as integers, there's the added benefit of identifying the 24 unique permutations.

library(permutations)
S4 <- allperms(4)
S4
     {1} {2} {3} {4}
[1]  .   .   .   .  
[2]  .   .   4   3  
[3]  .   3   2   .  
[4]  .   3   4   2  
[5]  .   4   2   3  
[6]  .   4   .   2  
[7]  2   1   .   .  
[8]  2   1   4   3  
[9]  2   3   1   .  
[10] 2   3   4   1  
[11] 2   4   1   3  
[12] 2   4   .   1  
[13] 3   1   2   .  
[14] 3   1   4   2  
[15] 3   .   1   .  
[16] 3   .   4   1  
[17] 3   4   1   2  
[18] 3   4   2   1  
[19] 4   1   2   3  
[20] 4   1   .   2  
[21] 4   .   1   3  
[22] 4   .   .   1  
[23] 4   3   1   2  
[24] 4   3   2   1  

@technocrat @nirgrahamuk

Thanks both for the response. So a little more background, this project came about after creating a package where users can manipulate survey data using an Excel file written in the same "language" the survey was designed with. Essentially they write an Excel sheet that operates on surveys as they are exported (typically in Excel/CSV format) from the database, which for select multiple questions DOES include the binary columns for each choice option.

From this, I wanted to formally define classes to govern the work from the previous package, as there were many code inefficiencies that could be improved through the use of proper attributes and, in particular for select multiple questions, a better way to overwrite current values. Previously I had worked with the binary columns for this purpose, but with 10 choice options, replacing c("a b") with c("g h") required either setting all binary columns to F, checking every column to ensure that any outside of g and h were F, or using the character column to identify previously selected options and specifically setting those to F. An extension of the select multiple question is the rank question, where the columns for each possible choice are instead integers representing their relative ranking, and re-ordering the rankings is even more inefficient than for basic select multiple questions.
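A small base R sketch of the contrast described here, using made-up data with 10 choice options: overwriting a response in the binary layout takes a reset plus a set, while the list layout needs a single assignment.

```r
choices <- letters[1:10]  # assumed option names a through j

# Binary layout: one logical column per choice, one row per respondent.
wide <- as.data.frame(
  matrix(FALSE, nrow = 1, ncol = length(choices), dimnames = list(NULL, choices))
)
wide[1, c("a", "b")] <- TRUE

# Replacing "a b" with "g h": clear the whole row, then set the new options.
wide[1, ] <- FALSE
wide[1, c("g", "h")] <- TRUE

# List layout: the same replacement is one assignment.
selected <- list(c("a", "b"))
selected[[1]] <- c("g", "h")
```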

So when I started this new work to formalize a way to deal with these vectors, I essentially defined 3 subclasses for select multiples (and a separate set of related subclasses for rankings):

  1. Character subclass built on top of a character vector
  2. The example from the original reprex, built on top of list_of
  3. Binary subclass built on top of a logical vector

My idea is that with a data.frame containing these vectors, I can easily define a set of functions that automatically cast these 3 subclasses between each other, with each state of the data.frame having a different purpose:

  1. Exporting data back to Excel or other format (data.frame would have both subclasses 1 and 3)
  2. Cleaning/manipulating the data (data.frame would only have subclass 2)
  3. Analysis (data.frame would only have subclass 3).

If you're at all interested, you can see what I have on my GitHub repo for the package, although it's quite early days (I just noticed this issue as I was starting to work on a subclass for rank question binaries).

Long story short, I wanted this specific subclass to facilitate efficient manipulation of the data, particularly using pre-defined changes from an Excel file, and this was the best way I could come up with.

You may need a helper class that is derived from character and disallows spaces. Then [[<-.vctrs_list_of will coerce inputs to that class, which will trigger type checking via your ptype2/cast methods.
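A minimal sketch of what the constructor for such a helper class might look like. The class name "choice" and the function new_choice() are hypothetical, and the ptype2/cast methods and the wiring of this class in as the list_of ptype are deliberately left out.

```r
library(vctrs)

# Hypothetical helper class: a character vector whose elements contain no spaces.
new_choice <- function(x = character()) {
  if (!is.character(x)) {
    stop("`x` must be a character vector.", call. = FALSE)
  }
  if (any(grepl(" ", x, fixed = TRUE))) {
    stop("Choices must not contain spaces.", call. = FALSE)
  }
  new_vctr(x, class = "choice")
}

ok <- new_choice(c("a", "b"))

# An unsplit string is rejected; capture the message instead of failing.
bad <- tryCatch(new_choice("a b"), error = function(e) conditionMessage(e))
```

With a class like this as the element type, a cast from character would be the natural place to decide whether "a b" is an error or something to split, but that part depends on the methods not shown here.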

Thank you Lionel. I'm commenting here to extend time before topic closes, will try and resolve using your solution and post here.