I'm writing a function that generates co-occurrence lists (I'm trying to generalize an existing script into a package). The function first needs to select the column of comma-delimited IDs and then use str_split() to split those strings into a list of character vectors.
If I start with df$variable, I get the expected output: a list of character vectors (a list of 14 here, one per row of the data frame) containing clean strings.
If I start with df %>% select(variable) %>%, I get a list of length one. I suppose that is expected, but I'm not sure how to work around it. The strings also aren't split the same way: the first ID comes out as "c(\"4023" rather than "4023". I've tried removing whitespace, among other things.
Naively, I think of df$variable and df %>% select(variable) as doing the same thing, although I know that isn't true in general. I'm not sure what select() is doing behind the scenes, or how to get the df$variable result inside a general-purpose function. Any insight would be much appreciated. In case it helps, the output is then passed to map(expand.grid()).
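The two forms return different types. `df$variable` (like `df[["variable"]]`) extracts the column as a plain character vector, while `select()` always returns a data frame, which is a list holding one element per selected column. A minimal base R sketch, using a made-up data frame `toy` as a stand-in for `flts` and single-bracket indexing `toy["flight"]` as a rough base analogue of `select(toy, flight)`:

```r
# Hypothetical toy data standing in for `flts`.
# stringsAsFactors = FALSE keeps the column character on R < 4.0 too.
toy <- data.frame(carrier = c("A", "B"), flight = c("1,2", "3,4"),
                  stringsAsFactors = FALSE)

# `$` and `[[` pull the column out as a plain character vector
class(toy$flight)        # "character"
class(toy[["flight"]])   # "character"

# Single-bracket indexing keeps the data frame wrapper, as select() does:
# a list whose only element is the column
class(toy["flight"])     # "data.frame"
length(toy["flight"])    # 1
```

So anything downstream that expects a character vector receives a one-element list instead.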
Results from reprex():
Using select:
library(tidyverse)
library(stringr)
library(nycflights13)
set.seed(12)
test <-
  sample_n(flights, 1000)
flts <- aggregate(flight ~ carrier, paste, data = test, collapse = ",")
str(flts)
#> 'data.frame': 14 obs. of 2 variables:
#> $ carrier: chr "9E" "AA" "AS" "B6" ...
#> $ flight : chr "4023,2906,4135,3807,3459,3443,3525,4192,2934,3445,4120,3913,3367,4065,3992,3970,3400,3311,4220,3540,3367,4120,2"| __truncated__
flts %>%
  select(flight) %>%
  str_split(",")
#> [[1]]
#> [1] "c(\"4023" "2906" "4135" "3807" "3459"
#> [6] "3443" "3525" "4192" "2934" "3445"
#> [11] "4120" "3913" "3367" "4065" "3992"
#> [16] "3970" "3400" "3311" "4220" "3540"
#> [21] "3367" "4120" "2908" "3310" "3357"
#> [26] "4305" "4060" "3319" "3393" "4220"
#> [31] "2912" "3321" "3353" "4127" "4178"
#> [36] "3881" "4178" "3304" "3523" "3538"
#> [41] "4275" "3795" "3325" "3410" "3855"
#> [46] "3393" "4060" "3347" "2951" "3354"
#> [51] "3439" "3470" "3910" "3405" "3623"
#> [56] "3932" "4218\"" " \"1357" "2314" "1925"
#> [61] "325" "211" "1623" "321" "1103"
#> [66] "1769" "854" "655" "1850" "1073"
#> [71] "345" "1999" "565" "2019" "269"
#> [76] "33" "715" "145" "413" "117"
...(truncated)
#' Created on 2018-03-14 by the reprex package (v0.2.0).
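The `"c(\"4023"` and `"4218\""` fragments above look like deparse debris: when a one-column data frame is coerced to character, the whole column is turned into a single string of R source, `c("4023,...", "1357,...", ...)`, and splitting that on commas produces one long vector with `c(`, quotes and spaces stuck to some IDs. The sketch below reproduces this with base `strsplit()` in place of `str_split()`. It assumes stringr coerces its input the same way `as.character()` does, which the output above is consistent with. The data frame `toy` is made up for illustration.

```r
toy <- data.frame(flight = c("1,2", "3,4"), stringsAsFactors = FALSE)

# A data frame is a list, so as.character() deparses the column
# into ONE string of R source code
flat <- as.character(toy["flight"])
identical(flat, "c(\"1,2\", \"3,4\")")                    # TRUE

# Splitting that single string yields a list of one, with the
# deparse debris ("c(", quotes, leading space) attached to IDs
bad <- strsplit(flat, ",", fixed = TRUE)
identical(bad[[1]], c("c(\"1", "2\"", " \"3", "4\")"))   # TRUE

# Splitting the column vector itself gives one clean element per row
good <- strsplit(toy[["flight"]], ",", fixed = TRUE)
identical(good, list(c("1", "2"), c("3", "4")))           # TRUE
```

Inside a pipe, dplyr's `pull()` extracts a single column as a vector, so `flts %>% pull(flight) %>% str_split(",")` should behave like the `flts$flight` version.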
And using df$variable:
library(tidyverse)
library(stringr)
library(nycflights13)
set.seed(12)
test <-
  sample_n(flights, 1000)
flts <- aggregate(flight ~ carrier, paste, data = test, collapse = ",")
str(flts)
#> 'data.frame': 14 obs. of 2 variables:
#> $ carrier: chr "9E" "AA" "AS" "B6" ...
#> $ flight : chr "4023,2906,4135,3807,3459,3443,3525,4192,2934,3445,4120,3913,3367,4065,3992,3970,3400,3311,4220,3540,3367,4120,2"| __truncated__
flts$flight %>%
  str_split(",")
#> [[1]]
#> [1] "4023" "2906" "4135" "3807" "3459" "3443" "3525" "4192" "2934" "3445"
#> [11] "4120" "3913" "3367" "4065" "3992" "3970" "3400" "3311" "4220" "3540"
#> [21] "3367" "4120" "2908" "3310" "3357" "4305" "4060" "3319" "3393" "4220"
#> [31] "2912" "3321" "3353" "4127" "4178" "3881" "4178" "3304" "3523" "3538"
#> [41] "4275" "3795" "3325" "3410" "3855" "3393" "4060" "3347" "2951" "3354"
#> [51] "3439" "3470" "3910" "3405" "3623" "3932" "4218"
#>
#> [[2]]
#> [1] "1357" "2314" "1925" "325" "211" "1623" "321" "1103" "1769" "854"
#> [11] "655" "1850" "1073" "345" "1999" "565" "2019" "269" "33" "715"
#> [21] "145" "413" "117" "1750" "1327" "1621" "301" "1769" "172" "717"
#> [31] "2314" "371" "269" "707" "1757" "313" "731" "341" "145" "1507"
#> [41] "3" "145" "84" "145" "739" "1762" "1410" "84" "19" "305"
#> [51] "119" "307" "1999" "1709" "359" "269" "543" "300" "19" "1837"
#> [61] "1073" "133" "745" "33" "269" "59" "1709" "1145" "1223" "1357"
#> [71] "753" "1" "2279" "85" "19" "2279" "84" "1611" "753" "19"
(...truncated)
#' Created on 2018-03-14 by the reprex package (v0.2.0).
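To get the `df$variable` behaviour inside a function, one option is `df[[col]]` with the column name passed as a string. The sketch below is a hypothetical helper, `cooccur_list()`, and not an existing package function. It splits the ID column, trims stray whitespace, and expands each row's IDs into all ordered pairs. Base `lapply()` stands in for `purrr::map()` and `strsplit()` for `str_split()` so that it runs without extra packages.

```r
# Hypothetical helper (name and arguments are illustrative only):
# split a delimited ID column, then expand each row's IDs into
# all ordered pairs, mirroring the map(expand.grid()) step
cooccur_list <- function(df, col, sep = ",") {
  # `[[` with a string behaves like df$col but accepts a name
  # passed in as an argument; as.character() guards against factors
  ids <- as.character(df[[col]])
  pieces <- strsplit(ids, sep, fixed = TRUE)
  pieces <- lapply(pieces, trimws)   # drop stray spaces around IDs
  # one data frame of (from, to) pairs per row of df
  lapply(pieces, function(x) {
    expand.grid(from = x, to = x, stringsAsFactors = FALSE)
  })
}

toy <- data.frame(carrier = c("A", "B"), flight = c("1, 2", "3"),
                  stringsAsFactors = FALSE)
pairs <- cooccur_list(toy, "flight")
pairs[[1]]   # 4 rows: every ordered pair drawn from "1" and "2"
```

Taking the column name as a string keeps the helper simple. Accepting an unquoted name, as `select()` does, would need tidy evaluation, which is not shown here.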