Nested tibble of topic models in nested tibble

Hello dear friends and data genies,

I am having a little problem with some topic modelling in a nested tibble. First, here is how I created the nested tibble. The grouping id is col 1, and the other columns were built in these steps:

  1. general wrangling, which subsets the data by the grouping id - col 2
  2. the data from each subset is turned into a document-term matrix - col 3
  3. a data frame of values for estimating the best number of topics for LDA, from FindTopicsNumber {ldatuning} - col 4
  4. a range of 7 integers built around the optimal number of topics (topic_base, col 5) - col 6

This is all nested in one tibble, let's call it tidy_item_nested, like this:

item_grup_id data               dtm             tunes         topic_base range_topics
          <dbl> <list>             <list>          <list>        <list>     <list>      
 1         1618 <tibble [302 x 4]> <dgCMatrx[,37]> <df [29 x 4]> <int [1]>  <dbl [7]>   
 2         1804 <tibble [134 x 4]> <dgCMatrx[,13]> <df [29 x 4]> <int [1]>  <dbl [7]>   
 3         3447 <tibble [759 x 4]> <dgCMatrx[,53]> <df [29 x 4]> <int [1]>  <dbl [7]>   
 4         3268 <tibble [34 x 4]>  <dgCMatrx[,4]>  <df [29 x 4]> <int [1]>  <dbl [7]>   
 5         3135 <tibble [261 x 4]> <dgCMatrx[,24]> <df [29 x 4]> <int [1]>  <dbl [7]>   
 6         4101 <tibble [201 x 4]> <dgCMatrx[,20]> <df [29 x 4]> <int [1]>  <dbl [7]>   
 7         1802 <tibble [85 x 4]>  <dgCMatrx[,12]> <df [29 x 4]> <int [1]>  <dbl [7]>   
 8         4169 <tibble [324 x 4]> <dgCMatrx[,26]> <df [29 x 4]> <int [1]>  <dbl [7]>   
 9         4711 <tibble [197 x 4]> <dgCMatrx[,15]> <df [29 x 4]> <int [1]>  <dbl [7]>   
10         3631 <tibble [300 x 4]> <dgCMatrx[,18]> <df [29 x 4]> <int [1]>  <dbl [7]>   
11           53 <tibble [329 x 4]> <dgCMatrx[,30]> <df [29 x 4]> <int [1]>  <dbl [7]>  

I now want to use this data to create topic models for each row, using the dtm in col 3 and the range of integers in col 6. I know the following code works on an individual dtm, creating a tibble with one topic model for each integer in the range.

 topic_mods <- tibble(K = range_topics) %>%
  mutate(topic_model = future_map(K, ~stm(item_desc_dfm, K = .,
                                          verbose = FALSE,
                                          seed=TRUE)))

So I adapted this code to use a map2 function, like this:

plan(multisession, workers = 10)
tidy_item_nested <- tidy_item_nested %>%
  mutate(topic_mods = future_map2(range_topics, dfm, function(vec, dtm){
    tibble(K = vec) %>%
      mutate(topic_model = future_map(K, ~stm(dtm, K = .,
                                              verbose = FALSE,
                                              seed=TRUE)))
  }))

But this gives the following error...

Error: Problem with `mutate()` column `many_models`.
i `many_models = future_map2(...)`.
x Problem with `mutate()` column `topic_model`.
i `topic_model = future_map(K, ~stm(dtm, K = ., verbose = FALSE, seed = TRUE))`.
x Spectral initialization cannot be used for the overcomplete case (K greater than or equal to number of words in vocab)

I'm not really sure where to go from here. I saw on GitHub that it could be an issue with the dtm size, but the problem reported there was with very large dtms, and mine are not overly large.

I want to check whether the problem is the nesting of tibbles inside a nested tibble, or something to do with the topic models themselves. Does anyone have any ideas?

Thanks for reading!

Hello.
Thanks for providing code, but you could take further steps to make it more convenient for other forum users to help you.

Share some representative data that will enable your code to run and show the problematic behaviour.

You might use tools such as the datapasta package, or the base function dput(), to share a portion of the data in code form, i.e. so that it can be copied from the forum and pasted into an R session.

Also, it's useful to explicitly list the libraries that the code in your example relies upon.

It is not a small amount of data; these are just the first two rows...

tidy_item_nested <- structure(list(item_grup_id = c(1618, 1804), data = list(structure(list(
    item_id = c(1578, 8288, 11793, 25914, 26990, 29031, 40159, 
    44277, 50012, 52242, 52590, 52627, 56346, 56977, 59277, 61716, 
    64622, 71701, 71701, 72053, 72053, 72053, 77714, 78609, 81249, 
    85800, 89058, 98095, 98095, 102241, 103498, 103536, 103541, 
    105741, 105741, 109496, 265, 11836, 12908, 13831, 25573, 
    28342, 28342, 42690, 43192, 44393, 49396, 52608, 57657, 64520, 
    68186, 69208, 75675, 88174, 88194, 97081, 109498, 109499, 
    115154, 115365, 115426, 11807, 14474, 16137, 25674, 25944, 
    26990, 32972, 38323, 44373, 44373, 44413, 50504, 56346, 56977, 
    61716, 68186, 69208, 69285, 69296, 69400, 78609, 88194, 88194, 
    92951, 92951, 93497, 98374, 98374, 103498, 114949, 1578, 
    11793, 11807, 11828, 11836, 11921, 24537, 29031, 40159, 49396, 
    51332, 51378, 51991, 51991, 63181, 64622, 65069, 72294, 92951, 
    94701, 98374, 12908, 28342, 33368, 42690, 43192, 44161, 44277, 
    44373, 44393, 44413, 44432, 44445, 44466, 53310, 53354, 56346, 
    60671, 60905, 76222, 78454, 78609, 85800, 103541, 109499, 
    15564, 24537, 34138, 49767, 64520, 64622, 105741, 109496, 
    109498, 112005, 11836, 13831, 13831, 13884, 13884, 15556, 
    15564, 21731, 24827, 24827, 25512, 40159, 40159, 44393, 49767, 
    52599, 52599, 52608, 52608, 52627, 52627, 56977, 61716, 61716, 
    61844, 61844, 63358, 63358, 68186, 69400, 72294, 75798, 81249, 
    89058, 89058, 93239, 93497, 96526, 99326, 99341, 105461, 
    112005, 112017, 115154, 265, 11921, 13884, 15556, 25573, 
    28342, 44445, 49396, 61844, 69285, 115365, 115426, 26990, 
    33368, 33534, 50012, 56346, 56977, 59277, 60671, 60905, 78609, 
    94701, 98374, 103541, 15564, 15583, 21731, 25512, 25573, 
    25626, 25626, 25674, 25721, 25914, 25944, 26781, 26781, 33534, 
    44161, 51332, 52599, 63358, 64364, 69285, 69296, 69296, 69400, 
    75798, 89910, 94701, 96526, 101186, 112005, 114949, 8288, 
    15751, 31751, 32595, 34138, 53856, 63185, 77428, 78151, 79886, 
    80995, 89619, 89927, 93492, 103536, 112098, 8288, 15866, 
    16137, 31751, 32972, 38323, 53856, 64520, 64622, 78151, 79886, 
    80995, 89619, 89927, 93492, 105741, 112098, 8288, 15751, 
    24537, 31751, 32595, 34138, 51378, 52242, 52590, 53856, 63185, 
    76222, 77428, 77714, 78151, 78454, 79886, 80995, 89619, 89927, 
    93492, 102241, 103536, 112098), prfm_name = c("SCHW SFT DRK 2LT VAR", 
    "COCA COLA VAR PET      1L", "thisisannavalue", "KIRKS SOFT DRINK 1.25L", 
    "DIET RITE SOFT DRNK 1.25L", "SCHW SFT DRK   1.25L", "TRU BLU SOFT DRINK 1.25L", 
    "RIVERPORT DRNKS     1.25L", "DIET RITE SOFT DRNK 1.25L", 
    "LA ICE COLA            2L", "LA ICE COLA            2L", 
    "TRU BLU VAR            2L", "thisisannavalue", "thisisannavalue", 
    "DIET RITE SOFT DRNK 1.25L", "TRU BLU VAR            2L", 
    "SCHW SFT DRK   1.25L", "PEPSI  1.25LT  VAR", "PEPSI  1.25LT  VAR", 
    "PEPSI 2L VAR", "PEPSI 2L VAR", "PEPSI 2L VAR", "TRU BLU SOFT DRINK 1.25L", 
    "thisisannavalue", "SCHW SOFT DRINK      1.1L", "PEPSI  1.25LT  VAR", 
    "thisisannavalue", "thisisannavalue", "thisisannavalue", 
    "TRU BLU SOFT DRINK 1.25L", "COCA COLA VAR PET      1L", 
    "COCA COLA VAR PET      1L", "thisisannavalue", "PEPSI  1.25LT  VAR", 
    "PEPSI  1.25LT  VAR", "SCHW SFT DRK 2LT VAR", "thisisannavalue", 
    "thisisannavalue", "thisisannavalue", "TRU BLU SOFT DRINK 1.25L", 
    "KIRKS SOFT DRINK 1.25L", "PEPSI  1.25LT  VAR", "PEPSI  1.25LT  VAR", 
    "PEPSI  1.25LT  VAR", "PEPSI 2L VAR", "RIVERPORT DRNKS     1.25L", 
    "B/GOLD SFT DRK  1.25L VAR", "TRU BLU VAR            2L", 
    "thisisannavalue", "SCHW SFT DRK   1.25L", "SCHW SFT DRK   1.25L", 
    "CAPI N/ALC MIXERS 750ML", "SCHW SFT DRK   1.25L", "CAPI N/ALC MIXERS 750ML", 
    "CAPI N/ALC MIXERS 750ML", "SCHW SFT DRK 2LT VAR", "SCHW SFT DRK 2LT VAR", 
    "PEPSI  1.25LT  VAR", "NEXBA NAT S/F         1LT", "NEXBA NAT S/F         1LT", 
    "CAPI N/ALC MIXERS 750ML", "thisisannavalue", "FANTA               1.25L", 
    "FANTA               1.25L", "KIRKS SOFT DRINK 1.25L", "thisisannavalue", 
    "DIET RITE SOFT DRNK 1.25L", "FANTA/SPRITE           2L", 
    "thisisannavalue", "RIVERPORT DRNKS     1.25L", "RIVERPORT DRNKS     1.25L", 
    "thisisannavalue", "FANTA/SPRITE           2L", "thisisannavalue", 
    "thisisannavalue", "TRU BLU VAR            2L", "SCHW SFT DRK   1.25L", 
    "CAPI N/ALC MIXERS 750ML", "KIRKS SOFT DRINK 1.25L", "KIRKS SOFT DRINK 1.25L", 
    "KIRKS SOFT DRINK 1.25L", "thisisannavalue", "CAPI N/ALC MIXERS 750ML", 
    "CAPI N/ALC MIXERS 750ML", "thisisannavalue", "thisisannavalue", 
    "SCHW SOFT DRINK      1.1L", "thisisannavalue", "thisisannavalue", 
    "COCA COLA VAR PET      1L", "KIRKS SOFT DRINK 1.25L", "SCHW SFT DRK 2LT VAR", 
    "thisisannavalue", "thisisannavalue", "thisisannavalue", 
    "thisisannavalue", "thisisannavalue", "B/GOLD SFT DRK  1.25L VAR", 
    "SCHW SFT DRK   1.25L", "TRU BLU SOFT DRINK 1.25L", "B/GOLD SFT DRK  1.25L VAR", 
    "B/GOLD SFT DRK  1.25L VAR", "B/GOLD SFT DRK  1.25L VAR", 
    "B/GOLD SFT DRK  1.25L VAR", "B/GOLD SFT DRK  1.25L VAR", 
    "FANTA               1.25L", "SCHW SFT DRK   1.25L", "FANTA/SPRITE           2L", 
    "B/GOLD SFT DRK  1.25L VAR", "thisisannavalue", "B/GOLD SFT DRK  1.25L VAR", 
    "thisisannavalue", "thisisannavalue", "PEPSI  1.25LT  VAR", 
    "thisisannavalue", "PEPSI  1.25LT  VAR", "PEPSI 2L VAR", 
    "thisisannavalue", "RIVERPORT DRNKS     1.25L", "RIVERPORT DRNKS     1.25L", 
    "RIVERPORT DRNKS     1.25L", "thisisannavalue", "RIVERPORT DRNKS     1.25L", 
    "RIVERPORT DRNKS     1.25L", "RIVERPORT DRNKS     1.25L", 
    "COCA COLA VAR 1.25L", "COCA COLA VAR          2L", "thisisannavalue", 
    "COCA COLA VAR 1.25L", "COCA COLA VAR          2L", "PEPSI  1.25LT  VAR", 
    "PEPSI 2L VAR", "thisisannavalue", "PEPSI  1.25LT  VAR", 
    "thisisannavalue", "PEPSI  1.25LT  VAR", "SCHW SOFT DRINK      1.1L", 
    "B/GOLD SFT DRK  1.25L VAR", "thisisannavalue", "SCHW SOFT DRINK      1.1L", 
    "SCHW SFT DRK   1.25L", "SCHW SFT DRK   1.25L", "PEPSI  1.25LT  VAR", 
    "SCHW SFT DRK 2LT VAR", "SCHW SFT DRK 2LT VAR", "SCHW SOFT DRINK      1.1L", 
    "thisisannavalue", "TRU BLU SOFT DRINK 1.25L", "TRU BLU SOFT DRINK 1.25L", 
    "TRU BLU SOFT DRINK 1.25L", "TRU BLU SOFT DRINK 1.25L", "SCHW SOFT DRINK      1.1L", 
    "SCHW SOFT DRINK      1.1L", "SCHW SOFT DRINK      1.1L", 
    "TRU BLU SOFT DRINK 1.25L", "TRU BLU SOFT DRINK 1.25L", "KIRKS SOFT DRINK 1.25L", 
    "TRU BLU SOFT DRINK 1.25L", "TRU BLU SOFT DRINK 1.25L", "RIVERPORT DRNKS     1.25L", 
    "SCHW SOFT DRINK      1.1L", "TRU BLU VAR            2L", 
    "TRU BLU VAR            2L", "TRU BLU VAR            2L", 
    "TRU BLU VAR            2L", "TRU BLU VAR            2L", 
    "TRU BLU VAR            2L", "thisisannavalue", "TRU BLU VAR            2L", 
    "TRU BLU VAR            2L", "TRU BLU VAR            2L", 
    "TRU BLU VAR            2L", "TRU BLU SOFT DRINK 1.25L", 
    "TRU BLU SOFT DRINK 1.25L", "SCHW SFT DRK   1.25L", "KIRKS SOFT DRINK 1.25L", 
    "B/GOLD SFT DRK  1.25L VAR", "thisisannavalue", "SCHW SOFT DRINK      1.1L", 
    "thisisannavalue", "thisisannavalue", "SCHW SOFT DRINK      1.1L", 
    "SCHW SOFT DRINK      1.1L", "SCHW SFT DRK 2LT VAR", "thisisannavalue", 
    "FANTA               1.25L", "COCA COLA VARIETIES 250ML", 
    "SCHW SOFT DRINK      1.1L", "SCHW SOFT DRINK      1.1L", 
    "NEXBA NAT S/F         1LT", "thisisannavalue", "thisisannavalue", 
    "TRU BLU SOFT DRINK 1.25L", "SCHW SOFT DRINK      1.1L", 
    "KIRKS SOFT DRINK 1.25L", "PEPSI  1.25LT  VAR", "RIVERPORT DRNKS     1.25L", 
    "B/GOLD SFT DRK  1.25L VAR", "TRU BLU VAR            2L", 
    "KIRKS SOFT DRINK 1.25L", "NEXBA NAT S/F         1LT", "CAPI N/ALC MIXERS 750ML", 
    "DIET RITE SOFT DRNK 1.25L", "thisisannavalue", "WDRF SFT DRNK 2L", 
    "DIET RITE SOFT DRNK 1.25L", "thisisannavalue", "thisisannavalue", 
    "DIET RITE SOFT DRNK 1.25L", "COCA COLA VAR 1.25L", "COCA COLA VAR          2L", 
    "thisisannavalue", "B/GOLD SFT DRK  1.25L VAR", "thisisannavalue", 
    "thisisannavalue", "SCHW SOFT DRINK      1.1L", "KIRKS SOFT DRINK 1.25L", 
    "SCHW SOFT DRINK      1.1L", "KIRKS SOFT DRINK 1.25L", "KIRKS SOFT DRINK 1.25L", 
    "KIRKS SOFT DRINK 1.25L", "KIRKS SOFT DRINK 1.25L", "KIRKS SOFT DRINK 1.25L", 
    "KIRKS SOFT DRINK 1.25L", "KIRKS SOFT DRINK 1.25L", "thisisannavalue", 
    "thisisannavalue", "thisisannavalue", "WDRF SFT DRNK 2L", 
    "thisisannavalue", "B/GOLD SFT DRK  1.25L VAR", "TRU BLU VAR            2L", 
    "TRU BLU SOFT DRINK 1.25L", "KIRKS SOFT DRINK 1.25L", "KIRKS SOFT DRINK 1.25L", 
    "KIRKS SOFT DRINK 1.25L", "KIRKS SOFT DRINK 1.25L", "KIRKS SOFT DRINK 1.25L", 
    "thisisannavalue", "KIRKS SOFT DRINK 1.25L", "B/GOLD SFT DRK  1.25L VAR", 
    "SCHW SFT DRK 2LT VAR", "WDRF SFT DRNK 2L", "SCHW SOFT DRINK      1.1L", 
    "KIRKS SOFT DRINK 1.25L", "COCA COLA VAR PET      1L", "thisisannavalue", 
    "thisisannavalue", "thisisannavalue", "thisisannavalue", 
    "COCA COLA VAR 1.25L", "COCA COLA VAR 1.25L", "COCA COLA VAR          2L", 
    "thisisannavalue", "COCA COLA VAR       600ML", "COCA COLA VAR 1.25L", 
    "thisisannavalue", "thisisannavalue", "COCA COLA VAR          2L", 
    "COCA COLA VAR PET      1L", "COCA COLA VAR 1.25L", "COCA COLA VAR PET      1L", 
    "FANTA               1.25L", "FANTA               1.25L", 
    "thisisannavalue", "FANTA/SPRITE           2L", "thisisannavalue", 
    "COCA COLA VAR 1.25L", "SCHW SFT DRK   1.25L", "SCHW SFT DRK   1.25L", 
    "thisisannavalue", "COCA COLA VAR       600ML", "COCA COLA VAR 1.25L", 
    "thisisannavalue", "thisisannavalue", "COCA COLA VAR          2L", 
    "PEPSI  1.25LT  VAR", "COCA COLA VAR 1.25L", "COCA COLA VAR PET      1L", 
    "thisisannavalue", "B/GOLD SFT DRK  1.25L VAR", "thisisannavalue", 
    "thisisannavalue", "thisisannavalue", "B/GOLD SFT DRK  1.25L VAR", 
    "LA ICE COLA            2L", "LA ICE COLA            2L", 
    "COCA COLA VAR 1.25L", "COCA COLA VAR 1.25L", "PEPSI  1.25LT  VAR", 
    "COCA COLA VAR          2L", "TRU BLU SOFT DRINK 1.25L", 
    "thisisannavalue", "PEPSI 2L VAR", "COCA COLA VAR       600ML", 
    "COCA COLA VAR 1.25L", "thisisannavalue", "thisisannavalue", 
    "COCA COLA VAR          2L", "TRU BLU SOFT DRINK 1.25L", 
    "COCA COLA VAR PET      1L", "COCA COLA VAR 1.25L"), line = c(2L, 
    4L, 5L, 28L, 31L, 33L, 41L, 45L, 54L, 60L, 61L, 64L, 68L, 
    69L, 71L, 76L, 83L, 91L, 91L, 92L, 92L, 92L, 98L, 101L, 106L, 
    107L, 110L, 122L, 122L, 127L, 128L, 129L, 130L, 132L, 132L, 
    133L, 1L, 8L, 10L, 11L, 24L, 32L, 32L, 42L, 43L, 47L, 52L, 
    63L, 70L, 82L, 86L, 87L, 94L, 108L, 109L, 121L, 134L, 135L, 
    140L, 141L, 142L, 6L, 13L, 19L, 26L, 29L, 31L, 36L, 40L, 
    46L, 46L, 48L, 55L, 68L, 69L, 76L, 86L, 87L, 88L, 89L, 90L, 
    101L, 109L, 109L, 114L, 114L, 117L, 123L, 123L, 128L, 139L, 
    2L, 5L, 6L, 7L, 8L, 9L, 21L, 33L, 41L, 52L, 57L, 58L, 59L, 
    59L, 78L, 83L, 84L, 93L, 114L, 118L, 123L, 10L, 32L, 37L, 
    42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 65L, 66L, 
    68L, 73L, 74L, 96L, 100L, 101L, 107L, 130L, 135L, 15L, 21L, 
    39L, 53L, 82L, 83L, 132L, 133L, 134L, 136L, 8L, 11L, 11L, 
    12L, 12L, 14L, 15L, 20L, 22L, 22L, 23L, 41L, 41L, 47L, 53L, 
    62L, 62L, 63L, 63L, 64L, 64L, 69L, 76L, 76L, 77L, 77L, 80L, 
    80L, 86L, 90L, 93L, 95L, 106L, 110L, 110L, 115L, 117L, 119L, 
    124L, 125L, 131L, 136L, 137L, 140L, 1L, 9L, 12L, 14L, 24L, 
    32L, 50L, 52L, 77L, 88L, 141L, 142L, 31L, 37L, 38L, 54L, 
    68L, 69L, 71L, 73L, 74L, 101L, 118L, 123L, 130L, 15L, 16L, 
    20L, 23L, 24L, 25L, 25L, 26L, 27L, 28L, 29L, 30L, 30L, 38L, 
    44L, 57L, 62L, 80L, 81L, 88L, 89L, 89L, 90L, 95L, 112L, 118L, 
    119L, 126L, 136L, 139L, 4L, 17L, 34L, 35L, 39L, 67L, 79L, 
    97L, 99L, 104L, 105L, 111L, 113L, 116L, 129L, 138L, 4L, 18L, 
    19L, 34L, 36L, 40L, 67L, 82L, 83L, 99L, 104L, 105L, 111L, 
    113L, 116L, 132L, 138L, 4L, 17L, 21L, 34L, 35L, 39L, 58L, 
    60L, 61L, 67L, 79L, 96L, 97L, 98L, 99L, 100L, 104L, 105L, 
    111L, 113L, 116L, 127L, 129L, 138L), word = c("sunkist", 
    "pet", "sarsaparilla", "sarsaparilla", "rite", "sunkist", 
    "crush", "sarsaparilla", "rite", "ice", "ice", "crush", "caffeine", 
    "rite", "rite", "crush", "sunkist", "mountain", "dew", "mountain", 
    "dew", "caffeine", "ice", "caffeine", "sarsaparilla", "caffeine", 
    "crush", "mountain", "dew", "ice", "pet", "pet", "pet", "mountain", 
    "dew", "sunkist", "creaming", "squash", "max", "squash", 
    "creaming", "max", "creaming", "max", "max", "squash", "creaming", 
    "squash", "capi", "solo", "solo", "capi", "solo", "capi", 
    "capi", "solo", "solo", "max", "squash", "creaming", "capi", 
    "lime", "sprite", "sprite", "beer", "ginger", "beer", "sprite", 
    "sprite", "ginger", "beer", "lime", "sprite", "free", "lime", 
    "lime", "lime", "ginger", "free", "free", "free", "free", 
    "ginger", "beer", "ginger", "beer", "lime", "ginger", "beer", 
    "sprite", "free", "orange", "saxbys", "saxbys", "saxbys", 
    "saxbys", "saxbys", "gold", "orange", "orange", "gold", "gold", 
    "gold", "gold", "orange", "orange", "orange", "orange", "gold", 
    "saxbys", "gold", "saxbys", "pepsi", "pepsi", "coke", "pepsi", 
    "pepsi", "riverport", "riverport", "riverport", "riverport", 
    "riverport", "riverport", "riverport", "riverport", "coke", 
    "coke", "coke", "coke", "coke", "pepsi", "pepsi", "coke", 
    "pepsi", "coke", "pepsi", "zero", "zero", "zero", "zero", 
    "zero", "zero", "zero", "zero", "zero", "zero", "lemon", 
    "tru", "blu", "tru", "blu", "schw", "schw", "schw", "tru", 
    "blu", "lemon", "tru", "blu", "lemon", "schw", "tru", "blu", 
    "tru", "blu", "tru", "blu", "lemon", "tru", "blu", "tru", 
    "blu", "tru", "blu", "lemon", "lemon", "lemon", "schw", "schw", 
    "tru", "blu", "schw", "schw", "schw", "lemon", "lemon", "lemon", 
    "schw", "schw", "lemon", "soda", "soda", "soda", "soda", 
    "soda", "soda", "soda", "soda", "soda", "soda", "soda", "soda", 
    "diet", "diet", "diet", "diet", "diet", "diet", "diet", "diet", 
    "diet", "diet", "diet", "diet", "diet", "lemonade", "kirks", 
    "lemonade", "kirks", "kirks", "kirks", "lemonade", "kirks", 
    "kirks", "kirks", "kirks", "kirks", "lemonade", "lemonade", 
    "lemonade", "lemonade", "lemonade", "lemonade", "kirks", 
    "kirks", "kirks", "lemonade", "kirks", "lemonade", "kirks", 
    "lemonade", "lemonade", "lemonade", "lemonade", "kirks", 
    "coca", "coca", "coca", "coca", "coca", "coca", "coca", "coca", 
    "coca", "coca", "coca", "coca", "coca", "coca", "coca", "coca", 
    "sug", "sug", "sug", "sug", "sug", "sug", "sug", "sug", "sug", 
    "sug", "sug", "sug", "sug", "sug", "sug", "sug", "sug", "cola", 
    "cola", "cola", "cola", "cola", "cola", "cola", "cola", "cola", 
    "cola", "cola", "cola", "cola", "cola", "cola", "cola", "cola", 
    "cola", "cola", "cola", "cola", "cola", "cola", "cola")), row.names = c(NA, 
-302L), class = c("tbl_df", "tbl", "data.frame")), structure(list(
    item_id = c(277, 8783, 11552, 11552, 11882, 11914, 23861, 
    24361, 34292, 47459, 53417, 65297, 65297, 65320, 67197, 80459, 
    88917, 110513, 114744, 114746, 12588, 14366, 65320, 92932, 
    107182, 266, 277, 12924, 32084, 51728, 75961, 75961, 76098, 
    79399, 81346, 88917, 105506, 107184, 108416, 110513, 117353, 
    863, 19847, 32084, 40053, 58832, 66572, 67339, 76098, 115729, 
    863, 12588, 15040, 36499, 51324, 107182, 107184, 114171, 
    114217, 114218, 8783, 11552, 11882, 11914, 24361, 58832, 
    61372, 92932, 108404, 108415, 108416, 110513, 117353, 12924, 
    12965, 23861, 32084, 49856, 66572, 68845, 74643, 74648, 75961, 
    76098, 79399, 81346, 88917, 105506, 8783, 11552, 11882, 11914, 
    12588, 12924, 12965, 14366, 15040, 19847, 23861, 24361, 32084, 
    34292, 49856, 51324, 51728, 52117, 58832, 59733, 61372, 65297, 
    65320, 66572, 68845, 75961, 76098, 79399, 80459, 81346, 88917, 
    92932, 105506, 107182, 107184, 108404, 108415, 108416, 114171, 
    114217, 114218, 114744, 114746, 114979, 115729, 117353), 
    prfm_name = c("MOLNBRG BRD   700GM", "thisisannavalue", "thisisannavalue", 
    "thisisannavalue", "thisisannavalue", "thisisannavalue", 
    "T/TOP BRD THE ONE   700GM", "HELGA BRD  680/850GM", "thisisannavalue", 
    "MIAS BREAD 650G", "MIAS WHEATBELT BREAD 750G", "BURGEN BRD  700GM", 
    "BURGEN BRD  700GM", "BURGEN BRD  700GM", "MIGHTY SOFT 700GM", 
    "GOLD MAX BRD   650GM", "T/TOP BREAD S/BLST  650GM", "HELGA WRAP 8PK      560GM", 
    "thisisannavalue", "BURGEN BREAD PRBTC 700GM", "ABBOTS VILL BRD 750/850GM", 
    "ATLANTIC BRD SL 900GM", "BURGEN BRD  700GM", "HELGA BRD  680/850GM", 
    "ABBOTTS BREAD   750/760GM", "MOLNBRG BRD   700GM", "MOLNBRG BRD   700GM", 
    "TIP TOP SUNBLEST 400GM", "T/TOP BREAD S/BLST  650GM", "thisisannavalue", 
    "T/TOP BREAD S/BLST  650GM", "T/TOP BREAD S/BLST  650GM", 
    "thisisannavalue", "T/TOP BREAD S/BLST  650GM", "T/TOP BREAD S/BLST  650GM", 
    "T/TOP BREAD S/BLST  650GM", "T/TOP BREAD S/BLST  650GM", 
    "ABBOTTS BREAD   750/760GM", "HELGA BRD L/CRB      700G", 
    "HELGA WRAP 8PK      560GM", "HELGA SOURDOUGH 750GM", "ABBOTS VILL BRD 750/850GM", 
    "thisisannavalue", "T/TOP BREAD S/BLST  650GM", "thisisannavalue", 
    "HELGA BRD  680/850GM", "T/TOP BRD GOLD MAX 700GM", "MIGHTY SOFT   650GM", 
    "thisisannavalue", "thisisannavalue", "ABBOTS VILL BRD 750/850GM", 
    "ABBOTS VILL BRD 750/850GM", "ABBOTS VILL BRD 750/850GM", 
    "thisisannavalue", "ABBOTS VILL BRD 750/850GM", "ABBOTTS BREAD   750/760GM", 
    "ABBOTTS BREAD   750/760GM", "ABBOTTS S/DOUGH BREAD", "thisisannavalue", 
    "ABBOTTS S/DOUGH BREAD", "thisisannavalue", "thisisannavalue", 
    "thisisannavalue", "thisisannavalue", "HELGA BRD  680/850GM", 
    "HELGA BRD  680/850GM", "HELGA BRD  680/850GM", "HELGA BRD  680/850GM", 
    "HELGA BRD L/CRB      700G", "HELGA BRD L/CRB      700G", 
    "HELGA BRD L/CRB      700G", "HELGA WRAP 8PK      560GM", 
    "HELGA SOURDOUGH 750GM", "TIP TOP SUNBLEST 400GM", "thisisannavalue", 
    "T/TOP BRD THE ONE   700GM", "T/TOP BREAD S/BLST  650GM", 
    "T/TOP BRD 9GRAIN 700/750G", "T/TOP BRD GOLD MAX 700GM", 
    "T/TOP BRD THE ONE   700GM", "thisisannavalue", "thisisannavalue", 
    "T/TOP BREAD S/BLST  650GM", "thisisannavalue", "T/TOP BREAD S/BLST  650GM", 
    "T/TOP BREAD S/BLST  650GM", "T/TOP BREAD S/BLST  650GM", 
    "T/TOP BREAD S/BLST  650GM", "thisisannavalue", "thisisannavalue", 
    "thisisannavalue", "thisisannavalue", "ABBOTS VILL BRD 750/850GM", 
    "TIP TOP SUNBLEST 400GM", "thisisannavalue", "ATLANTIC BRD SL 900GM", 
    "ABBOTS VILL BRD 750/850GM", "thisisannavalue", "T/TOP BRD THE ONE   700GM", 
    "HELGA BRD  680/850GM", "T/TOP BREAD S/BLST  650GM", "thisisannavalue", 
    "T/TOP BRD 9GRAIN 700/750G", "ABBOTS VILL BRD 750/850GM", 
    "thisisannavalue", "thisisannavalue", "HELGA BRD  680/850GM", 
    "thisisannavalue", "HELGA BRD  680/850GM", "BURGEN BRD  700GM", 
    "BURGEN BRD  700GM", "T/TOP BRD GOLD MAX 700GM", "T/TOP BRD THE ONE   700GM", 
    "T/TOP BREAD S/BLST  650GM", "thisisannavalue", "T/TOP BREAD S/BLST  650GM", 
    "GOLD MAX BRD   650GM", "T/TOP BREAD S/BLST  650GM", "T/TOP BREAD S/BLST  650GM", 
    "HELGA BRD  680/850GM", "T/TOP BREAD S/BLST  650GM", "ABBOTTS BREAD   750/760GM", 
    "ABBOTTS BREAD   750/760GM", "HELGA BRD L/CRB      700G", 
    "HELGA BRD L/CRB      700G", "HELGA BRD L/CRB      700G", 
    "ABBOTTS S/DOUGH BREAD", "thisisannavalue", "ABBOTTS S/DOUGH BREAD", 
    "thisisannavalue", "BURGEN BREAD PRBTC 700GM", "thisisannavalue", 
    "thisisannavalue", "HELGA SOURDOUGH 750GM"), line = c(2L, 
    4L, 5L, 5L, 6L, 7L, 14L, 15L, 17L, 20L, 26L, 30L, 30L, 31L, 
    33L, 41L, 43L, 51L, 55L, 56L, 8L, 11L, 31L, 44L, 46L, 1L, 
    2L, 9L, 16L, 23L, 38L, 38L, 39L, 40L, 42L, 43L, 45L, 47L, 
    50L, 51L, 59L, 3L, 13L, 16L, 19L, 27L, 32L, 34L, 39L, 58L, 
    3L, 8L, 12L, 18L, 22L, 46L, 47L, 52L, 53L, 54L, 4L, 5L, 6L, 
    7L, 15L, 27L, 29L, 44L, 48L, 49L, 50L, 51L, 59L, 9L, 10L, 
    14L, 16L, 21L, 32L, 35L, 36L, 37L, 38L, 39L, 40L, 42L, 43L, 
    45L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 
    16L, 17L, 21L, 22L, 23L, 24L, 27L, 28L, 29L, 30L, 31L, 32L, 
    35L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 
    49L, 50L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L), word = c("toast", 
    "crb", "crb", "seed", "crb", "crb", "toast", "seed", "toast", 
    "multigrain", "multigrain", "burgen", "seed", "burgen", "multigrain", 
    "multigrain", "toast", "seed", "burgen", "burgen", "rye", 
    "rye", "rye", "rye", "rye", "grn", "grn", "blst", "blst", 
    "grn", "blst", "grn", "blst", "blst", "blst", "blst", "blst", 
    "grn", "grn", "grn", "grn", "grain", "grain", "grain", "grain", 
    "grain", "grain", "grain", "grain", "grain", "abbotts", "abbotts", 
    "abbotts", "abbotts", "abbotts", "abbotts", "abbotts", "abbotts", 
    "abbotts", "abbotts", "helga", "helga", "helga", "helga", 
    "helga", "helga", "helga", "helga", "helga", "helga", "helga", 
    "helga", "helga", "top", "top", "top", "top", "top", "top", 
    "top", "top", "top", "top", "top", "top", "top", "top", "top", 
    "brd", "brd", "brd", "brd", "brd", "brd", "brd", "brd", "brd", 
    "brd", "brd", "brd", "brd", "brd", "brd", "brd", "brd", "brd", 
    "brd", "brd", "brd", "brd", "brd", "brd", "brd", "brd", "brd", 
    "brd", "brd", "brd", "brd", "brd", "brd", "brd", "brd", "brd", 
    "brd", "brd", "brd", "brd", "brd", "brd", "brd", "brd", "brd", 
    "brd")), row.names = c(NA, -134L), class = c("tbl_df", "tbl", 
"data.frame"))), dfm = list(new("dgCMatrix", i = c(0L, 3L, 1L, 
3L, 18L, 2L, 3L, 9L, 10L, 13L, 20L, 2L, 3L, 5L, 6L, 9L, 10L, 
13L, 16L, 17L, 20L, 3L, 3L, 4L, 2L, 3L, 9L, 11L, 13L, 19L, 3L, 
5L, 17L, 3L, 5L, 9L, 10L, 12L, 13L, 14L, 15L, 20L, 3L, 6L, 7L, 
3L, 6L, 7L, 3L, 8L, 0L, 3L, 3L, 9L, 13L, 3L, 10L, 3L, 11L, 3L, 
5L, 17L, 12L, 18L, 0L, 1L, 2L, 3L, 6L, 7L, 18L, 19L, 1L, 2L, 
3L, 5L, 12L, 18L, 3L, 4L, 8L, 0L, 2L, 3L, 4L, 12L, 14L, 21L, 
22L, 1L, 3L, 7L, 12L, 3L, 10L, 14L, 15L, 0L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L, 8L, 22L, 6L, 16L, 3L, 5L, 17L, 3L, 5L, 17L, 2L, 6L, 
12L, 14L, 15L, 18L, 12L, 18L, 0L, 3L, 4L, 8L, 11L, 3L, 5L, 17L, 
3L, 6L, 7L, 0L, 2L, 3L, 5L, 22L, 0L, 1L, 3L, 4L, 3L, 4L, 6L, 
7L, 22L), p = c(0L, 2L, 5L, 6L, 11L, 21L, 22L, 24L, 30L, 33L, 
42L, 45L, 48L, 50L, 52L, 55L, 57L, 59L, 62L, 64L, 72L, 78L, 81L, 
89L, 93L, 97L, 107L, 109L, 112L, 115L, 121L, 123L, 128L, 131L, 
134L, 139L, 143L, 148L), Dim = c(23L, 37L), Dimnames = list(c("KIRKS SOFT DRINK 1.25L", 
"SCHW SOFT DRINK      1.1L", "B/GOLD SFT DRK  1.25L VAR", "thisisannavalue", 
"RIVERPORT DRNKS     1.25L", "PEPSI  1.25LT  VAR", "TRU BLU SOFT DRINK 1.25L", 
"TRU BLU VAR            2L", "CAPI N/ALC MIXERS 750ML", "COCA COLA VAR 1.25L", 
"COCA COLA VAR PET      1L", "DIET RITE SOFT DRNK 1.25L", "SCHW SFT DRK   1.25L", 
"COCA COLA VAR          2L", "FANTA               1.25L", "FANTA/SPRITE           2L", 
"LA ICE COLA            2L", "PEPSI 2L VAR", "SCHW SFT DRK 2LT VAR", 
"WDRF SFT DRNK 2L", "COCA COLA VAR       600ML", "COCA COLA VARIETIES 250ML", 
"NEXBA NAT S/F         1LT"), c("kirks", "schw", "gold", "coca", 
"cola", "saxbys", "riverport", "diet", "pepsi", "sug", "blu", 
"tru", "capi", "free", "coke", "pet", "rite", "max", "solo", 
"lemonade", "zero", "ginger", "lemon", "lime", "sprite", "soda", 
"ice", "dew", "mountain", "orange", "sunkist", "beer", "caffeine", 
"crush", "creaming", "sarsaparilla", "squash")), x = c(13, 2, 
9, 1, 1, 7, 7, 4, 2, 2, 1, 2, 7, 1, 2, 4, 2, 2, 2, 1, 1, 7, 2, 
6, 1, 6, 1, 3, 1, 1, 1, 5, 2, 5, 1, 3, 1, 2, 1, 2, 1, 1, 1, 5, 
5, 1, 5, 5, 1, 4, 4, 2, 4, 2, 2, 1, 3, 1, 3, 1, 3, 1, 3, 2, 2, 
3, 2, 3, 1, 1, 1, 2, 3, 1, 1, 1, 2, 2, 3, 1, 2, 2, 1, 3, 1, 1, 
1, 1, 1, 1, 3, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 
2, 2, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 2, 2, 1, 2, 1, 1, 1, 
2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), 
    factors = list()), new("dgCMatrix", i = c(0L, 1L, 2L, 3L, 
4L, 5L, 6L, 7L, 9L, 10L, 11L, 12L, 13L, 19L, 20L, 21L, 0L, 1L, 
21L, 0L, 1L, 9L, 19L, 20L, 21L, 0L, 2L, 5L, 6L, 0L, 3L, 4L, 13L, 
14L, 0L, 0L, 1L, 2L, 3L, 17L, 20L, 0L, 7L, 11L, 0L, 1L, 4L, 5L, 
8L, 13L, 14L, 2L, 3L, 5L, 7L, 10L, 0L, 3L, 7L, 14L, 12L, 15L, 
16L, 18L, 0L, 1L, 8L, 9L), p = c(0L, 16L, 19L, 25L, 29L, 34L, 
35L, 41L, 44L, 51L, 56L, 60L, 64L, 68L), Dim = c(22L, 13L), Dimnames = list(
    c("thisisannavalue", "T/TOP BREAD S/BLST  650GM", "ABBOTS VILL BRD 750/850GM", 
    "HELGA BRD  680/850GM", "HELGA BRD L/CRB      700G", "ABBOTTS BREAD   750/760GM", 
    "ABBOTTS S/DOUGH BREAD", "BURGEN BRD  700GM", "MOLNBRG BRD   700GM", 
    "T/TOP BRD THE ONE   700GM", "ATLANTIC BRD SL 900GM", "BURGEN BREAD PRBTC 700GM", 
    "GOLD MAX BRD   650GM", "HELGA SOURDOUGH 750GM", "HELGA WRAP 8PK      560GM", 
    "MIAS BREAD 650G", "MIAS WHEATBELT BREAD 750G", "MIGHTY SOFT   650GM", 
    "MIGHTY SOFT 700GM", "T/TOP BRD 9GRAIN 700/750G", "T/TOP BRD GOLD MAX 700GM", 
    "TIP TOP SUNBLEST 400GM"), c("brd", "blst", "top", "abbotts", 
    "helga", "crb", "grain", "burgen", "grn", "rye", "seed", 
    "multigrain", "toast")), x = c(15, 6, 3, 4, 3, 2, 2, 2, 2, 
1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 4, 6, 2, 1, 1, 1, 2, 4, 2, 2, 4, 
4, 3, 1, 1, 4, 4, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), factors = list())), 
    tunes = list(structure(list(topics = 30:2, Griffiths2004 = c(-854.551680360091, 
    -860.704295555564, -839.067760779945, -846.552947568362, 
    -837.908183832536, -855.464988279012, -862.213742806855, 
    -848.017423514668, -823.681113611704, -832.765064997717, 
    -827.400638306245, -819.00405944484, -824.356133477815, -830.553490025809, 
    -821.347833439129, -852.997419921903, -817.856460687809, 
    -828.922555810595, -805.329110099987, -803.833734007181, 
    -812.208172081688, -828.382385116717, -827.518651407674, 
    -828.983692494386, -845.703621941422, -871.962236399492, 
    -921.184517684608, -952.932320681289, -1021.58835978192), 
        CaoJuan2009 = c(0.145485431822419, 0.149666708406682, 
        0.131201059243777, 0.154052115501056, 0.151363559844428, 
        0.133113688758827, 0.132820158663369, 0.127445108378677, 
        0.114536581302837, 0.127444055306529, 0.121494987274631, 
        0.111482992516872, 0.124607826679032, 0.107784046477176, 
        0.121046725142412, 0.1186204362078, 0.0995276384664832, 
        0.130093079072772, 0.114053638929758, 0.0817775258001928, 
        0.0748587190331935, 0.0887831901926185, 0.0630497740589686, 
        0.0822665505858109, 0.0682313720123576, 0.0761569501573767, 
        0.058307426779435, 0.0389022173461228, 0.0546098129393944
        ), Deveaud2014 = c(1.09273750363231, 1.09500829072783, 
        1.16957481553461, 1.10433213798488, 1.12546069917499, 
        1.18289292913075, 1.20434506408961, 1.23130433717979, 
        1.29142838582361, 1.2841699756302, 1.30266544216608, 
        1.32779957691544, 1.30201168802316, 1.37354270680479, 
        1.34303383589333, 1.40152864512202, 1.50492356260397, 
        1.44737463151015, 1.49807991922298, 1.63169558465461, 
        1.64698019661021, 1.62206646766972, 1.75851317641994, 
        1.73389626533455, 1.78377598497756, 1.80992306720312, 
        1.89996371338076, 2.01361500388636, 1.99993390290997)), row.names = c(NA, 
    -29L), class = "data.frame"), structure(list(topics = 30:2, 
        Griffiths2004 = c(-268.942342586686, -250.302254183421, 
        -256.725309647274, -254.955711695224, -261.580495215534, 
        -259.51543113209, -262.160362736542, -242.649011539991, 
        -248.723186765444, -242.333099976894, -239.353026310531, 
        -238.064833161333, -264.118466703334, -245.322958866222, 
        -245.269420578441, -255.268042007245, -238.062713308704, 
        -240.610594015603, -241.649448904537, -250.466670793635, 
        -236.729622752588, -241.751840850082, -243.423413731783, 
        -243.105767856439, -244.001664508956, -250.269998609728, 
        -261.542210475253, -280.970680314536, -301.436038935331
        ), CaoJuan2009 = c(0.21559592974595, 0.244716536094182, 
        0.253764695703659, 0.304704828318268, 0.232883687499782, 
        0.281603787206519, 0.203569666130624, 0.228567504972373, 
        0.222082255219453, 0.241185063179526, 0.282835399394888, 
        0.211963756908297, 0.23176279328266, 0.223973859103252, 
        0.264970979405895, 0.272109056992703, 0.227592575534001, 
        0.180904445898263, 0.233709645822734, 0.250170364977629, 
        0.234066547251034, 0.194345551542507, 0.253087148168298, 
        0.188213804940797, 0.311657285801961, 0.134096458552257, 
        0.20190760318968, 0.0635056031083583, 0.0547350972746697
        ), Deveaud2014 = c(1.02366239290573, 0.956554210328646, 
        0.998191157805887, 0.910635495318842, 1.07197595179148, 
        1.00325229415139, 1.12065166222431, 1.13250703080738, 
        1.09838840281768, 1.10251250497597, 1.08269311491829, 
        1.22633719436064, 1.19831219425431, 1.24559072972122, 
        1.15438534831094, 1.17501978506844, 1.32294482450588, 
        1.36703810710028, 1.30742404368234, 1.29538365028867, 
        1.32997504886007, 1.4442862244431, 1.4228320742874, 1.67267453437184, 
        1.33507658985192, 1.82816744436782, 1.70430465091946, 
        2.07553206622025, 2.04131142936925)), row.names = c(NA, 
    -29L), class = "data.frame")), topic_base = list(11L, 10L), 
    range_topics = list(c(9, 10, 11, 12, 13, 14, 15), c(8, 9, 
    10, 11, 12, 13, 14))), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"))

How did you come up with the integers? It seems that with stm, K cannot be greater than or equal to the vocabulary size. Your second row's dfm/dtm has a vocabulary size of 13, which is a problem for tidy_item_nested$range_topics[[2]] because its last two K values are 13 and 14 (the first equals the vocabulary size, the second is greater than it). If you force K = 2 rather than taking it from range_topics, for example, the code runs without throwing your error.
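
To see which K values trip that check, here is a minimal base-R sketch. It uses only the column counts (37 and 13) and the range_topics values from the two rows shared above; the names vocab_size and too_large are made up for illustration, and the comparison mirrors the condition quoted in the error message rather than stm's internals.

```r
# Vocabulary sizes, i.e. ncol() of the two dfms shared in the question
vocab_size <- c(37L, 13L)

# The K ranges from the range_topics column of the same two rows
range_topics <- list(9:15, 8:14)

# For each row, collect the K values that are >= the vocabulary size,
# which is the condition the spectral initialization error complains about
too_large <- Map(function(k, v) k[k >= v], range_topics, vocab_size)

print(too_large)
```

The first element comes back empty, while the second contains 13 and 14, so only the second row's range is affected.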

I took the suggested number of topics from FindTopicsNumber {ldatuning}, and then created a vector around the optimal number based on the output metrics. I had not considered the vocab size, which I had assumed would usually be over 30, and had not planned for such a small dtm. It is a case of making sure the values in range_topics are less than the ncol() of the dtm, or filtering the vector. It seems such a simple problem now.
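
A rough sketch of the filtering option, in base R so it runs without any packages. The helper keep_valid_k is a hypothetical name, and the plain matrices are stand-ins that only copy the dimensions of the two dfms posted above (23 x 37 and 22 x 13).

```r
# Hypothetical helper: keep only the K values strictly below the vocabulary size
keep_valid_k <- function(k, dtm) {
  k[k < ncol(dtm)]
}

# Stand-in matrices with the same dimensions as the two dfms in the question
dtm_list <- list(
  matrix(0, nrow = 23, ncol = 37),
  matrix(0, nrow = 22, ncol = 13)
)

# The K ranges from the range_topics column of the same two rows
range_topics <- list(
  c(9, 10, 11, 12, 13, 14, 15),
  c(8, 9, 10, 11, 12, 13, 14)
)

# Apply the filter row by row
valid_topics <- Map(keep_valid_k, range_topics, dtm_list)

print(valid_topics)
```

Inside the nested tibble, the same helper could presumably be applied before fitting with something like mutate(range_topics = purrr::map2(range_topics, dfm, keep_valid_k)), assuming the matrix column is named dfm as in the dput() output.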

Thank you.
