Thanks for your input! Here is the code I have so far. I'm using a package that was recommended to me for this application, HapEstXXR. The results it produces are what I want, with two exceptions:
1: It is limited to 15 "stores". I need a solution that scales so I can apply it to the rest of my data.
2: The results list each "store" multiple times, but ideally I'd want a unique list.
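Both points can be illustrated with a small base R sketch. For the first, the number of candidate store sets grows exponentially: a full power set of n stores has 2^n members, and even keeping only sets of 4 to 10 stores leaves sum(choose(n, 4:10)) of them. For the second, if the per-set result has one row per (store, shared item) pair, `unique()` collapses it to one entry per store. The data frame `toy_x1` is hypothetical toy data, not the real output.

```r
# Candidate store sets for a few store counts
n_stores <- c(15, 20, 30, 50)

# Full power set: 2^n subsets
powerset_size <- 2^n_stores

# Sets with more than 3 and at most 10 stores: sum of choose(n, k) for k = 4..10
filtered_size <- sapply(n_stores, function(n) sum(choose(n, 4:10)))

print(data.frame(n_stores, powerset_size, filtered_size))

# Hypothetical per-set result: each store repeats once per shared item
toy_x1 <- data.frame(
  store = c("A", "A", "B", "B"),
  item  = c("x", "y", "x", "y"),
  stringsAsFactors = FALSE
)

# One entry per store
toy_unique_stores <- unique(toy_x1$store)
print(toy_unique_stores)
```

For 15 stores the size filter removes only 2,517 of the 32,768 subsets (30,251 remain), so trimming after the power set has been built saves little.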
The only part of the code I omitted is the getwd()/setwd() call I use to load the sample data file.
Here are the smaller data set and the associated occurrence matrix files.
library(data.table)  # fread(), dcast(), uniqueN()
sample_data <- fread("sample_data_small.csv", header = TRUE)
# store/item combinations algorithm
# Pull the columns out as plain vectors (optional: inside sample_data[...]
# data.table already lets you refer to the columns by name)
store <- sample_data[["store"]]
item  <- sample_data[["item"]]
# occurrence matrix: one row per store, one column per item; each cell counts
# how many rows of sample_data pair that store with that item
occurrence_matrix <- dcast(sample_data, store ~ item,
                           fun.aggregate = length,
                           value.var = "item")
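For intuition, the same store-by-item grid can be sketched in base R with `table()`. One difference: `dcast()` returns a data.table with `store` as its first column, whereas `table()` puts the stores in the row names. The toy data below is hypothetical and uses its own names so it does not overwrite `sample_data`.

```r
# Hypothetical toy data in long format: one row per (store, item) pair
toy_data <- data.frame(
  store = c("A", "A", "B", "B", "B", "C"),
  item  = c("x", "y", "x", "y", "z", "x"),
  stringsAsFactors = FALSE
)

# Rows = stores, columns = items, cells = number of matching rows
toy_occ <- table(store = toy_data$store, item = toy_data$item)
print(toy_occ)
```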
# occurrence_array <- reshape2::acast(sample_data, store ~ item ~ item,
#                                     fun.aggregate = length,
#                                     value.var = "item")
# rowSums and colSums show how popular the stores and items are (frequency)
# popular items: rows per item (drop the store column first)
# colSums(occurrence_matrix[ , !"store"])
## how popular is the most popular item?
# max(colSums(occurrence_matrix[ , !"store"]))
sample_data[ , uniqueN(store) , by = .(item)][ , max(V1)]
# popular stores
# rowSums(occurrence_matrix[ , !"store"])
# how many items does the largest store carry?
# max(rowSums(occurrence_matrix[ , !"store"]))
# same figure, computed directly from the long data
sample_data[ , uniqueN(item) , by = .(store)][ , max(V1)]
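The popularity figures can also be read off a presence/absence version of the grid. A base R sketch on hypothetical toy data: converting the counts to TRUE/FALSE first makes `colSums()` count distinct stores per item and `rowSums()` count distinct items per store, which matches the `uniqueN()` lines even when the data contains duplicate rows.

```r
# Hypothetical toy data; store C lists item x twice (a duplicate row)
toy_data <- data.frame(
  store = c("A", "A", "B", "B", "B", "C", "C"),
  item  = c("x", "y", "x", "y", "z", "x", "x"),
  stringsAsFactors = FALSE
)

# Logical store-by-item matrix: does the store carry the item at all?
toy_presence <- unclass(table(toy_data$store, toy_data$item)) > 0

item_popularity <- colSums(toy_presence)  # number of stores carrying each item
store_size      <- rowSums(toy_presence)  # number of distinct items per store

print(max(item_popularity))
print(max(store_size))
```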
# all possible sets of stores (the power set has 2^n members, so this may not scale)
unique_stores <- sample_data[ , unique(store)]
all_store_combns <- HapEstXXR::powerset(unique_stores)
# this may produce too many sets; how can I trim them down ahead of time?
# keep only sets with more than X (here 3) and at most Y (here 10) stores
set_sizes <- lengths(all_store_combns)
all_store_combns <- all_store_combns[set_sizes > 3 & set_sizes <= 10]
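Filtering by size only after `HapEstXXR::powerset()` has built every subset is what ties the approach to the power set. One alternative, sketched with base R's `combn()` rather than anything from HapEstXXR, is to generate only the set sizes being kept. This removes the dependence on the full power set, but the number of sets is still sum(choose(n, 4:10)), so it only helps up to a moderate number of stores. The store IDs and the names `toy_stores`, `min_size`, `max_size` are hypothetical.

```r
# Hypothetical store IDs standing in for unique_stores
toy_stores <- LETTERS[1:12]

min_size <- 4   # "more than 3 stores"
max_size <- 10  # "at most 10 stores"

# combn(..., simplify = FALSE) returns a list of character vectors for one size;
# lapply over the sizes, then flatten one level to get a single list of sets
sizes <- min_size:min(max_size, length(toy_stores))
toy_store_sets <- unlist(
  lapply(sizes, function(k) combn(toy_stores, k, simplify = FALSE)),
  recursive = FALSE
)

print(length(toy_store_sets))
```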
# name each set by its sorted store IDs, separated by "_" so different sets cannot share a name
names(all_store_combns) <- sapply(all_store_combns, function(i) paste(sort(i), collapse = "_"))
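Why a separator matters in the set names: with `collapse = ''`, two different store sets can produce the same name when store IDs have more than one character, and a lookup by name would then return the wrong set. A small base R check with hypothetical IDs:

```r
set_a <- c("1", "23")
set_b <- c("12", "3")

# No separator: both sets collapse to "123"
name_a_plain <- paste0(sort(set_a), collapse = "")
name_b_plain <- paste0(sort(set_b), collapse = "")

# With a separator the names stay distinct: "1_23" vs "12_3"
name_a_sep <- paste(sort(set_a), collapse = "_")
name_b_sep <- paste(sort(set_b), collapse = "_")

print(c(name_a_plain, name_b_plain, name_a_sep, name_b_sep))
```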
result <- sapply(all_store_combns, USE.NAMES = TRUE, simplify = FALSE,
                 function(store_set) {
# subset down to relevant store subset
x <- sample_data[store %in% store_set]
# count how many stores each item is represented in
x[, cnt := uniqueN(store) , by = item]
# NB: IFF there are no duplicate rows, then the following line does the same thing more efficiently
x[, cnt := .N , by = item]
# remove items that aren't present in all stores
x1 <- x[ cnt == length(store_set) ]