Finding Representative Groups of Data

Hi All,

I have a large number of shops, say N, that sell different products. I have all the transactions for all the products sold across the N shops, so I can calculate which products are the most popular.

Since tracking all N shops is quite expensive, I would like to find the subset of the N shops that gives the best estimate of product popularity. The estimate should match, as closely as possible, the popularity calculated across all N shops. I also need confidence intervals for the estimates obtained from each candidate subset of shops.

One additional requirement is that the best-selling products should have more influence on which shops are selected.

My data consist of one row per transaction, with two columns: shopID, product
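To make the goal concrete, here is a rough Python sketch of how a candidate subset could be scored against the full data. It is only one possible reading of the requirements, not a settled method: the share-weighted absolute error (so that best sellers count more), the percentile bootstrap over shops for the confidence interval, the function names, and the toy rows are all assumptions made for illustration.

```python
import random
from collections import Counter

def product_shares(rows, shops=None):
    """Share of transactions per product, optionally restricted to a set of shops."""
    counts = Counter(p for s, p in rows if shops is None or s in shops)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()} if total else {}

def weighted_error(rows, subset):
    """Sum over products of full_share * |subset_share - full_share|.

    Weighting by the full-data share is an assumed way to give
    best-selling products more influence on the score.
    """
    full = product_shares(rows)
    sub = product_shares(rows, set(subset))
    return sum(w * abs(sub.get(p, 0.0) - w) for p, w in full.items())

def bootstrap_ci(rows, subset, product, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for one product's share,
    resampling the shops of the subset with replacement."""
    rng = random.Random(seed)
    subset = list(subset)
    by_shop = {s: Counter() for s in subset}
    for s, p in rows:
        if s in by_shop:
            by_shop[s][p] += 1
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(subset, k=len(subset))
        total = sum(sum(by_shop[s].values()) for s in sample)
        if total:
            estimates.append(sum(by_shop[s][product] for s in sample) / total)
    if not estimates:
        raise ValueError("subset has no transactions")
    estimates.sort()
    n = len(estimates)
    lo = estimates[int((alpha / 2) * n)]
    hi = estimates[min(n - 1, int((1 - alpha / 2) * n))]
    return lo, hi

# Made-up (shopID, product) rows, purely for illustration.
rows = [("s1", "A"), ("s1", "A"), ("s1", "B"),
        ("s2", "A"), ("s2", "B"), ("s2", "B"),
        ("s3", "A"), ("s3", "C")]

print(weighted_error(rows, ["s1"]))        # score of a one-shop subset
print(bootstrap_ci(rows, ["s1", "s2"], "A"))  # interval for product A's share
```

A lower weighted_error means the subset reproduces the full-data popularity more closely. With N shops there are too many subsets to score exhaustively, so some search strategy (for example, greedy addition of shops) would still be needed on top of this.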

Thanks and regards.