Calculate average by group

set.seed(nchar(first_name)+nchar(last_name))
n.pop<-10000
subscribe<-sample(c(1,0),n.pop,replace=TRUE,prob=c(0.5,0.5))
pb1<-sample(65:75,1)/100
pb2<-0.5
ad.l1<-0.0
ad.l2<-sample(10:20,1)/100
ap1<-sample(65:75,1)/100
ap2<-0.5
set.seed(nchar(last_name))
see.ad.random<-runif(n.pop)
see.ad<-ifelse(subscribe,1*(see.ad.random<ap1),1*(see.ad.random<ap2))
buy.random<-runif(n.pop)
buy.thres1<-pb1+ad.l1*see.ad
buy.thres2<-pb2+ad.l2*see.ad
buy<-ifelse(subscribe,1*(buy.random<buy.thres1),1*(buy.random<buy.thres2))
data<-cbind.data.frame(subscribe,see.ad,buy)
rm(list = ls(pattern="[^data,first_name,last_name]"))

# The above is my data. I am trying to calculate the average rate for the group that sees the ad and for the group that doesn't.

I have the code below

df<-data.frame(see.ad=1, see.ad=0)
mean(df$see.ad)
sapply(df, mean)

but I am not sure if I am doing this correctly. Any comments?

See the FAQ: How to do a minimal reproducible example reprex for beginners. Most of the pieces are here, but some glitches exist, such as

that requires reverse engineering to address the problems in the terms posed. A reprex has the advantage of running "as-is" on another's RStudio session.

A couple of pointers before getting to an example using simpler data:

  1. Use snake_case rather than dotted.separators as a matter of good style
  2. Don't name objects df, data, date or other words that are built-in functions or functions loaded by libraries; some operations give precedence to the function name
  3. Anything in a Stats 101 textbook has a function already written. Instead of

use

subscribe <- rbinom(n=n_pop, size=1, prob=0.5)
  4. Construct data frames directly
DF <- data.frame(subscribe = subscribe, see_ad = see_ad, buy = buy)

Here is fake data composed of binary outcomes illustrating contingency tables with count and with proportion results.

set.seed(42) 
N <- 100
exposed <- rbinom(n=N, size=1, prob=0.25)
set.seed(137)
purchased <- rbinom(n=N, size=1, prob=0.05)
DF <- data.frame(exposed = as.factor(exposed),purchased = as.factor(purchased))
table(DF)
#>        purchased
#> exposed  0  1
#>       0 72  2
#>       1 25  1
table(DF)/N
#>        purchased
#> exposed    0    1
#>       0 0.72 0.02
#>       1 0.25 0.01
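To get from these tables to an average by group, one option is to condition on `exposed`. The sketch below rebuilds the same fake data but keeps the columns numeric, an assumption made here so that `mean()` returns a proportion. `DF_num`, `rate_by_group` and `row_props` are names invented for this illustration, not part of the original example.

```r
# Same fake binary data as in the example above
set.seed(42)
N <- 100
exposed <- rbinom(n = N, size = 1, prob = 0.25)
set.seed(137)
purchased <- rbinom(n = N, size = 1, prob = 0.05)

# Keep the 0/1 outcomes numeric (not factors) so mean() gives a proportion
DF_num <- data.frame(exposed = exposed, purchased = purchased)

# Average of purchased within each level of exposed
rate_by_group <- tapply(DF_num$purchased, DF_num$exposed, mean)
rate_by_group

# The same rates from the contingency table: proportions within each row
row_props <- prop.table(table(DF_num), margin = 1)
row_props

# Data-frame form of the same calculation
aggregate(purchased ~ exposed, data = DF_num, FUN = mean)
```

Dividing by `N`, as in `table(DF)/N`, gives shares of the whole sample. `margin = 1` divides each row by its own total instead, which is what a rate "for the group that was exposed" versus "for the group that was not" asks for.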

Thanks for that. This is what I came up with, but since I am new to R, I don't know if it's correct.

buy.subset = subset(data, buy ==1)
nrow(buy.subset) # 6478
nobuy.subset = subset(data, buy == 0)
nrow(nobuy.subset) # 3522
seeAd.subset = subset(data, see.ad == 1)
nrow(seeAd.subset) #5848
noSeeAd.subset = subset(data, see.ad == 0)
nrow(noSeeAd.subset) # 4152
###########################################
seeAdBuy.subset = subset(data, see.ad == 1 & buy == 1)
nrow(seeAdBuy.subset) #3993
seeAdNoBuy.subset = subset(data, see.ad == 1 & buy == 0)
nrow(seeAdNoBuy.subset) #1855

buyRateSeeAd = nrow(seeAdBuy.subset)/(nrow(seeAdBuy.subset)+nrow(seeAdNoBuy.subset)) #0.682797

Also if I want to calculate the weights, what does that mean?

data$result2 = c(1:10000)
if(data$see.ad == 1){data$result2 = 1/result1} else{data$result2 = 1/(1-result1)}

this is what I have as calculating the weights

This is still opaque, particularly without data. Questions that require reverse engineering the problem are far less likely to receive helpful answers than those with a cut-and-paste reprex, as described in the FAQ mentioned above.
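One thing can be said without the data: `if()` expects a single TRUE or FALSE, so handing it a whole column such as `data$see.ad == 1` is an error in R 4.2.0 and later (earlier versions used only the first element and warned). The element-by-element counterpart is `ifelse()`. Below is a minimal sketch on made-up data. `toy`, `see_ad`, `p_see` and `weight` are hypothetical names, and treating `result1` as the share of rows that saw the ad is only a guess, since `result1` was never defined in the post.

```r
# Made-up stand-in for the real data, which was not posted
toy <- data.frame(see_ad = c(1, 0, 1, 1, 0, 0, 1, 0, 1, 1))

# Assumed meaning of result1: the proportion of rows that saw the ad
p_see <- mean(toy$see_ad)

# ifelse() is vectorised: one weight per row, chosen by that row's see_ad
toy$weight <- ifelse(toy$see_ad == 1, 1 / p_see, 1 / (1 - p_see))
toy
```

Under that assumption the weights within each group sum to the number of rows, which is the pattern used in inverse-probability weighting. Whether that is what the assignment means by "weights" cannot be determined from what has been posted.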
