Calculate average by group

shelly · December 11, 2021, 1:24am

# first_name and last_name must already be defined as character strings
set.seed(nchar(first_name)+nchar(last_name))
n.pop<-10000
subscribe<-sample(c(1,0),n.pop,replace=TRUE,prob=c(0.5,0.5))
pb1<-sample(65:75,1)/100
pb2<-0.5
ad.l1<-0.0
ad.l2<-sample(10:20,1)/100
ap1<-sample(65:75,1)/100
ap2<-0.5
set.seed(nchar(last_name))
see.ad.random<-runif(n.pop)
see.ad<-ifelse(subscribe,1*(see.ad.random<ap1),1*(see.ad.random<ap2))
buy.random<-runif(n.pop)
buy.thres1<-pb1+ad.l1*see.ad
buy.thres2<-pb2+ad.l2*see.ad
buy<-ifelse(subscribe,1*(buy.random<buy.thres1),1*(buy.random<buy.thres2))
data<-cbind.data.frame(subscribe,see.ad,buy)
rm(list = setdiff(ls(), c("data", "first_name", "last_name")))  # keep only these three objects

# The above is my data. I am trying to calculate the average buy rate for the group that sees the ad and for the group that doesn't.

I have the code below:

df<-data.frame(see.ad=1, see.ad=0)
mean(df$see.ad)
sapply(df, mean)

but I am not sure if I am doing this correctly. Any comments?
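Since buy is coded 0/1, its mean within a group is that group's buy rate, so the question reduces to "mean of buy, split by see.ad". A minimal sketch with base R's tapply() and aggregate(), assuming the goal is the buy rate by ad exposure; the small data frame toy is made-up illustration data, not the simulated data above:

```r
# Hypothetical toy data: 0/1 indicators for seeing the ad and buying
toy <- data.frame(
  see_ad = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
  buy    = c(1, 1, 1, 0, 1, 0, 0, 1, 0, 0)
)

# Mean of a 0/1 variable = proportion of 1s, so this is the buy rate per group
rates <- tapply(toy$buy, toy$see_ad, mean)
rates

# Same result as a data frame, via the formula interface
aggregate(buy ~ see_ad, data = toy, FUN = mean)
```

On the real data the same calls would be tapply(data$buy, data$see.ad, mean) or aggregate(buy ~ see.ad, data = data, FUN = mean).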

technocrat · December 11, 2021, 5:53am

See the FAQ: How to do a minimal reproducible example (reprex) for beginners. Most of the pieces are here, but some glitches exist, such as

that requires reverse engineering to address the problems in the terms posed. A reprex has the advantage of running "as-is" in someone else's RStudio session.
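For the data part of a reprex, one common base R approach is dput(), which prints R code that recreates an object; pasting that output into a post lets readers rebuild the data without needing first_name and last_name. A minimal sketch, with a hypothetical three-row data frame standing in for the real one:

```r
# Hypothetical small data frame standing in for the real data
small <- data.frame(subscribe = c(1, 0, 1), see_ad = c(1, 1, 0), buy = c(1, 0, 0))

# dput() prints code that recreates the object; paste that text into the post
dput(small)

# Anyone can rebuild the data frame from the printed text
code_text <- paste(capture.output(dput(small)), collapse = "\n")
rebuilt <- eval(parse(text = code_text))
all.equal(rebuilt, small)
```

For larger data, dput(head(x, 20)) or similar keeps the pasted text short.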

A couple of pointers before getting to an example using simpler data:

Use snake_case rather than dotted.separators as a matter of good style.
Don't name objects df, data, date or other words that are built-in functions or functions loaded by libraries; some operations give precedence to the function name.
Anything in a Stats 101 textbook has a function already written. Instead of

subscribe<-sample(c(1,0),n.pop,replace=TRUE,prob=c(0.5,0.5))

use

subscribe <- rbinom(n=n_pop, size=1, prob=0.5)

Construct data frames directly, rather than with cbind.data.frame():

DF <- data.frame(subscribe = subscribe, see_ad = see_ad, buy = buy)
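Putting these pointers together, the original simulation might be rewritten as below. This is a sketch, not a drop-in replacement: set.seed(2021) is an arbitrary stand-in for the name-based seeds (first_name and last_name aren't given), and rbinom() draws replace the sample()/runif() thresholds. The draws follow the same probabilities, but the exact counts will differ from the original.

```r
set.seed(2021)  # arbitrary stand-in for the name-based seeds in the original post
n_pop <- 10000

# Probabilities, drawn as in the original post
pb1 <- sample(65:75, 1) / 100    # base buy probability, subscribers
pb2 <- 0.5                       # base buy probability, non-subscribers
ad_l1 <- 0.0                     # ad lift on buying, subscribers
ad_l2 <- sample(10:20, 1) / 100  # ad lift on buying, non-subscribers
ap1 <- sample(65:75, 1) / 100    # probability of seeing the ad, subscribers
ap2 <- 0.5                       # probability of seeing the ad, non-subscribers

# Bernoulli (0/1) draws with rbinom(size = 1); prob can vary per person
subscribe <- rbinom(n = n_pop, size = 1, prob = 0.5)
see_ad <- rbinom(n = n_pop, size = 1, prob = ifelse(subscribe == 1, ap1, ap2))
buy_prob <- ifelse(subscribe == 1, pb1 + ad_l1 * see_ad, pb2 + ad_l2 * see_ad)
buy <- rbinom(n = n_pop, size = 1, prob = buy_prob)

# Build the data frame directly, avoiding names like `data` or `df`
sim <- data.frame(subscribe = subscribe, see_ad = see_ad, buy = buy)
head(sim)
```

A uniform draw below a threshold p is a Bernoulli(p) draw, which is why rbinom(size = 1) can replace the runif() comparisons one for one.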

Here is some fake data made of binary outcomes, illustrating contingency tables with counts and with proportions.

set.seed(42) 
N <- 100
exposed <- rbinom(n=N, size=1, prob=0.25)
set.seed(137)
purchased <- rbinom(n=N, size=1, prob=0.05)
DF <- data.frame(exposed = as.factor(exposed), purchased = as.factor(purchased))
table(DF)
#>        purchased
#> exposed  0  1
#>       0 72  2
#>       1 25  1
table(DF)/N
#>        purchased
#> exposed    0    1
#>       0 0.72 0.02
#>       1 0.25 0.01
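table(DF)/N divides every cell by the grand total N, so it gives joint proportions (0.01 is the share of all 100 people who were both exposed and purchased). The purchase rate within each exposure group instead divides each row by its own row total, which base R's prop.table() does with margin = 1. A sketch using the counts printed above, re-entered by hand so the block runs on its own:

```r
# Rebuild the 2x2 table from the counts printed above (matrix fills column-wise)
counts <- as.table(matrix(c(72, 25, 2, 1), nrow = 2,
                          dimnames = list(exposed = c("0", "1"),
                                          purchased = c("0", "1"))))

# Joint proportions: each cell divided by the grand total (same as table(DF)/N)
prop.table(counts)

# Conditional proportions: each row divided by its row total,
# i.e. the purchase rate within each exposure group
row_rates <- prop.table(counts, margin = 1)
row_rates
```

With these counts, the purchase rate is 2/74 (about 0.027) for the unexposed and 1/26 (about 0.038) for the exposed.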

shelly · December 12, 2021, 5:59pm

Thanks for that. This is what I came up with, but since I am new to R, I don't know if it's correct:

buy.subset = subset(data, buy ==1)
nrow(buy.subset) # 6478
nobuy.subset = subset(data, buy == 0)
nrow(nobuy.subset) # 3522
seeAd.subset = subset(data, see.ad == 1)
nrow(seeAd.subset) #5848
noSeeAd.subset = subset(data, see.ad == 0)
nrow(noSeeAd.subset) # 4152
###########################################
seeAdBuy.subset = subset(data, see.ad == 1 & buy == 1)
nrow(seeAdBuy.subset) #3993
seeAdNoBuy.subset = subset(data, see.ad == 1 & buy == 0)
nrow(seeAdNoBuy.subset) #1855

buyRateSeeAd = nrow(seeAdBuy.subset)/(nrow(seeAdBuy.subset)+nrow(seeAdNoBuy.subset)) #0.682797
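Because seeAdBuy.subset and seeAdNoBuy.subset together make up seeAd.subset, the denominator is just nrow(seeAd.subset), and since buy is 0/1, the ratio equals the mean of buy among those who saw the ad. A sketch checking that on made-up data (toy is hypothetical; the dotted column names mirror the post):

```r
# Hypothetical toy data with the same column names as in the post
toy <- data.frame(see.ad = c(1, 1, 1, 0, 0, 0, 1, 0),
                  buy    = c(1, 0, 1, 0, 1, 0, 1, 0))

# Counting approach, as in the post
seeAd.subset <- subset(toy, see.ad == 1)
rate_count <- nrow(subset(seeAd.subset, buy == 1)) / nrow(seeAd.subset)

# Equivalent: mean of a 0/1 variable within each group
rate_see <- mean(toy$buy[toy$see.ad == 1])
rate_no_see <- mean(toy$buy[toy$see.ad == 0])

c(count = rate_count, see_ad = rate_see, no_ad = rate_no_see,
  difference = rate_see - rate_no_see)
```

Here both approaches give 0.75 for the ad group, 0.25 for the no-ad group, and a difference of 0.5.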

shelly · December 12, 2021, 6:03pm

Also, if I want to calculate the weights, what does that mean?

shelly · December 12, 2021, 6:16pm

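# result1: assumed to be each person's estimated probability of seeing the ad (defined elsewhere)
# if () needs a single TRUE/FALSE, so use the vectorized ifelse() on a whole column
data$result2 <- ifelse(data$see.ad == 1, 1/result1, 1/(1 - result1))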

This is what I have for calculating the weights.
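The formula in the post (1/p for those who saw the ad, 1/(1-p) for those who didn't) has the form of inverse probability weights, where p is each person's estimated probability of seeing the ad (the propensity score). Assuming that is what the weights are meant for, here is a minimal sketch: p is estimated with a logistic regression of see_ad on subscribe (an assumption about the intended model; in the post's simulation, subscribe is the variable that affects both seeing the ad and buying), the weights are built with ifelse(), and naive and weighted buy rates are compared. The data is simulated with made-up probabilities that mirror the structure of the post's simulation, not its values.

```r
set.seed(1)  # arbitrary seed for this illustration
n <- 5000

# Made-up simulation: subscribers see the ad more often and buy more,
# and the ad only lifts buying for non-subscribers
subscribe <- rbinom(n, size = 1, prob = 0.5)
see_ad <- rbinom(n, size = 1, prob = ifelse(subscribe == 1, 0.7, 0.5))
buy <- rbinom(n, size = 1, prob = ifelse(subscribe == 1, 0.7, 0.5 + 0.15 * see_ad))
sim <- data.frame(subscribe, see_ad, buy)

# Propensity score: estimated P(see_ad = 1 | subscribe) from a logistic regression
ps_model <- glm(see_ad ~ subscribe, family = binomial, data = sim)
sim$p_hat <- fitted(ps_model)

# Inverse probability weights; ifelse() works element-wise, unlike if ()
sim$w <- ifelse(sim$see_ad == 1, 1 / sim$p_hat, 1 / (1 - sim$p_hat))

# Naive vs weighted buy rates in each group
with(sim, c(
  naive_ad    = mean(buy[see_ad == 1]),
  naive_no_ad = mean(buy[see_ad == 0]),
  ipw_ad      = weighted.mean(buy[see_ad == 1], w[see_ad == 1]),
  ipw_no_ad   = weighted.mean(buy[see_ad == 0], w[see_ad == 0])
))
```

With a single binary predictor, the fitted p is simply the observed share seeing the ad within each subscribe group; the weighting rebalances the ad and no-ad groups so both have the overall subscriber mix.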

technocrat · December 12, 2021, 8:15pm

This is still opaque, particularly without data. Questions that require reverse engineering the problem are far less likely to receive helpful answers than those with a cut-and-paste reprex, as described in the FAQ linked above.

nviet · December 18, 2021, 5:29pm

FAQ: How to do a minimal reproducible example (reprex) for beginners


system · January 8, 2022, 5:30pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.